Discussion:
Backup.
Eduardo M KALINOWSKI
2024-06-27 19:10:01 UTC
Now I have a pair of 500 GB external USB drives. Large compared to my
working data of ~3 GB. Please suggest improvements to my backup
system by exploiting these drives. I can imagine a complete copy of A
onto an external drive for each backup; but with most files in A not
changing during the backup interval, that is inefficient.
rsnapshot
--
Mike: "The Fourth Dimension is a shambles?"
Bernie: "Nobody ever empties the ashtrays. People are SO inconsiderate."
-- Gary Trudeau, "Doonesbury"

Michael Kjörling
2024-06-27 20:00:01 UTC
Post by Eduardo M KALINOWSKI
Now I have a pair of 500 GB external USB drives. Large compared to my
working data of ~3 GB. Please suggest improvements to my backup
system by exploiting these drives. I can imagine a complete copy of A
onto an external drive for each backup; but with most files in A not
changing during the backup interval, that is inefficient.
rsnapshot
Yes, rsnapshot.

Which is essentially a front-end to rsync --link-dest; so, for
mostly-static data, very efficient.
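For mostly-static data of a few GB, a minimal rsnapshot.conf might look like the fragment below (snapshot_root, retention counts, and the excludes are illustrative, chosen to mirror the Backup() function earlier in the thread; note that rsnapshot requires literal tab characters between fields):

```
# rsnapshot.conf fragment -- fields MUST be separated by real tabs
config_version	1.2
snapshot_root	/mnt/usb500/snapshots/
retain	daily	7
retain	weekly	4
backup	/home/me/A/	localhost/
exclude	Trap*
exclude	*.mp3
exclude	*.mp4
```

A cron job then drives the rotation, e.g. "rsnapshot daily" each night and "rsnapshot weekly" once a week; rsnapshot itself handles the --link-dest bookkeeping.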
--
Michael Kjörling 🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”
e***@gmx.us
2024-06-27 20:00:01 UTC
Finding a file as it existed months or years ago can be tedious. For
example, find A/MailMessages as it was at 2023.02.07. Otherwise the
backup system works well.
On one computer I use rsync to do what appear to be complete backups, but
files identical to the previous run are hard-linked. This appears to fail
sometimes (not sure under what conditions), so I wrote scripts to find
identical files and link them. I think the scripts are too complex for my
pea brain to debug them successfully, so there are errors. But they seem to
work.
I can imagine a complete copy of A onto an external drive for each
backup; but with most files in A not changing during the backup interval,
that is inefficient.
When rsync works, that is exactly what it fixes.

When I boot the file server (possibly today but definitely tomorrow) I'll
post my backup script.

--
Most people don't even know what sysadmins do, but trust me,
if they all took a lunch break at the same time they wouldn't
make it to the deli before you ran out of bullets protecting
your canned goods from roving bands of mutants. -- Peter Welch
e***@gmx.us
2024-06-28 03:20:01 UTC
Post by e***@gmx.us
When I boot the file server (possibly today but definitely tomorrow) I'll
post my backup script.
OK, it's pretty long so I won't post the whole thing, but the important
lines are

rsyncoptions="--archive --progress --verbose --recursive"

"$rsync" $rsyncoptions --hard-links --link-dest "$previous" "$source" "$current"

I have it create destination dirs like

/backup/2024-01-04_22:26:35/
/backup/2024-01-12_20:21:02/
/backup/2024-01-20_23:19:11/
/backup/2024-01-26_23:04:31/
/backup/2024-02-01_22:13:38/
/backup/2024-02-14_00:12:37/
/backup/2024-03-25_23:52:52/
/backup/2024-05-06_22:47:31/
/backup/2024-05-12_22:08:31/
/backup/2024-06-12_23:39:44/

so $previous and $current are two of those, and $source is /files .
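One way $previous and $current could be derived from that directory layout (a hedged sketch, not the poster's actual script; the temp directory stands in for /backup):

```shell
# Sketch: pick the newest existing snapshot as the --link-dest
# reference, then make a fresh timestamped directory for this run.
backuproot=$(mktemp -d)    # stands in for /backup
mkdir "$backuproot/2024-05-12_22:08:31" "$backuproot/2024-06-12_23:39:44"

# The timestamp format sorts lexically, so the last entry is the newest.
previous=$(ls -1d "$backuproot"/*/ | sort | tail -n 1)
current="$backuproot/$(date '+%Y-%m-%d_%H:%M:%S')"
mkdir "$current"
echo "previous=$previous"
echo "current=$current"
```

The rsync line from the script above would then take $previous as --link-dest and $current as the destination.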

--
"Never go off on tangents, which are lines that intersect a [circle] at
only one point and were discovered by Euclid, who lived in the [4th C
BC], which was an era dominated by the Goths, who lived in what we now
know as Poland." - from the Nov. 1998 issue of _Infosystems Executive_.
p***@easthope.ca
2024-06-30 15:10:02 UTC
From: ***@gmx.us
Date: Thu, 27 Jun 2024 15:52:44 -0400
On one computer I use rsync ...
See reply to Eduardo.

Thx, ... P.
--
VoIP: +1 604 670 0140
work: https://en.wikibooks.org/wiki/User:PeterEasthope
e***@gmx.us
2024-06-30 16:10:01 UTC
Post by p***@easthope.ca
Date: Thu, 27 Jun 2024 15:52:44 -0400
On one computer I use rsync ...
See reply to Eduardo.
Date: Thu, 27 Jun 2024 16:06:18 -0300
rsnapshot
From https://rsnapshot.org/
rsnapshot is a filesystem snapshot utility ...
Rather than a snapshot of the extant file system, I want to keep a
history of the files in the file system.
rsync does do individual files. Not sure what it's on about. I can do
cp /backup/2024-06-25_13:55:24/x/y/z .
(assuming I did a backup then).

--
Scientist A: A matterbaby is a very unstable particle.
Scientist B: What's a matterbaby?
Scientist A: I'm doing fine honey, how you doing? -- mrshowrules on Fark
Jerome BENOIT
2024-06-27 20:20:01 UTC
Hello,
why did you not use something as backup2l ?
Best wishes,
Jerome
Hi,
My working data is in a directory we can refer to as A. A is on a
removable flash store. "du -hs /home/me/A" reports 3.0G. I want a
reliable backup of most files A/*.
I created a directory "Backup" on the HDD and apply this shell
function whenever motivated.
Backup() {
    if [ $# -gt 1 ]; then
        echo "Too many arguments.";
    else
        echo "0 or 1 arguments are OK.";
        source="/home/me/A/*";
        echo "source is $source.";
        if [ $# -eq 0 ]; then
            echo "0 arguments is OK.";
            destination=/home/me/Backup;
            echo "destination is $destination.";
        else
            echo "1 argument is OK.";
            destination=/home/me/$1;
            echo "destination is $destination.";
        fi;
        echo "Executing sync and rsync.";
        sync;
        rsync \
            --exclude='Trap*' \
            --exclude='*.mp3' \
            --exclude='*.mp4' \
            -auv $source $destination ;
        /bin/ls -ld $destination/MailMessages;
        printf "du -hs $destination => ";
        du -hs $destination;
    fi;
}
When the flash store fails, work since the last execution of Backup
can be lost.
In case the Backup directory on the HDD is lost or I want to see an
old file not current in A, I want backups of Backup. This function is
applied every week or two to write to a DVD.
FilesToDVD () {
    printf "Insert open or new DVD-R.";
    read t;
    startPath=$PWD;
    echo "startPath is $startPath";
    source=/home/me/Backup;
    echo "source is $source";
    cd $source;
    xorriso -for_backup -dev /dev/sr0 \
        -update_r . / \
        -commit \
        -toc -check_md5 failure -- \
        -eject all ;
    cd $startPath ;
    echo " xorriso -dev /dev/sr0 -toc ";
    echo " mount -o sbsector=nnnnnn /dev/sr0 /mnt/iso ";
}
Finding a file as it existed months or years ago can be tedious. For
example, find A/MailMessages as it was at 2023.02.07. Otherwise the
backup system works well.
Now I have a pair of 500 GB external USB drives. Large compared to my
working data of ~3 GB. Please suggest improvements to my backup
system by exploiting these drives. I can imagine a complete copy of A
onto an external drive for each backup; but with most files in A not
changing during the backup interval, that is inefficient.
Thanks, ... Peter E.
p***@easthope.ca
2024-06-30 14:50:02 UTC
From: Jerome BENOIT <***@rezozer.net>
Date: Thu, 27 Jun 2024 21:53:44 +0200
Post by Jerome BENOIT
why did you not use something as backup2l ?
Unaware of it.

From: https://github.com/gkiefer/backup2l
Post by Jerome BENOIT
The restore function allows to easily restore the state of the file
system or arbitrary directories/files of previous points in time.
...
An integrated split-and-collect function allows to comfortably
transfer all or selected archives to a set of CDs or other removable
media.
Appears relevant. To catch subtleties, will need to work with it. Two
phases of workflow may be required.

(1) backup2l records filesystem to rewritable medium such as HDD.

(2) Copy to optical medium. Each copy may require a fresh optical
disk. Thomas Schmitt's xorriso appends data to a disk, so the medium
is used more efficiently.

Thanks, ... P.
--
VoIP: +1 604 670 0140
work: https://en.wikibooks.org/wiki/User:PeterEasthope
Thomas Schmitt
2024-06-27 20:50:01 UTC
Hi,
This function is applied every week or two to write to a DVD.
xorriso -for_backup -dev /dev/sr0 \
-update_r . / \
-commit \
-toc -check_md5 failure -- \
-eject all ;
Finding a file as it existed months or years ago can be tedious.
You could give the backups volume ids which tell the date.

-volid BOB_"$(date '+%Y_%m_%d_%H%M%S')"

(BOB = Backup Of Backup :))
This would also make it possible to verify that the medium is either an
appendable BOB or blank. Before -dev you would insert:

-assert_volid 'BOB_*' fatal

As a whole:

xorriso -for_backup \
-assert_volid 'BOB_*' fatal \
-dev /dev/sr0 \
-volid BOB_"$(date '+%Y_%m_%d_%H%M%S')" \
-update_r . / \
-commit \
-toc -check_md5 failure -- \
-eject all

Then -toc can later tell the backup date.
I do daily backups with "HOME_" instead of "BOB_":

Media current: BD-RE
...
ISO session : 1 , 32 , 2129457s , HOME_2024_02_10_152530
ISO session : 2 , 2129504 , 26203s , HOME_2024_02_11_151813
...
ISO session : 138 , 8090176 , 34330s , HOME_2024_06_26_152446
ISO session : 139 , 8124512 , 50534s , HOME_2024_06_27_152907
Media summary: 139 sessions, 8172847 data blocks, 15.6g data, 7643m free

which gives me the opportunity to guess the volume id by date:

sudo osirrox -mount /dev/sr4 volid '*_2024_04_30_*' /mnt/iso

With less regular dates it would be helpful if the user could wish for
"youngest_before_given_date". I will ponder ...

For now you would have to apply manual work or your own shell magic based
on the output of
xorriso -indev /dev/sr0 -toc
in order to find the appropriate sbsector number, unless you can describe
the date string by a shell parser expression.
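Such shell magic might look like this; it is run here against a captured -toc line in the format shown above rather than a live drive, and the date string and volume-id prefix are illustrative:

```shell
# Sketch: extract the sbsector of the session whose volume id
# contains a given date, from saved "xorriso -indev /dev/sr0 -toc"
# output. The sample mimics the -toc lines quoted above.
toc='ISO session :   1 ,      32 , 2129457s , HOME_2024_02_10_152530
ISO session :   2 , 2129504 ,   26203s , HOME_2024_04_30_151813'

wanted='2024_04_30'
sb=$(printf '%s\n' "$toc" | awk -F',' -v d="$wanted" \
        '$4 ~ d { gsub(/ /, "", $2); print $2 }')
echo "mount -o sbsector=$sb /dev/sr0 /mnt/iso"
```

The awk fields split on commas: field 2 is the sbsector, field 4 the volume id; whitespace is stripped before printing.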
Now I have a pair of 500 GB external USB drives. Large compared to my
working data of ~3 GB. Please suggest improvements to my backup
system by exploiting these drives. I can imagine a complete copy of A
onto an external drive for each backup; but with most files in A not
changing during the backup interval, that is inefficient.
You could let xorriso do a daily update.
E.g. on a file of the mounted read-write filesystem on the external disk

destination=/mnt/backup_disk/xorriso_daily.iso
...
xorriso ... \
-dev "$destination" \
...
-not_leaf 'Trap*' \
-not_leaf '*.mp3' \
-not_leaf '*.mp4' \
-update_r "$source" / \
...

(-eject all would be futile. :))

You could easily have one or more ISOs with history and one or more rsync
mirror trees in the same filesystem. It is always good to keep one valid
backup untouched when the other gets endangered by writing.

(If xorriso -update_r is too slow compared to rsync, consider xorriso
command -disk_dev_ino as in the man page example.)

You could also dedicate the whole plain disk device without any prepared
partitions or filesystems (i do this with USB sticks):
xorriso ... -dev stdio:/dev/sdd ...
or a partition without prepared filesystem:
xorriso ... -dev stdio:/dev/sdd1 ...
In the beginning you would have to pseudo-blank the device file, so that
xorriso does not find data in its first few kB which would let it declare
it as "closed":
xorriso -outdev stdio:/dev/sdd -blank fast
(dd-ing 64 KiB of /dev/zero would have the same effect.)
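The dd variant might look like the following; it is shown against an ordinary scratch file rather than a real device, since pointing dd at the wrong /dev/sdX is destructive:

```shell
# Zero the first 64 KiB so xorriso finds no prior session data and
# treats the target as blank. For a real disk you would replace the
# scratch file with e.g. /dev/sdd (triple-check the device name!).
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1K count=64 conv=notrunc 2>/dev/null
```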


Have a nice day :)

Thomas
CHRIS M
2024-06-27 21:00:01 UTC
Post by Thomas Schmitt
Hi,
This function is applied every week or two to write to a DVD.
xorriso -for_backup -dev /dev/sr0 \
-update_r . / \
-commit \
-toc -check_md5 failure -- \
-eject all ;
Finding a file as it existed months or years ago can be tedious.
You could give the backups volume ids which tell the date.
-volid BOB_"$(date '+%Y_%m_%d_%H%M%S')"
(BOB = Backup Of Backup :))
This would also make it possible to verify that the medium is either an
< SNIP >

Did anyone else get this reply twice?

Thanks,
Chris
--
Sent with Vivaldi Mail. Download Vivaldi for free at vivaldi.com
p***@easthope.ca
2024-06-28 18:00:02 UTC
Hello Thomas & all,

From: "Thomas Schmitt" <***@gmx.net>
Date: Thu, 27 Jun 2024 22:49:10 +0200
Post by Thomas Schmitt
You could give the backups volume ids which tell the date.
Thanks. I should have added that when you mentioned it a few
years ago.
Post by Thomas Schmitt
This would also make it possible to verify that the medium is either an
-assert_volid 'BOB_*' fatal
Prevents me from appending to a DVD from another backup system. But I
need to add it when beginning a blank DVD.

Thanks. Will be sure that works before unwrapping one of the 500 GB
drives.

... P.
--
VoIP: +1 604 670 0140
work: https://en.wikibooks.org/wiki/User:PeterEasthope
Thomas Schmitt
2024-06-28 21:40:02 UTC
Hi,
Post by p***@easthope.ca
xorriso -for_backup -dev /dev/sr0 \
Finding a file as it existed months or years ago can be tedious
You could give the backups volume ids which tell the date.
Thanks. I should have added that when you mentioned a few
years ago.
I am working on a solution for your non-unique volume id situation
by optionally referring to modification timestamps.
A new command -toc_info_type can switch -toc away from showing volume ids:

$ xorriso -indev /dev/sr0 -toc_info_type mtime -toc
xorriso 1.5.7 : RockRidge filesystem manipulator, libburnia project.
...
Media current: DVD+RW
...
Volume id : 'HOME_Z_2024_06_27_225526'
...
TOC layout : Idx , sbsector , Size , Modification Time
ISO session : 1 , 32 , 1240808s , 2024.06.20.232334
ISO session : 2 , 1240864 , 29797s , 2024.06.21.220651
ISO session : 3 , 1270688 , 20484s , 2024.06.23.225019
ISO session : 4 , 1291200 , 28928s , 2024.06.24.224429
ISO session : 5 , 1320128 , 21352s , 2024.06.25.223943
ISO session : 6 , 1341504 , 30352s , 2024.06.26.223934
ISO session : 7 , 1371872 , 29023s , 2024.06.27.225617
Media summary: 7 sessions, 1400744 data blocks, 2736m data, 1746m free

This is a zisofs compressed backup which happens every evening except
saturdays.
Note the time difference between 2024_06_27_225526 and 2024.06.27.225617.
These 51 seconds were spent between program start and the beginning of writing.

This program enhancement is already committed to git.
In a few days there will be a new GNU xorriso 1.5.7 tarball, which is
easy to build and to test without any danger of frankendebianing.
I'll give you a note.

Currently i am working on a mount gesture which shall automate the task
of picking the youngest backup before a given time point:

-mount /dev/sr0 not_after 2024.06.22.120000 /mnt/iso

I think "not_after" is a concise way to say "younger_or_exactly_that_age".
Up to the next release i could take better proposals. :))

(I am having my fun with time zones. The ISO 9660 timestamps can have a
time zone from their maker and the computer running xorriso has a local
time zone. Argh ...)
Post by p***@easthope.ca
-assert_volid 'BOB_*' fatal
Prevents me from appending to a DVD from another backup system. But I
need to add it when beginning a blank DVD.
You may use it also on a DVD which already has sessions with non-unique
ids. This needs two editing operations in your script.

Begin by adding the volume id command to the backup-of-backup run:

-volid BOB_"$(date '+%Y_%m_%d_%H%M%S')"

Then do an add-on backup to the DVD with the boring ids.

After this session add the check command:

-assert_volid 'BOB_*' fatal

In the further backup runs, the DVD is supposed to pass this check.


Have a nice day :)

Thomas
p***@easthope.ca
2024-06-30 13:40:01 UTC
From: "Thomas Schmitt" <***@gmx.net>
Date: Fri, 28 Jun 2024 23:35:31 +0200
Post by Thomas Schmitt
I am working on a solution for your non-unique volume id situation
by optionally referring to modification timestamps.
$ xorriso -indev /dev/sr0 -toc_info_type mtime -toc
xorriso 1.5.7 : RockRidge filesystem manipulator, libburnia project.
...
Media current: DVD+RW
...
Volume id : 'HOME_Z_2024_06_27_225526'
...
TOC layout : Idx , sbsector , Size , Modification Time
ISO session : 1 , 32 , 1240808s , 2024.06.20.232334
ISO session : 2 , 1240864 , 29797s , 2024.06.21.220651
ISO session : 3 , 1270688 , 20484s , 2024.06.23.225019
ISO session : 4 , 1291200 , 28928s , 2024.06.24.224429
ISO session : 5 , 1320128 , 21352s , 2024.06.25.223943
ISO session : 6 , 1341504 , 30352s , 2024.06.26.223934
ISO session : 7 , 1371872 , 29023s , 2024.06.27.225617
Media summary: 7 sessions, 1400744 data blocks, 2736m data, 1746m free
This is a zisofs compressed backup which happens every evening except
saturdays.
Note the time difference between 2024_06_27_225526 and 2024.06.27.225617.
These 51 seconds were spent between program start and the beginning of writing.
This program enhancement is already committed to git.
In a few days there will be a new GNU xorriso 1.5.7 tarball, which is
easy to build and to test without any danger of frankendebianing.
Thanks Thomas. Ideally I should find time to follow your suggestions
but I am already overcommitted to volunteer activities. I might have to
wait until -toc_info_type is in a Debian release.

Thx, ... P.
--
VoIP: +1 604 670 0140
work: https://en.wikibooks.org/wiki/User:PeterEasthope
Thomas Schmitt
2024-07-02 12:20:01 UTC
Hi,
Post by p***@easthope.ca
Thanks Thomas. Ideally I should find time to follow your suggestions
but already overcommitted to volunteer activities. I might have to wait
until -toc_info_type is in a Debian release.
Yeah. Everybody has to follow their own timetable.


So without any implied expectation of a test report by you:

I uploaded the new development snapshot

https://www.gnu.org/software/xorriso/xorriso-1.5.7.tar.gz

with the new time oriented table-of-content features.

They are for situations where the usually shown volume ids are not
very informative (like "ISOIMAGE" in each session), but also where it
is difficult to create a search expression for the desired volume id
when mounting.

Time oriented table-of-content:

$ xorriso -indev /dev/sr0 -toc_info_type mtime -toc
...
TOC layout : Idx , sbsector , Size , Modification Time
ISO session : 1 , 32 , 1240808s , 2024.06.20.232334
ISO session : 2 , 1240864 , 29797s , 2024.06.21.220651
ISO session : 3 , 1270688 , 20484s , 2024.06.23.225019
ISO session : 4 , 1291200 , 28928s , 2024.06.24.224429
ISO session : 5 , 1320128 , 21352s , 2024.06.25.223943
ISO session : 6 , 1341504 , 30352s , 2024.06.26.223934
ISO session : 7 , 1371872 , 29023s , 2024.06.27.225617
ISO session : 8 , 1400896 , 31463s , 2024.06.28.232751
ISO session : 9 , 1432384 , 34008s , 2024.06.30.223451
ISO session : 10 , 1466400 , 20329s , 2024.07.01.220901
Media summary: 10 sessions, 1486544 data blocks, 2903m data, 1579m free

Creating mount command lines by time constraints:

$ xorriso -mount_cmd /dev/sr0 not_after 2024.06.26.060000 /mnt/iso
...
mount -t iso9660 -o nodev,noexec,nosuid,ro,sbsector=1320128 '/dev/sr0' '/mnt/iso'

The defined time constraints are:
"at_time", "before", "after", "not_before", "not_after"

Time can be given in various formats, among them 'June 26' and
'June 26 2024'. See man xorriso, command -alter_date and "Examples of
input timestrings". If you become too perky, expect rejections like:
xorriso : SORRY : -mount: Cannot decode timestring 'Rapunzel'

Complaints about unfulfillable time constraints will tell the internally
used seconds since 1970:
$ xorriso -mount_cmd /dev/sr0 before "June 01 2024" /mnt/iso
...
libisoburn: FAILURE : Failed to find "before" "=1717192800"

"=" is the time_t indicator for xorriso.
Program "date" digests the string if "=" gets replaced by "@":
$ date -d @1717192800
Sat Jun 1 00:00:00 CEST 2024


Have a nice day :)

Thomas

Paul M Foster
2024-06-27 20:50:02 UTC
Post by Eduardo M KALINOWSKI
Now I have a pair of 500 GB external USB drives. Large compared to my
working data of ~3 GB. Please suggest improvements to my backup
system by exploiting these drives. I can imagine a complete copy of A
onto an external drive for each backup; but with most files in A not
changing during the backup interval, that is inefficient.
rsnapshot
Rsnapshot is written in Perl and is based on an article:
http://www.mikerubel.org/computers/rsync_snapshots/

I read that article a while back, and not knowing of the existence of
rsnapshot, I modified my existing bash/rsync backup script to use this
guy's methods. In essence, rather than backing up the same file again with
rsync, you just make a hard link to it. All "copies" of that file on the
backup link to that one copy. Saves a tremendous amount of room. I keep 7
backup directories on my backup drives (one for each day), and a cron job
fires off the backup each day.
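The core of that method from the Mike Rubel article can be sketched in a few lines (directory names here are illustrative, and the rsync refresh step is left as a comment since it needs live data):

```shell
# Minimal sketch of hard-link snapshot rotation, after Mike Rubel's
# rsync_snapshots article; paths are illustrative.
root=$(mktemp -d)
mkdir "$root/snap.0"
echo "unchanged data" > "$root/snap.0/file.txt"

# Rotate: clone yesterday's tree as hard links (cheap, no data copied)...
cp -al "$root/snap.0" "$root/snap.1"

# ...then refresh snap.0 from the live data. rsync replaces changed
# files with new inodes, so the old versions survive in snap.1:
#   rsync -a --delete /files/ "$root/snap.0/"

# Unchanged files are now one inode with two names:
stat -c %h "$root/snap.0/file.txt"    # prints the link count, 2
```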

You can take a look at the script at:

https://gitlab.com/paulmfoster/bkp

Paul
--
Paul M. Foster
Personal Blog: http://noferblatz.com
Company Site: http://quillandmouse.com
Software Projects: https://gitlab.com/paulmfoster
p***@easthope.ca
2024-06-30 14:00:01 UTC
From: "Thomas Schmitt" <***@gmx.net>
Date: Fri, 28 Jun 2024 23:35:31 +0200
Post by Thomas Schmitt
You could give the backups volume ids which tell the date.
-volid BOB_"$(date '+%Y_%m_%d_%H%M%S')"
(BOB = Backup Of Backup :))
I'm beginning to learn Git. So I wonder about another approach where
files are in a local Git repository. That would allow tracing the
history of any file. A backup of the extant repository would still be
necessary.

I don't know the software well enough to compare the two approaches.

Thx, ... P.
--
VoIP: +1 604 670 0140
work: https://en.wikibooks.org/wiki/User:PeterEasthope
Michael Kjörling
2024-06-30 14:10:01 UTC
Post by p***@easthope.ca
Post by Thomas Schmitt
You could give the backups volume ids which tell the date.
-volid BOB_"$(date '+%Y_%m_%d_%H%M%S')"
(BOB = Backup Of Backup :))
I'm beginning to learn Git. So I wonder about another approach where
files are in a local Git repository. That would allow tracing the
history of any file. A backup of the extant repository would still be
necessary.
That sounds a lot like etckeeper, except on a larger scale.
--
Michael Kjörling 🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”
Andy Smith
2024-06-30 14:30:01 UTC
Hello,
Post by p***@easthope.ca
I'm beginning to learn Git. So I wonder about another approach where
files are in a local Git repository. That would allow tracing the
history of any file. A backup of the extant repository would still be
necessary.
I don't know the software well enough to compare the two approaches.
Git has some properties that are desirable for general backup
purposes, but also some fairly huge downsides. For example:

- It's not efficient or performant for storing large binary files.
As a result, several extensions and external programs around git
exist for getting large binary files into git. Trying to use git
for general purpose backups will run up against this unless you
never want to back up large binary files.

- Git stores full (compressed) copies of every version of every
file. Most backup solutions do better on space.

- Git has no built in way to purge old content. It keeps it all. A
typical requirement for backup software is to have a bounded limit
on the oldest versions that will be kept, and quite often there
are more complex requirements such as "keep daily copies for a
month, weekly copies for 6 months, monthly copies for 6 years"
etc. Very hard to do with git.

My first thought when I read the post that started this thread was,

"What is this person doing? If the goal is to have a real world
project to learn some programming techniques and have fun, fair
enough, but if the goal here is to have a decent backup scheme
why are they not using any of the existing excellent solutions
that have thought of and solved so many of the problems in this
space?"

That did not seem like it would be a welcome response at the time so
I held my tongue, but if you are now thinking of looking into using
git for the purpose, I think it's a wrong step, and in saying so I
might as well say the other as well.

Just use borgbackup, restic, amanda or even rsnapshot (ancient but
still functional).

Unless you are wanting to have a first hand learning experience
about why you should just use borgbackup, restic, amanda or even
rsnapshot.

(Also I think learning about git is best done by using it for what
it was designed for: managing source code in git.)

Thanks,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
t***@tuxteam.de
2024-06-30 14:40:01 UTC
Post by Andy Smith
Hello,
[...]
Post by Andy Smith
Git has some properties that are desirable for general backup
- It's not efficient or performant for storing large binary files.
[...]

Plus, it doesn't record file ownership (for general backup,
this *is* important).

I'm a fan of rsync. If you want to keep more than one generation,
--link-dest or --compare-dest, as has been stated elsewhere in
this thread, are your friends.

Cheers
--
t
p***@easthope.ca
2024-06-30 15:40:01 UTC
From: Andy Smith <***@strugglers.net>
Date: Sun, 30 Jun 2024 14:21:45 +0000
Post by Andy Smith
What is this person doing?
Keeping a historical backup by an efficient method.
https://lists.debian.org/debian-user/2024/06/msg00780.html
Post by Andy Smith
Just use borgbackup, restic, amanda or even rsnapshot (ancient but
still functional).
Ref. URL above. I've used xorriso for several years. If you advise
against it, please explain why.

Thx, ... P.
--
VoIP: +1 604 670 0140
work: https://en.wikibooks.org/wiki/User:PeterEasthope
Andy Smith
2024-06-30 15:50:01 UTC
Post by p***@easthope.ca
Date: Sun, 30 Jun 2024 14:21:45 +0000
Post by Andy Smith
What is this person doing?
Keeping a historical backup by an efficient method.
While refusing to look into any modern backup system designed by
experienced people for that express goal, and instead writing tomes
of navel gazing to debian-user. It's not my cup of tea, but have
fun; as this and other threads demonstrate you will still find a
large number of willing playmates.

Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Anssi Saari
2024-06-30 15:30:01 UTC
Post by p***@easthope.ca
I'm beginning to learn Git. So I wonder about another approach where
files are in a local Git repository. That would allow tracing the
history of any file. A backup of the extant repository would still be
necessary.
bup is a backup application using git. I like it because it can add
error correction codes generated by par2 in the backup.
p***@easthope.ca
2024-06-30 15:00:01 UTC
From: Eduardo M KALINOWSKI <***@kalinowski.com.br>
Date: Thu, 27 Jun 2024 16:06:18 -0300
Post by Eduardo M KALINOWSKI
rsnapshot
From https://rsnapshot.org/
rsnapshot is a filesystem snapshot utility ...
Rather than a snapshot of the extant file system, I want to keep a
history of the files in the file system.

Thanks, ... P.
--
VoIP: +1 604 670 0140
work: https://en.wikibooks.org/wiki/User:PeterEasthope
Andy Smith
2024-06-30 15:40:02 UTC
Hi,
Post by p***@easthope.ca
Post by Eduardo M KALINOWSKI
From https://rsnapshot.org/
rsnapshot is a filesystem snapshot utility ...
Rather than a snapshot of the extant file system, I want to keep a
history of the files in the file system.
You should read more than one line of a page. That is exactly what
it is intended for. Snapshots become history when you keep multiple
of them.

I have used rsnapshot a lot (decades worth of use) and it's good but
it is not perfect (nothing is). It is probably a much better backup
system than anything one can typically come up with by hand in short
order, but here are some of its downsides:

- No built in compression or encryption. You can implement these
yourself using filesystem features.

- Since it uses hardlinks for deduplication, this brings with it
some inherent limitations:

- The filesystem you use must support hardlinks

- All versions of a file will have the same metadata (mtime,
permissions, ownership, etc) because hardlinks must have the
same metadata. As a consequence, any change of metadata will
result in two separate files being stored (not hardlinked
together) in order to represent that change. Even if the files
have identical content.

- Changing one byte of a file results in the storage of two
separate full copies of the two versions of the file. With
hardlinks either the file is entirely the same or it needs to
not be a hardlink. This makes rsnapshot and things like it
particularly bad for backing up large append-only files like log
files.

- rsnapshot only compares versions of a file at the same path and
point in time. So for example /path/to/foo is only ever compared
against /path/to/foo *from the previous backup run*. Other copies
of foo anywhere else on the system being backed up, or from other
systems being backed up, or from a backup run previous to the most
recent, will not be considered so will not be hardlinked together.

A typical system has a lot of duplicate files and once you start
backing up multiple systems there tends to be an explosion of
duplicate data. rsnapshot will not handle any of this specially
and will just store it all.

It is possible to improve this by for example running an external
deduplication tool over the backups, or using deduplication
facilities of a filesystem like zfs¹. This must be done carefully
otherwise the workings of rsnapshot can be disrupted.

- rsnapshot must walk through the entire previous backup to compare
all the content of the files to the content of the new files. This
is quite expensive and will involve tons of random seeks which is
a killer for rotational storage media. Once you get to several
million inodes in a backup run, you may find a run of rsnapshot
taking several hours.

On the other hand, rsnapshot's huge plus point is that everything is
stored in a tree of files and hardlinks so it can just be explored
and restored with normal filesystem tools. You don't need any part
of rsnapshot to access and restore your content. That is such a good
feature that many people feel able to overlook the negatives.

More featureful backup systems chunk backup content up and store it
by a hash of its content, which tends to bring advantages like:

- Never needing to store the same chunk twice no matter where (or
when) it came from

- Easy to compress and encrypt

- Locating which data is in which chunk gets done by a database,
not by random access to a filesystem, so it's much faster. When
you say "I want /path/to/foo from a week ago, but also show me
every copy you have going back 3 years", that is a database query,
not a walk of a filesystem with potentially several million inodes
in it.

But, by doing that you lose the ability to just cp a file from your
backups.
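The chunk-and-hash idea can be sketched in shell (a toy, nothing like borg's or restic's real formats; fixed-size chunks and sha256sum are assumptions made for illustration):

```shell
# Toy content-addressed store: split a file into fixed-size chunks and
# file each chunk under the hash of its content. Identical chunks
# collapse to a single stored object, wherever or whenever they occur.
work=$(mktemp -d)
store="$work/store"; mkdir "$store"
printf 'AAAABBBBAAAA' > "$work/data"          # 12 bytes, repeated chunk

split -b 4 "$work/data" "$work/chunk_"        # 3 chunks of 4 bytes each
for c in "$work"/chunk_*; do
    h=$(sha256sum "$c" | cut -d' ' -f1)
    mv "$c" "$store/$h"                       # duplicate hashes collide
done
ls "$store" | wc -l                           # 2 unique chunks, not 3
```

Restoring then means looking chunk hashes up in an index, which is why a database query replaces the filesystem walk.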

Thanks,
Andy

¹ Though someone heavily into an advanced filesystem like zfs may
be more inclined to take advantage of zfs's proper snapshot
capabilities (and zfs-send to move them off-site) than use
rsnapshot on it.
--
https://bitfolk.com/ -- No-nonsense VPS hosting
David Christensen
2024-06-30 19:20:01 UTC
On 6/30/24 08:37, Andy Smith wrote:
<snip>


Thank you for that informative discussion of rsnapshot(1) and related. :-)


My initial reaction to this thread was to recommend Preston [1]. I
still think that is decent advice; both for noobs and for experienced
people who missed it.


David


[1] Preston, W., 2007, "Backup & Recovery", O'Reilly Media, Inc.
ISBN: 9780596102463,
https://www.oreilly.com/library/view/backup-recovery/0596102461/