Hi,
p***@easthope.ca wrote:
> Eduardo M KALINOWSKI wrote:
> > From https://rsnapshot.org/
> >     rsnapshot is a filesystem snapshot utility ...
> Rather than a snapshot of the extant file system, I want to keep a
> history of the files in the file system.
You should read more than one line of a page. That is exactly what
it is intended for. Snapshots become history when you keep multiple
of them.
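As a sketch (illustrative retention names and counts; rsnapshot.conf
fields must be TAB-separated, and older versions of rsnapshot use
`interval` where newer ones use `retain`), a config that keeps a
rolling history might look like:

```
# /etc/rsnapshot.conf (fragment) -- fields are TAB-separated
snapshot_root	/srv/backups/

retain	daily	7
retain	weekly	4
retain	monthly	6

backup	/home/	localhost/
```

Each level is then driven from cron, e.g. "rsnapshot daily" once a
day; each run rotates the older snapshots into daily.1, daily.2 and
so on, and those rotated snapshots are your history.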
I have used rsnapshot a lot (decades worth of use) and it's good, but
it is not perfect (nothing is). It is probably a much better backup
system than anything one could typically put together by hand in
short order, but here are some of its downsides:
- No built-in compression or encryption. You can implement these
  yourself using filesystem features.
- Since it uses hardlinks for deduplication, this brings with it
some inherent limitations:
- The filesystem you use must support hardlinks
- All versions of a file will have the same metadata (mtime,
  permissions, ownership, etc.) because hardlinks are just names
  for the same inode and therefore share one set of metadata. As a
  consequence, any change of metadata will result in two separate
  files being stored (not hardlinked together) in order to
  represent that change, even if the files have identical content.
- Changing one byte of a file results in two separate full copies
  being stored, one for each version. With hardlinks, either the
  file is entirely identical or it cannot be a hardlink at all.
  This makes rsnapshot and things like it particularly bad for
  backing up large append-only files such as log files.
- rsnapshot only compares versions of a file at the same path and
point in time. So for example /path/to/foo is only ever compared
against /path/to/foo *from the previous backup run*. Other copies
of foo anywhere else on the system being backed up, or from other
systems being backed up, or from any run before the most recent,
will not be considered, and so will not be hardlinked together.
A typical system has a lot of duplicate files and once you start
backing up multiple systems there tends to be an explosion of
duplicate data. rsnapshot will not handle any of this specially
and will just store it all.
It is possible to improve this by for example running an external
deduplication tool over the backups, or using deduplication
facilities of a filesystem like zfs¹. This must be done carefully
otherwise the workings of rsnapshot can be disrupted.
- rsnapshot must walk through the entire previous backup to compare
  every old file against its new counterpart. This is quite
  expensive and involves a huge number of random seeks, which are a
  killer for rotational storage media. Once you get to several
  million inodes in a backup run, a single run of rsnapshot may
  take several hours.
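The hardlink limitations above are easy to demonstrate. Here is a
small sketch using only Python's standard library (nothing
rsnapshot-specific): two names for one inode share both content and
metadata, so representing *any* difference between two versions
forces a second full copy.

```python
import os
import stat
import tempfile

d = tempfile.mkdtemp()
a = os.path.join(d, "a")
b = os.path.join(d, "b")

with open(a, "w") as f:
    f.write("same content")

os.link(a, b)  # hardlink: b is just another name for a's inode

sa, sb = os.stat(a), os.stat(b)
# One inode, therefore one set of metadata shared by both names.
assert sa.st_ino == sb.st_ino
assert sa.st_mtime == sb.st_mtime

# Changing metadata through one name is visible through the other,
# so two versions differing only in permissions cannot be hardlinked.
os.chmod(a, 0o600)
assert stat.S_IMODE(os.stat(b).st_mode) == 0o600

# Appending one byte through 'a' changes 'b' too -- the old version
# can only survive as a separate, full copy of the file.
with open(a, "a") as f:
    f.write("!")
assert open(b).read() == "same content!"
```

This is why a metadata-only change, or a one-byte append to a huge
log file, costs rsnapshot a whole new copy.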
On the other hand, rsnapshot's huge plus point is that everything is
stored in a tree of files and hardlinks so it can just be explored
and restored with normal filesystem tools. You don't need any part
of rsnapshot to access and restore your content. That is such a good
feature that many people feel able to overlook the negatives.
More featureful backup systems chunk up the backup content and
store each chunk by a hash of its content, which tends to bring
advantages like:
- Never needing to store the same chunk twice no matter where (or
when) it came from
- Easy to compress and encrypt
- Locating which data is in which chunk gets done by a database,
not by random access to a filesystem, so it's much faster. When
you say "I want /path/to/foo from a week ago, but also show me
every copy you have going back 3 years", that is a database query,
not a walk of a filesystem with potentially several million inodes
in it.
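A toy sketch of that idea (hypothetical names, with a dict standing
in for the database): files are split into chunks, each chunk is
stored at most once under its SHA-256, and an index maps
(path, version) to a list of chunk hashes.

```python
import hashlib

CHUNK = 4  # absurdly small chunk size, just for illustration

store = {}  # chunk hash -> chunk bytes, stored at most once
index = {}  # (path, version) -> ordered list of chunk hashes

def backup(path, version, data):
    hashes = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        h = hashlib.sha256(chunk).hexdigest()
        store.setdefault(h, chunk)  # dedup: seen chunks cost nothing
        hashes.append(h)
    index[(path, version)] = hashes

def restore(path, version):
    return b"".join(store[h] for h in index[(path, version)])

# Two versions differing by an appended byte share all common chunks:
backup("/var/log/syslog", 1, b"aaaabbbb")
backup("/var/log/syslog", 2, b"aaaabbbbc")
assert restore("/var/log/syslog", 2) == b"aaaabbbbc"
assert len(store) == 3  # "aaaa", "bbbb", "c" -- not five chunks
```

"Show me every copy of /var/log/syslog going back 3 years" is then a
lookup over the index, not a walk of millions of inodes. (Real tools
also use content-defined rather than fixed-size chunking, so that an
insertion in the middle of a file doesn't shift every later chunk
boundary.)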
But, by doing that you lose the ability to just cp a file from your
backups.
Thanks,
Andy
¹ Though someone heavily into an advanced filesystem like zfs may
be more inclined to take advantage of zfs's proper snapshot
capabilities (and zfs-send to move them off-site) than use
rsnapshot on it.
--
https://bitfolk.com/ -- No-nonsense VPS hosting