Discussion:
HDD long-term data storage with ensured integrity
Jonathan Dowland
2024-04-06 07:50:28 UTC
AIUI neither LVM nor ext4 have data and metadata checksum and correction
features. But, it should be possible to achieve such by including
dm-integrity (for checksumming) and some form of RAID (for correction)
in the storage stack. I need to explore that possibility further.
It would be nice to have checksumming and parity stuff in the filesystem
layer, as BTRFS and XFS offer, but failing that, you can do it above
that layer using tried-and-tested tools such as sha1sum, par2, etc.
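
For instance, a minimal sketch of that above-the-filesystem approach
(assuming the archive lives under /archive and the par2 package is
installed; the paths are hypothetical):

cd /archive
# checksum every file except the checksum/parity files themselves
find . -type f ! -name 'SHA1SUMS*' -print0 | xargs -0 sha1sum > SHA1SUMS
# ~10% recovery data so the checksum list itself can be repaired
par2 create -r10 SHA1SUMS.par2 SHA1SUMS
# later, to verify:
sha1sum -c SHA1SUMS
par2 verify SHA1SUMS.par2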

I personally would not rely upon RAID for anything except availability.
My advice is: once you've detected corruption, which is exceedingly
rare, restore from backup.
--
Please do not CC me for listmail.

👱🏻 Jonathan Dowland
✎ ***@debian.org
🔗 https://jmtd.net
David Christensen
2024-04-06 07:50:42 UTC
AIUI neither LVM nor ext4 have data and metadata checksum and correction
features.  But, it should be possible to achieve such by including
dm-integrity (for checksumming) and some form of RAID (for correction)
in the storage stack.  I need to explore that possibility further.
I have RTFM'd dm-integrity before, and it is still experimental. I need
something that is production-ready:

https://manpages.debian.org/bookworm/cryptsetup-bin/cryptsetup.8.en.html

Authenticated disk encryption (EXPERIMENTAL)


David
David Christensen
2024-04-06 07:51:22 UTC
The most obvious alternative to ZFS on Debian would be Btrfs. Does anyone
have any comments or suggestions regarding Btrfs and data corruption bugs,
concurrency, CMM level, PSP, etc.?
If you're worried about such things, I'd think "the most obvious
alternative" is LVM+ext4. Both Btrfs and ZFS share the same underlying
problem: more features => more code => more bugs.
Stefan
AIUI neither LVM nor ext4 have data and metadata checksum and correction
features. But, it should be possible to achieve such by including
dm-integrity (for checksumming) and some form of RAID (for correction)
in the storage stack. I need to explore that possibility further.


David
Stefan Monnier
2024-04-06 07:51:31 UTC
The most obvious alternative to ZFS on Debian would be Btrfs. Does anyone
have any comments or suggestions regarding Btrfs and data corruption bugs,
concurrency, CMM level, PSP, etc.?
If you're worried about such things, I'd think "the most obvious
alternative" is LVM+ext4. Both Btrfs and ZFS share the same underlying
problem: more features => more code => more bugs.


Stefan
Marc SCHAEFER
2024-04-08 09:40:01 UTC
Does anyone have any comments or suggestions regarding how to use magnetic
hard disk drives, commodity x86 computers, and Debian for long-term data
storage with ensured integrity?
I use ext4 on LVM, and I add an MD5SUMS file at the root.

I then power up the drives at least once a year and check the MD5SUMS.

A simple CRC could also work, obviously.
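
A minimal sketch of that routine, assuming the archive drive is mounted
on /mnt/archive (the path is hypothetical):

cd /mnt/archive
# once, after writing the archive: checksum everything except the list itself
find . -type f ! -name MD5SUMS -print0 | xargs -0 md5sum > MD5SUMS
# yearly check: print only the files that fail
md5sum -c --quiet MD5SUMS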

So far, this method has not detected MORE corruption than the drive's
own ECC reports (current drives & buses are much better than they used
to be). When errors are detected, I replace the file with another copy
(I usually have multiple off-site copies, and sometimes even on-site
online copies, but not always). When the errors add up, it is time to
buy another drive, usually after 5+ years or sometimes even 10+ years.

So, just re-reading the content might be enough, once a year or so.

This is for HDDs (for SSDs I have no offline storage experience; the
safe interval could be shorter).
David Christensen
2024-04-08 18:30:01 UTC
Post by Marc SCHAEFER
Does anyone have any comments or suggestions regarding how to use magnetic
hard disk drives, commodity x86 computers, and Debian for long-term data
storage with ensured integrity?
I use LVM on ext4, and I add a MD5SUMS file at the root.
I then power up the drives at least once a year and check the MD5SUMS.
A simple CRC could also work, obviously.
So far, I have not detected MORE corruption with this method than the
drive ECC itself (current drives & buses are much better than they
used to be). When I have errors detected, I replace the file with
another copy (I usually have multiple off-site copies, and sometimes
even on-site online copies, but not always). When the errors add
up, it is time to buy another drive, usually after 5+ years or
even sometimes 10+ years.
So, just re-reading the content might be enough, once a year or so.
This is for HDD (for SDD I have no offline storage experience, it
could be shorter).
Thank you for the reply.


So, an ext4 file system on an LVM logical volume?


Why LVM? Are you implementing redundancy (RAID)? Is your data larger
than a single disk (concatenation/ JBOD)? Something else?


David
Marc SCHAEFER
2024-04-08 20:10:01 UTC
Hello,
Post by David Christensen
So, an ext4 file system on an LVM logical volume?
Why LVM? Are you implementing redundancy (RAID)? Is your data larger than
a single disk (concatenation/ JBOD)? Something else?
For off-site long-term offline archiving, no, I am not using RAID.

No, it's not LVM+md, just plain LVM for flexibility.

Typically I use 16 TB hard drives, and I tend to use one LV per data
source, the LV name being the data source and the date of the copy.
Or sometimes I just copy a raw volume (ext4 or something else)
to a LV.

With smaller drives (4 TB) I tend to not use LVM, just plain ext4 on the
raw disk.

I almost never use partitioning.

However, I tend to use LUKS encryption (per ext4 filesystem) when the
drives are stored off-site. So it's either LVM -> LV -> LUKS -> ext4
or raw disk -> LUKS -> ext4.
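
A rough sketch of the first stack (the device, VG, and LV names here
are hypothetical):

pvcreate /dev/sdX
vgcreate archive16tb /dev/sdX
lvcreate -L 2T -n customer1-20240408 archive16tb
cryptsetup luksFormat /dev/archive16tb/customer1-20240408
cryptsetup open /dev/archive16tb/customer1-20240408 customer1
mkfs.ext4 /dev/mapper/customer1
mount /dev/mapper/customer1 /mnt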

You can find some of the scripts I use to automate this off-site
long-term archiving here:

https://git.alphanet.ch/gitweb/?p=various;a=tree;f=offsite-archival/LVM-LUKS
David Christensen
2024-04-08 22:50:01 UTC
Post by Marc SCHAEFER
Hello,
Post by David Christensen
So, an ext4 file system on an LVM logical volume?
Why LVM? Are you implementing redundancy (RAID)? Is your data larger than
a single disk (concatenation/ JBOD)? Something else?
For off-site long-term offline archiving, no, I am not using RAID.
No, it's not LVM+md, just plain LVM for flexibility.
Typically I use 16 TB hard drives, and I tend to use one LV per data
source, the LV name being the data source and the date of the copy.
Or sometimes I just copy a raw volume (ext4 or something else)
to a LV.
With smaller drives (4 TB) I tend to not use LVM, just plain ext4 on the
raw disk.
I almost never use partitionning.
However, I tend to use luks encryption (per ext4 filesystem) when the
drives are stored off-site. So it's either LVM -> LV -> LUKS -> ext4
or raw disk -> LUKS -> ext4.
You can find some of the scripts I use to automate this off-site
https://git.alphanet.ch/gitweb/?p=various;a=tree;f=offsite-archival/LVM-LUKS
Thank you for the clarification. :-)


David
Marc SCHAEFER
2024-05-03 11:30:01 UTC
Post by Marc SCHAEFER
For off-site long-term offline archiving, no, I am not using RAID.
Now, as I had to think a bit about ONLINE integrity, I found this
comparison:

https://github.com/t13a/dm-integrity-benchmarks

Contenders are btrfs, zfs, and notably ext4+dm-integrity+dm-raid

I tend to have a bias favoring UNIX layered solutions over
"all-in-one" solutions, and it seems that, performance-wise,
it's also quite good.

I wrote this script to convince myself of the auto-correction behaviour
of the ext4+dm-integrity+dm-raid layered approach.

It gives:

[ ... ]
[ 390.249699] md/raid1:mdX: read error corrected (8 sectors at 21064 on dm-11)
[ 390.249701] md/raid1:mdX: redirecting sector 20488 to other mirror: dm-7
[ 390.293807] md/raid1:mdX: dm-11: rescheduling sector 262168
[ 390.293988] md/raid1:mdX: read error corrected (8 sectors at 262320 on dm-11)
[ 390.294040] md/raid1:mdX: read error corrected (8 sectors at 262368 on dm-11)
[ 390.294125] md/raid1:mdX: read error corrected (8 sectors at 262456 on dm-11)
[ 390.294209] md/raid1:mdX: read error corrected (8 sectors at 262544 on dm-11)
[ 390.294287] md/raid1:mdX: read error corrected (8 sectors at 262624 on dm-11)
[ 390.294586] md/raid1:mdX: read error corrected (8 sectors at 263000 on dm-11)
[ 390.294712] md/raid1:mdX: redirecting sector 262168 to other mirror: dm-7

Pretty much convincing.

So after testing btrfs and not being convinced, and after doing some
tests on a production zfs -- not convinced either -- I am going to try
ext4+dm-integrity+dm-raid.

#! /bin/bash

set -e

function create_lo {
    local f

    f=$(losetup -f)

    losetup $f $1
    echo $f
}

# beware of the rm -r below!
tmp_dir=/tmp/$(basename $0)
mnt=/mnt

mkdir $tmp_dir

declare -a pvs
for p in pv1 pv2
do
    truncate -s 250M $tmp_dir/$p

    l=$(create_lo $tmp_dir/$p)

    pvcreate $l

    pvs+=($l)
done

vg=$(basename $0)-test
lv=test

vgcreate $vg ${pvs[*]}

vgdisplay $vg

lvcreate --type raid1 --raidintegrity y -m 1 -L 200M -n $lv $vg

lvdisplay $vg

# sync/integrity complete?
sleep 10
cat /proc/mdstat
echo
lvs -a -o name,copy_percent,devices $vg
echo
echo -n Type ENTER
read ignore

mkfs.ext4 -I 256 /dev/$vg/$lv
mount /dev/$vg/$lv $mnt

for f in $(seq 1 10)
do
    # ignore errors
    head -c 20M < /dev/random > $mnt/f_$f || true
done

(cd $mnt && find . -type f -print0 | xargs -0 md5sum > $tmp_dir/MD5SUMS)

# corrupting some data in one PV
count=5000
blocks=$(blockdev --getsz ${pvs[1]})
if [ $blocks -lt 32767 ]; then
    factor=1
else
    factor=$(( ($blocks - 1) / 32767))
fi

p=1
for i in $(seq 1 $count)
do
    offset=$(($RANDOM * $factor))
    echo ${pvs[$p]} $offset
    dd if=/dev/random of=${pvs[$p]} bs=$(blockdev --getpbsz ${pvs[$p]}) seek=$offset count=1
    # only doing on 1, not 0, since we have no way to avoid destroying the same sector!
    #p=$((1 - p))
done

dd if=/dev/$vg/$lv of=/dev/null bs=32M
dmesg | tail

umount $mnt

lvremove -y $vg/$lv

vgremove -y $vg

for p in ${pvs[*]}
do
    pvremove $p
    losetup -d $p
done

rm -r $tmp_dir
Michael Kjörling
2024-05-03 12:50:01 UTC
Post by Marc SCHAEFER
https://github.com/t13a/dm-integrity-benchmarks
Contenders are btrfs, zfs, and notably ext4+dm-integrity+dm-raid
ZFS' selling point is not performance, _especially_ on rotational
drives. In fact, it's fairly widely accepted that ZFS is inferior in
performance to pretty much everything else modern, even at the best of
times; and some of its features help mitigate its lower raw disk
performance.

ZFS' value proposition lies elsewhere.

Which is fine. It's the right choice for some people; for others,
other alternatives provide better trade-offs.
--
Michael Kjörling 🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”
David Christensen
2024-05-03 21:00:01 UTC
Post by Marc SCHAEFER
Post by Marc SCHAEFER
For off-site long-term offline archiving, no, I am not using RAID.
Now, as I had to think a bit about ONLINE integrity, I found this
https://github.com/t13a/dm-integrity-benchmarks
Contenders are btrfs, zfs, and notably ext4+dm-integrity+dm-raid
I tend to have a biais favoring UNIX layered solutions against
"all-into-one" solutions, and it seems that performance-wise,
it's also quite good.
I wrote this script to convince myself of auto-correction
of the ext4+dm-integrity+dm-raid layered approach.
Thank you for devising a benchmark and posting some data. :-)


FreeBSD also offers a layered solution. From the top down:

* UFS2 file system, which supports snapshots (requires partitions with
soft updates enabled).

* gpart(8) for partitions (volumes).

* graid(8) for redundancy and self-healing.

* geli(8) providers with continuous integrity checking.


AFAICT the FreeBSD stack is mature and production quality, which I find
very appealing. But the feature set is not as sophisticated as ZFS,
which leaves me wanting. Notably, I have not found a way to replicate
UFS snapshots directly -- the best I can dream up is synchronizing a
snapshot to a backup UFS2 filesystem and then taking a snapshot with the
same name.


I am coming to the conclusion that the long-term survivability of data
requires several components -- good live file system, good backups, good
archives, continuous internal integrity checking with self-healing,
periodic external integrity checking (e.g. mtree(1)) with some form of
recovery (e.g. manual), etc. If I get the other pieces right, I could
go with OpenZFS for the live and backup systems, and worry less about
data corruption bugs.
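
A minimal mtree(1) sketch of such a periodic external check (FreeBSD
syntax; the paths are hypothetical):

# record a specification with SHA-256 digests for the archive tree
mtree -c -K sha256digest -p /archive > /var/db/archive.mtree
# later: compare the live tree against the stored specification
mtree -f /var/db/archive.mtree -p /archive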


David
Marc SCHAEFER
2024-05-04 07:50:01 UTC
Post by David Christensen
Thank you for devising a benchmark and posting some data. :-)
I did not do the comparison hosted on GitHub. I just wrote the script,
which tests error detection and error correction of dm-integrity on
dm-raid.
I prefer this approach, indeed.
Marc SCHAEFER
2024-05-20 12:40:01 UTC
Hello,

1. INITIAL SITUATION: WORKS (no dm-integrity at all)

I have an up-to-date Debian bookworm system that boots correctly with
kernel 6.1.0-21-amd64.

It is setup like this:

- /dev/nvme1n1p1 is /boot/efi

- /dev/nvme0n1p2 and /dev/nvme1n1p2 are the two LVM physical volumes

- a volume group, vg1 is built with those PVs

vg1 has a few LVs that have been created in RAID1 LVM mode:

lvdisplay | egrep 'Path|Mirrored'

LV Path /dev/vg1/root <-- this is /
Mirrored volumes 2
LV Path /dev/vg1/swap
Mirrored volumes 2
LV Path /dev/vg1/scratch
Mirrored volumes 2
LV Path /dev/vg1/docker
Mirrored volumes 2

As said, this boots without any issue.

2. ADDING dm-integrity WHILE BOOTED: works!

Now, while booted, I can add dm-integrity to one of the volumes,
let's say /dev/vg1/docker (this LV has absolutely no link with the
boot process, except obviously it is listed in /etc/fstab -- it also
fails the same way when it is the swap LV, or even /, that is
dm-integrity enabled):

lvconvert --raidintegrity y --raidintegritymode bitmap vg1/docker

and wait a bit until the integrity is set up, watching lvs -a (100%).

Obviously, this creates and uses a few rimage/rmeta sub LVs.

Then I did this (after having boot issues):

echo dm_integrity >> /etc/initramfs-tools/modules
update-initramfs -u

This did not change the below issue:

3. grub BOOT FAILS IF ANY LV HAS dm-integrity, EVEN IF NOT LINKED TO /

If I reboot now, grub2 complains about rimage issues, clears the
screen, and then I am at the grub2 prompt.

Booting is only possible with Debian rescue, disabling dm-integrity
on the above volume, and rebooting. Note that you can still see the
rimage/rmeta sub-LVs (lvs -a); they are not deleted (but no
dm-integrity is activated).
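
The disabling step from the rescue environment amounts to roughly the
following (a sketch, not the exact commands used):

vgchange -ay vg1                        # activate the VG if the rescue system has not
lvconvert --raidintegrity n vg1/docker  # drop the integrity layer, keep the RAID1 LV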

4. update-grub GIVES WARNINGS

Now, if I run update-grub while booted AND with dm-integrity enabled
on the vg1/docker volume, I get:

# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.1.0-21-amd64
Found initrd image: /boot/initrd.img-6.1.0-21-amd64
error: unknown node 'docker_rimage_0'.
[ ... many ... ]
/usr/sbin/grub-probe: error: disk `lvmid/xLE0OV-wQy7-88H9-yKCz-4DUQ-Toce-h9rQvk/FzCf1C-95eB-7B0f-DSrF-t1pg-66qp-hmP3nZ' not found.
error: unknown node 'docker_rimage_0'.
[ ... many ... ]

[ this repeats a few times ]

Found linux image: /boot/vmlinuz-6.1.0-10-amd64
Found initrd image: /boot/initrd.img-6.1.0-10-amd64
Found memtest86+ 64bit EFI image: /boot/memtest86+x64.efi
Warning: os-prober will not be executed to detect other bootable partitions.
[ there are none ]
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.
Adding boot menu entry for UEFI Firmware Settings ...
done

Any idea what could be the problem? Any way to just make grub2 ignore
the rimage (sub)volumes at setup and boot time? (I could live with /,
aka vg1/root, not using dm-integrity, as long as the data/docker/etc.
volumes are integrity-protected.) Or how to make grub 100% compatible
with a vg1/root using dm-integrity (that would obviously be the final
goal!)?

Thank you for any pointers!
Franco Martelli
2024-05-21 18:50:01 UTC
Post by Marc SCHAEFER
Any idea what could be the problem? Any way to just make grub2 ignore
the rimage (sub)volumes at setup and boot time? (I could live with / aka
vg1/root not using dm-integrity, as long as the data/docker/etc volumes
are integrity-protected) ? Or how to make grub 100% compatible with a
vg1/root using dm-integrity (that would be obviously the final goal!)
Thank you for any pointers!
I can only recommend that you carefully read the wiki:

https://raid.wiki.kernel.org/index.php/Dm-integrity

HTH

kind regards
--
Franco Martelli
Marc SCHAEFER
2024-05-22 07:00:01 UTC
Hello,
Post by Franco Martelli
https://raid.wiki.kernel.org/index.php/Dm-integrity
I did, and it does not seem to document anything pertaining to my
issue:

1) I don't use integritysetup (from LUKS), but LVM RAID PVs -- I don't use
LUKS encryption anyway on that system

2) the issue is not the kernel not supporting it, because when the
system is up, it works (I have done tests to destroy part of the
underlying devices, they get detected and fixed correctly)

3) the issue is not with the initrd -- I added the dm-integrity module
and rebuilt the initrd (and actually the bug happens before grub2 loads
the kernel & init) -- or at least "not yet"! maybe this will fail
later :)

4) actually the issue is just grub2, be it when the system is up
(it complains about the special subvolumes) or at boot time

Having /boot on an LVM logical volume without dm-integrity does not
work either: as soon as there is ANY dm-integrity-enabled logical
volume anywhere (even one not linked to booting), grub2 complains (at
boot time or at update-grub) about the rimage LVs.
Marc SCHAEFER
2024-05-22 07:00:01 UTC
Post by Marc SCHAEFER
Having /boot on a LVM non enabled dm-integrity logical volume does not
work either, as soon as there is ANY LVM dm-integrity enabled logical
volume anywhere (even not linked to booting), grub2 complains (at boot
time or at update-grub) about the rimage LV.
I found this [1], quoting: "I'd also like to share an issue I've
discovered: if /boot's partition is a LV, then there must not be a
raidintegrity LV anywhere before that LV inside the same VG. Otherwise,
update-grub will show an error (disk `lvmid/.../...' not found) and GRUB
cannot boot. So it's best if you put /boot into its own VG. (PS: Errors
like unknown node '..._rimage_0 can be ignored.)"

So, the work-around seems to be to simply have /boot not on an LVM VG
where any LV has dm-integrity enabled.

I will try this work-around and report back here. As I said, I can
live with /boot on RAID without dm-integrity, as long as the rest can be
dm-integrity+raid protected.

[1] https://unix.stackexchange.com/questions/717763/lvm2-integrity-feature-breaks-lv-activation
Marc SCHAEFER
2024-05-22 10:10:01 UTC
Hello,
Post by Marc SCHAEFER
I will try this work-around and report back here. As I said, I can
live with /boot on RAID without dm-integrity, as long as the rest can be
dm-integrity+raid protected.
So, with dm-integrity enabled on all LVs, including /, /var/lib/lxc,
/scratch and swap, the system now boots without any issue with grub2,
as long as /boot is NOT on the same VG where dm-integrity over LVM
RAID is enabled.
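
In other words, roughly the following loop (a sketch; the LV names are
those shown in the lvs -a output below):

for lv in root swap scratch docker; do
    lvconvert --raidintegrity y --raidintegritymode bitmap vg1/$lv
done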

This is OK for me, I don't need /boot on dm-integrity.

update-grub gives a warning for each of the rimage subvolumes, but the
system can still reboot afterwards.

I would thus guess the bug is in grub2, which does not yet support a
/boot that is not itself dm-integrity-enabled but lives on a VG where
any of the LVs is.

Do readers second this conclusion? If yes, I could report a bug against grub2.

Have a nice day.

Details:
***@ds-03:~# lvs -a
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
docker vg1 rwi-aor--- 500.00g 100.00
[docker_rimage_0] vg1 gwi-aor--- 500.00g [docker_rimage_0_iorig] 100.00
[docker_rimage_0_imeta] vg1 ewi-ao---- <4.07g
[docker_rimage_0_iorig] vg1 -wi-ao---- 500.00g
[docker_rimage_1] vg1 gwi-aor--- 500.00g [docker_rimage_1_iorig] 100.00
[docker_rimage_1_imeta] vg1 ewi-ao---- <4.07g
[docker_rimage_1_iorig] vg1 -wi-ao---- 500.00g
[docker_rmeta_0] vg1 ewi-aor--- 4.00m
[docker_rmeta_1] vg1 ewi-aor--- 4.00m
root vg1 rwi-aor--- 10.00g 100.00
[root_rimage_0] vg1 gwi-aor--- 10.00g [root_rimage_0_iorig] 100.00
[root_rimage_0_imeta] vg1 ewi-ao---- 148.00m
[root_rimage_0_iorig] vg1 -wi-ao---- 10.00g
[root_rimage_1] vg1 gwi-aor--- 10.00g [root_rimage_1_iorig] 100.00
[root_rimage_1_imeta] vg1 ewi-ao---- 148.00m
[root_rimage_1_iorig] vg1 -wi-ao---- 10.00g
[root_rmeta_0] vg1 ewi-aor--- 4.00m
[root_rmeta_1] vg1 ewi-aor--- 4.00m
scratch vg1 rwi-aor--- 10.00g 100.00
[scratch_rimage_0] vg1 gwi-aor--- 10.00g [scratch_rimage_0_iorig] 100.00
[scratch_rimage_0_imeta] vg1 ewi-ao---- 148.00m
[scratch_rimage_0_iorig] vg1 -wi-ao---- 10.00g
[scratch_rimage_1] vg1 gwi-aor--- 10.00g [scratch_rimage_1_iorig] 100.00
[scratch_rimage_1_imeta] vg1 ewi-ao---- 148.00m
[scratch_rimage_1_iorig] vg1 -wi-ao---- 10.00g
[scratch_rmeta_0] vg1 ewi-aor--- 4.00m
[scratch_rmeta_1] vg1 ewi-aor--- 4.00m
swap vg1 rwi-aor--- 8.00g 100.00
[swap_rimage_0] vg1 gwi-aor--- 8.00g [swap_rimage_0_iorig] 100.00
[swap_rimage_0_imeta] vg1 ewi-ao---- 132.00m
[swap_rimage_0_iorig] vg1 -wi-ao---- 8.00g
[swap_rimage_1] vg1 gwi-aor--- 8.00g [swap_rimage_1_iorig] 100.00
[swap_rimage_1_imeta] vg1 ewi-ao---- 132.00m
[swap_rimage_1_iorig] vg1 -wi-ao---- 8.00g
[swap_rmeta_0] vg1 ewi-aor--- 4.00m
[swap_rmeta_1] vg1 ewi-aor--- 4.00m

***@ds-03:~# df # filtered
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/vg1-root 10218772 2863956 6814144 30% /
/dev/nvme0n1p1 446282 109365 308450 27% /boot
/dev/mapper/vg1-scratch 10218772 24 9678076 1% /scratch
/dev/mapper/vg1-docker 514937088 805892 487900412 1% /var/lib/docker
/dev/nvme1n1p1 486456 5972 480484 2% /boot/efi
Andy Smith
2024-05-22 10:20:01 UTC
Hello,
Post by Marc SCHAEFER
I will try this work-around and report back here. As I said, I can
live with /boot on RAID without dm-integrity, as long as the rest can be
dm-integrity+raid protected.
I'm interested in how you get on.

I don't (yet) use dm-integrity, but I have seen extreme fragility in
grub with regard to LVM. For example, a colleague of mine recently
lost 5 hours of their life (and their SLA budget) when simply adding
metadata tags to some PVs prevented grub from assembling them,
resulting in a hard-to-debug boot failure at the next boot.

Anything that involves grub having to interact with LVM just seems
really fragile.

Thanks,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Marc SCHAEFER
2024-05-22 10:30:01 UTC
Hello,
Post by Andy Smith
metadata tags to some PVs prevented grub from assembling them,
grub is indeed very fragile if you use dm-integrity on any of your LVs
in the same VG where /boot is (or at least if, in the list of LVs, the
dm-integrity-protected ones come first).

I guess it's a general problem with how grub2 parses LVM, yes: as soon
as there are special things going on, it somehow breaks.

However, if you don't have /boot on LVM, hand-fixing grub2 can be
trivial, e.g. here on another system with /boot/efi on 1st disk's first
partition and /boot on 2nd disk's first partition.

linux (hd1,1)vmlinuz-5.10.0-29-amd64 root=/dev/mapper/vg1-root ro quiet
initrd (hd1,1)initrd.img-5.10.0-29-amd64
boot

(you even have completions in grub's interactive boot system)

and it boots. Next step: I am going to make myself a USB boot key for
that system, just in case (first by simply mounting two partitions of
the USB key on /boot and /boot/efi (vfat) respectively, then
update-grub; or, if that breaks, completely by hand like above -- I
have been using syslinux for the last 20 years or so for that purpose,
but it apparently gets too complicated with Secure Boot and such).

PS: I have decided that from now on I will always put /boot not on LVM
but on a separate partition, like /boot/efi; it seems, indeed, much
less fragile. I.e., back to what I was doing a few years ago, before my
confidence in grub2 apparently got too high :)
Stefan Monnier
2024-05-22 21:10:01 UTC
Post by Marc SCHAEFER
I found this [1], quoting: "I'd also like to share an issue I've
discovered: if /boot's partition is a LV, then there must not be a
raidintegrity LV anywhere before that LV inside the same VG. Otherwise,
update-grub will show an error (disk `lvmid/.../...' not found) and GRUB
cannot boot. So it's best if you put /boot into its own VG. (PS: Errors
like unknown node '..._rimage_0 can be ignored.)"
Hmm... I've been using a "plain old partition" for /boot (with
everything else in LVM) for "ever", originally because the boot loader
was not able to read LVM, and later out of habit. I was thinking of
finally moving /boot into an LV to make things simpler, but I see that
it'd still be playing with fire (AFAICT booting off of LVM was still not
supported by U-Boot either last time I checked). 🙁


Stefan
Marc SCHAEFER
2024-05-23 07:00:01 UTC
Hello,
Post by Stefan Monnier
Hmm... I've been using a "plain old partition" for /boot (with
everything else in LVM) for "ever", originally because the boot loader
was not able to read LVM, and later out of habit. I was thinking of
finally moving /boot into an LV to make things simpler, but I see that
it'd still be playing with fire
grub has supported, for a long time:

- / on LVM, with /boot within that filesystem
- /boot on LVM, separately

(it also worked with LILO, because LILO would record the exact address
where the kernel & initrd were, regardless of abstraction layers :->)

Recently, I have been playing with RAID-on-LVM (I was mostly using LVM
on md before, which worked with grub), and it works too.

Where grub fails is when you have /boot on the same LVM volume group
where any of the LVs "before it in order" have:

- dm-integrity
- specific metadata

So yes, any advanced setup might break grub, so the easiest is to
have /boot on its own separate partition again for the time being.

Which makes two partitions if you also have a UEFI system.
Post by Stefan Monnier
(AFAICT booting off of LVM was still not
supported by U-Boot either last time I checked). 🙁
No idea about that one, sorry.
Franco Martelli
2024-05-28 13:30:01 UTC
Hi Marc,
Post by Marc SCHAEFER
3. grub BOOT FAILS IF ANY LV HAS dm-integrity, EVEN IF NOT LINKED TO /
if I reboot now, grub2 complains about rimage issues, clear the screen
and then I am at the grub2 prompt.
Booting is only possible with Debian rescue, disabling the dm-integrity
on the above volume and rebooting. Note that you still can see the
rimage/rmeta sub LVs (lvs -a), they are not deleted! (but no
dm-integrity is activated).
4. update-grub GIVES WARNINGS
Now, if I try to start update-grub while booted AND having enabled
# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.1.0-21-amd64
Found initrd image: /boot/initrd.img-6.1.0-21-amd64
error: unknown node 'docker_rimage_0'.
[ ... many ... ]
/usr/sbin/grub-probe: error: disk
`lvmid/xLE0OV-wQy7-88H9-yKCz-4DUQ-Toce-h9rQvk/FzCf1C-95eB-7B0f-DSrF-t1pg-66qp-hmP3nZ'
not found.
error: unknown node 'docker_rimage_0'.
[ ... many ... ]
[ this repeats a few times ]
Sorry for the late answer, but I've just noticed that the Linux
kernel of the Debian Bookworm ISO image (debian-12.0.0-amd64-DVD-1.iso)
comes *without* the "dm-integrity.ko" module, therefore making it
impossible to support volumes formatted with "--raidintegrity y" or
those formatted with the "integritysetup" command (I think that it's a
bug and it should be reported).

When you booted in rescue mode, which ISO image did you use?

Thanks for your patience, kind regards.
--
Franco Martelli
Stefan Monnier
2024-04-08 21:10:01 UTC
Post by David Christensen
Why LVM?
Personally, I've been using LVM everywhere I can (i.e. everywhere
except on my OpenWRT router, tho I've also used LVM there back when my
router had an HDD. I also use LVM on my 2GB USB rescue image).

To me the question is rather the reverse: why not?
I basically see it as a more flexible form of partitioning.

Even in the worst cases, where I have a single LV, I appreciate the
fact that it forces me to name things, isolating me from issues linked
to predicting the name of the device and from the issues that plague
UUIDs (the fact that they're hard to remember, and that they're a bit
too magical/hidden for my taste, so they sometimes change when I don't
want them to and vice versa).
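
For instance, an /etc/fstab entry can then refer to the LV by its
stable name rather than by UUID (the VG and LV names here are
hypothetical):

/dev/vg0/home   /home   ext4   defaults   0   2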


Stefan
David Christensen
2024-04-08 23:10:01 UTC
Post by Stefan Monnier
Post by David Christensen
Why LVM?
Personally, I've been using LVM everywhere I can (i.e. everywhere
except on my OpenWRT router, tho I've also used LVM there back when my
router had an HDD. I also use LVM on my 2GB USB rescue image).
To me the question is rather the reverse: why not?
I basically see it as a more flexible form of partitioning.
Even in the worst cases where I have a single LV volume, I appreciate
the fact that it forces me to name things, isolating me from issue
linked to predicting the name of the device and the issues that plague
UUIDs (the fact they're hard to remember, and that they're a bit too
magical/hidden for my taste, so they sometimes change when I don't want
them to and vice versa).
Stefan
If I have a hot-pluggable device (SD card, USB drive, hot-plug SATA/SAS
drive and rack, etc.), can I put LVM on it such that when the device is
connected to a Debian system with a graphical desktop (I use Xfce) an
icon is displayed on the desktop that I can interact with to display the
file systems in my file manager (Thunar)?


David
Stefan Monnier
2024-04-09 00:00:02 UTC
Post by David Christensen
If I have a hot-pluggable device (SD card, USB drive, hot-plug SATA/SAS
drive and rack, etc.), can I put LVM on it such that when the device is
connected to a Debian system with a graphical desktop (I use Xfce) an icon
is displayed on the desktop that I can interact with to display the file
systems in my file manager (Thunar)?
In the past: definitely not. Currently: no idea.
I suspect not, because I think the behavior on disconnection is still
poor (you want to be extra careful to deactivate all the volumes on the
drive *before* removing it, otherwise they tend to linger "for ever").
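
That deactivation step is roughly (a sketch; the mount point and VG
name are hypothetical):

umount /media/backup
vgchange -an usbvg    # release all LVs on the removable drive before unplugging it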

I guess that's one area where partitions are still significantly better
than LVM.


Stefan "who doesn't use much hot-plugging of mass storage"
David Christensen
2024-04-09 10:20:02 UTC
Post by Stefan Monnier
Post by David Christensen
If I have a hot-pluggable device (SD card, USB drive, hot-plug SATA/SAS
drive and rack, etc.), can I put LVM on it such that when the device is
connected to a Debian system with a graphical desktop (I use Xfce) an icon
is displayed on the desktop that I can interact with to display the file
systems in my file manager (Thunar)?
In the past: definitely not. Currently: no idea.
I suspect not, because I think the behavior on disconnection is still
poor (you want to be extra careful to deactivate all the volumes on the
drive *before* removing it, otherwise they tend to linger "for ever").
I guess that's one area where partitions are still significantly better
than LVM.
Stefan "who doesn't use much hot-plugging of mass storage"
Thank you for the clarification. :-)


David
piorunz
2024-04-10 00:10:01 UTC
Does anyone have any comments or suggestions regarding how to use
magnetic hard disk drives, commodity x86 computers, and Debian for
long-term data storage with ensured integrity?
I use Btrfs on all my systems, including some servers, with software
RAID1 and RAID10 modes (because these modes are considered stable and
production-ready). I decided on Btrfs rather than ZFS because Btrfs
allows migrating drives on the fly while the filesystem is live and
heavily used, replacing them with different sizes and types, mixing
capacities, changing RAID levels, and changing the number of drives
too. I could go from a single drive to RAID10 on 4 drives and back
while my data is 100% available at all times.
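
For reference, those reshaping operations look roughly like the
following (a sketch with hypothetical device names and mount point,
not taken from the poster's setup):

btrfs device add /dev/sdc /mnt/data                              # grow the pool online
btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt/data  # change the RAID profile
btrfs replace start /dev/sdb /dev/sdd /mnt/data                  # swap a drive in place
btrfs device remove /dev/sdb /mnt/data                           # shrink the pool again
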
It has saved my bacon many times, including hard checksum corruption
on an NVMe drive which I would otherwise never have known about. Thanks
to Btrfs I located the corrupted files, fixed them, and got the
hardware replaced under warranty.
It also helped with corrupted RAM: Btrfs simply refused to save a file
because the saved copy couldn't match the checksum read from the source
due to RAM bit flips. Diagnosed, then replaced the memory, all good.
I like it a lot when one of the drives gets an ATA reset for whatever
reason and all the other drives continue to read and write; I can keep
using the system for hours, if I even notice. Not possible in normal
circumstances without RAID. Once the problematic drive is back, or
after a reboot if it's more serious, I run the "scrub" command and
everything is resynced again. Even if I don't do that, Btrfs corrects
checksum errors dynamically on the fly anyway.
And the list goes on - I've been using Btrfs for the last 5 years, not
a single problem to date; it has survived hard resets, power losses,
drive failures, and countless migrations.
[1] https://github.com/openzfs/zfs/issues/15526
[2] https://github.com/openzfs/zfs/issues/15933
The problems reported here are from Linux kernels 6.5 and 6.7 on a
Gentoo system. Does this even affect Debian Stable with its 6.1 LTS
kernel?

--
With kindest regards, Piotr.

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org/
⠈⠳⣄⠀⠀⠀⠀
David Christensen
2024-04-10 11:20:01 UTC
Post by piorunz
Does anyone have any comments or suggestions regarding how to use
magnetic hard disk drives, commodity x86 computers, and Debian for
long-term data storage with ensured integrity?
I use Btrfs, on all my systems, including some servers, with soft Raid1
and Raid10 modes (because these modes are considered stable and
production ready). I decided on Btrfs not ZFS, because Btrfs allows to
migrate drives on the fly while partition is live and heavily used,
replace them with different sizes and types, mixed capacities, change
Raid levels, change amount of drives too. I could go from single drive
to Raid10 on 4 drives and back while my data is 100% available at all times.
It saved my bacon many times, including hard checksum corruption on NVMe
drive which otherwise I would never know about. Thanks to Btrfs I
located the corrupted files, fixed them, got hardware replaced under
warranty.
Also helped with corrupted RAM: Btrfs just refused to save file because
saved copy couldn't match read checksum from the source due to RAM bit
flips. Diagnosed, then replaced memory, all good.
I like a lot when one of the drives get ATA reset for whatever reason,
and all other drives continue to read and write, I can continue using
the system for hours, if I even notice. Not possible in normal
circumstances without Raid. Once the problematic drive is back, or after
reboot if it's more serious, then I do "scrub" command and everything is
resynced again. If I don't do that, then Btrfs dynamically correct
checksum errors on the fly anyway.
And list goes on - I've been using Btrfs for last 5 years, not a single
problem to date, it survived hard resets, power losses, drive failures,
countless migrations.
Those sound like some compelling features.


I believe the last time I tried Btrfs was Debian 9 (?). I ran into
problems because I did not do the required manual maintenance
(rebalancing). Does the Btrfs in Debian 11 or Debian 12 still require
manual maintenance? If so, what and how often?
Post by piorunz
[1] https://github.com/openzfs/zfs/issues/15526
[2] https://github.com/openzfs/zfs/issues/15933
Problems reported here are from Linux kernel 6.5 and 6.7 on Gentoo
system. Does this even affects Debian Stable with 6.1 LTS?
I do not know.
Post by piorunz
--
With kindest regards, Piotr.
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org/
⠈⠳⣄⠀⠀⠀⠀
David
Curt
2024-04-10 15:00:01 UTC
Post by David Christensen
Post by piorunz
I use Btrfs, on all my systems, including some servers, with soft Raid1
and Raid10 modes (because these modes are considered stable and
production ready). I decided on Btrfs not ZFS, because Btrfs allows to
migrate drives on the fly while partition is live and heavily used,
replace them with different sizes and types, mixed capacities, change
Raid levels, change amount of drives too. I could go from single drive
to Raid10 on 4 drives and back while my data is 100% available at all times.
It saved my bacon many times, including hard checksum corruption on NVMe
drive which otherwise I would never know about. Thanks to Btrfs I
located the corrupted files, fixed them, got hardware replaced under
warranty.
Also helped with corrupted RAM: Btrfs just refused to save file because
saved copy couldn't match read checksum from the source due to RAM bit
flips. Diagnosed, then replaced memory, all good.
I like a lot when one of the drives get ATA reset for whatever reason,
and all other drives continue to read and write, I can continue using
the system for hours, if I even notice. Not possible in normal
circumstances without Raid. Once the problematic drive is back, or after
reboot if it's more serious, then I do "scrub" command and everything is
resynced again. If I don't do that, then Btrfs dynamically correct
checksum errors on the fly anyway.
And list goes on - I've been using Btrfs for last 5 years, not a single
problem to date, it survived hard resets, power losses, drive failures,
countless migrations.
Those sound like some compelling features.
I don't believe in immortality. After many a summer dies the swan.
Paul Leiber
2024-04-10 16:10:01 UTC
Post by David Christensen
Post by piorunz
Does anyone have any comments or suggestions regarding how to use
magnetic hard disk drives, commodity x86 computers, and Debian for
long-term data storage with ensured integrity?
I use Btrfs, on all my systems, including some servers, with soft Raid1
and Raid10 modes (because these modes are considered stable and
production ready). I decided on Btrfs not ZFS, because Btrfs allows to
migrate drives on the fly while partition is live and heavily used,
replace them with different sizes and types, mixed capacities, change
Raid levels, change amount of drives too. I could go from single drive
to Raid10 on 4 drives and back while my data is 100% available at all times.
It saved my bacon many times, including hard checksum corruption on NVMe
drive which otherwise I would never know about. Thanks to Btrfs I
located the corrupted files, fixed them, got hardware replaced under
warranty.
Also helped with corrupted RAM: Btrfs just refused to save file because
saved copy couldn't match read checksum from the source due to RAM bit
flips. Diagnosed, then replaced memory, all good.
I like a lot when one of the drives get ATA reset for whatever reason,
and all other drives continue to read and write, I can continue using
the system for hours, if I even notice. Not possible in normal
circumstances without Raid. Once the problematic drive is back, or after
reboot if it's more serious, then I do "scrub" command and everything is
resynced again. If I don't do that, then Btrfs dynamically correct
checksum errors on the fly anyway.
And list goes on - I've been using Btrfs for last 5 years, not a single
problem to date, it survived hard resets, power losses, drive failures,
countless migrations.
Those sound like some compelling features.
I believe the last time I tried Btrfs was Debian 9 (?).  I ran into
problems because I did not do the required manual maintenance
(rebalancing).  Does the Btrfs in Debian 11 or Debian 12 still require
manual maintenance?  If so, what and how often?
Scrub and balance are actions which have been recommended. I am using
the btrfsmaintenance scripts [1][2] to automate this. I am doing a
weekly balance and a monthly scrub. After some reading today, I am
getting unsure whether this approach is correct, especially whether
balance is still necessary (it usually doesn't find anything to do
anyway), so please take these periods with caution. My main message is
that such operations can be automated using the linked scripts.
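
For example, on Debian the periods are set in
/etc/default/btrfsmaintenance; a sketch assuming the upstream variable
names (check the packaged file rather than relying on these):

# /etc/default/btrfsmaintenance (excerpt)
BTRFS_BALANCE_PERIOD="weekly"
BTRFS_BALANCE_MOUNTPOINTS="/"
BTRFS_SCRUB_PERIOD="monthly"
BTRFS_SCRUB_MOUNTPOINTS="/"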

Best regards,

Paul

[1] https://packages.debian.org/bookworm/btrfsmaintenance
[2] https://github.com/kdave/btrfsmaintenance
David Christensen
2024-04-10 23:20:01 UTC
Post by Paul Leiber
Post by David Christensen
Does the Btrfs in Debian 11 or Debian 12 still require
manual maintenance?  If so, what and how often?
Scrub and balance are actions which have been recommended. I am using
btrfsmaintenance scripts [1][2] to automate this. I am doing a weekly
balance and a monthly scrub. After some reading today, I am getting
unsure if this is approach is correct, especially if balance is
necessary anymore (it usually doesn't find anything to do anyway), so
please take these periods with caution. My main message is that such
operations can be automated using the linked scripts.
Best regards,
Paul
[1] https://packages.debian.org/bookworm/btrfsmaintenance
[2] https://github.com/kdave/btrfsmaintenance
Thank you. Those scripts should be useful.


David
piorunz
2024-04-12 15:20:01 UTC
Post by David Christensen
Those sound like some compelling features.
I believe the last time I tried Btrfs was Debian 9 (?).  I ran into
problems because I did not do the required manual maintenance
(rebalancing).  Does the Btrfs in Debian 11 or Debian 12 still require
manual maintenance?  If so, what and how often?
I don't do balance at all, it's not required.

Scrub is recommended, because it will detect any bit-rot due to hardware
errors on HDD media. It scans the entire surface of allocated sectors on
all drives. I usually run a scrub monthly.
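
A scrub run is essentially a one-liner (sketch; the mount point is
hypothetical):

btrfs scrub start /mnt/data      # runs in the background
btrfs scrub status /mnt/data     # progress and any checksum errors found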

--
With kindest regards, Piotr.

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org/
⠈⠳⣄⠀⠀⠀⠀
David Christensen
2024-04-13 03:10:01 UTC
Post by piorunz
Post by David Christensen
Those sound like some compelling features.
I believe the last time I tried Btrfs was Debian 9 (?).  I ran into
problems because I did not do the required manual maintenance
(rebalancing).  Does the Btrfs in Debian 11 or Debian 12 still require
manual maintenance?  If so, what and how often?
I don't do balance at all, it's not required.
Scrub is recommended, because it will detect any bit-rot due to hardware
errors on HDD media. It scans the entire surface of allocated sectors on
all drives. I do scrub usually monthly.
Thank you for the information.


David