Discussion:
ext4 file system corruption - again
Jesper Dybdal
2024-09-23 09:30:01 UTC
Some time ago I had a problem with the i_blocks field of a few inodes
being corrupted (replaced by extremely large numbers).

It now happened again, and the strange thing is that one of the two
files affected was also affected the earlier time (or more precisely,
the file that now has the same name as the old one had). That file was
/etc/postfix/master.cf - a simple text configuration file that is
modified only manually using emacs.

I wrote about it here when it first happened - that thread began with:
   Message-ID: <bf46ee0b-8af1-43ce-a48f-***@dybdal.dk>
   Date: Tue, 19 Mar 2024 15:43:30 +0100
   Subject: Filsystemkorruption i ext4?

I don't believe that it is a disk error - the file system is on a RAID1
partition and the RAID consistency is checked regularly.
I also find it hard to believe that it is a RAM error - the machine has
run memtest86+ overnight without finding anything.
There was a power outage some time ago, but surely ext4 should be able
to handle that without introducing errors.

Fsck fixes it.
The system is an up-to-date Bookworm.

Any ideas as to how this can happen - twice, and affecting (among
others) the same file?

---------------- fsck log:
Log of fsck -C -a -T -t ext4 /dev/md0
Sun Sep 22 20:20:13 2024

root has been mounted 13 times without being checked, check forced.
root: Inode 10748715, i_blocks is 281474976710631, should be 5. FIXED.
root: Inode 10751288, i_blocks is 281474976710647, should be 3. FIXED.
root: 223986/32759808 files (6.5% non-contiguous), 6827061/131038976 blocks
fsck exited with status code 1

---------------- stat(1) for the two files before fsck:
  File: main.cf
  Size: 16959         Blocks: 2251799813685048 IO Block: 4096 regular file
Device: 9,0    Inode: 10748715    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/ root)
Access: 2024-09-22 20:31:58.081768853 +0200
Modify: 2024-08-03 19:33:20.665350446 +0200
Change: 2024-09-22 22:09:54.053359071 +0200
 Birth: 2024-08-03 11:20:14.671832520 +0200

  File: master.cf
  Size: 10782         Blocks: 2251799813685176 IO Block: 4096 regular file
Device: 9,0    Inode: 10751288    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/ root)
Access: 2024-09-22 19:20:07.216191493 +0200
Modify: 2024-06-05 14:37:24.205001097 +0200
Change: 2024-09-22 22:09:54.053359071 +0200
 Birth: 2024-03-19 13:36:38.971618859 +0100
----------------
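
A quick arithmetic check on the figures above: stat(1) shows the same bogus counts as fsck, only in a different unit. The sketch below uses nothing but the numbers quoted in the log and the stat output. Reading the factor as 4096-byte file system blocks versus 512-byte stat units is an assumption, and the function name is made up for illustration.

```python
# Numbers copied from the fsck log and the stat(1) output above, keyed by inode.
FSCK_I_BLOCKS = {10748715: 281474976710631, 10751288: 281474976710647}
STAT_BLOCKS = {10748715: 2251799813685048, 10751288: 2251799813685176}


def stat_to_fsck_ratio(inode):
    """Return stat's Blocks divided by fsck's i_blocks, or None if not an exact multiple."""
    quotient, remainder = divmod(STAT_BLOCKS[inode], FSCK_I_BLOCKS[inode])
    return quotient if remainder == 0 else None


for inode in FSCK_I_BLOCKS:
    # Assumption: a ratio of 8 corresponds to 4096-byte blocks counted in 512-byte units.
    print(inode, stat_to_fsck_ratio(inode))
```

Both inodes give a ratio of exactly 8, so the two tools agree on the corrupted value.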

Thanks,
Jesper
--
Jesper Dybdal
https://www.dybdal.dk
Arno Lehmann
2024-09-23 10:00:01 UTC
Hi Jesper,

On 23.09.2024 at 11:20, Jesper Dybdal wrote:
...
Post by Jesper Dybdal
Log of fsck -C -a -T -t ext4 /dev/md0
Sun Sep 22 20:20:13 2024
root has been mounted 13 times without being checked, check forced.
root: Inode 10748715, i_blocks is 281474976710631, should be 5. FIXED.
root: Inode 10751288, i_blocks is 281474976710647, should be 3. FIXED.
root: 223986/32759808 files (6.5% non-contiguous), 6827061/131038976 blocks
I do not think it is likely to be a hardware problem. The bit patterns
for the number of blocks you quote actually have all the high bits of
the potentially used 48 bits of the block count set, for example
111111111111111111111111111111111111111111100111 and given that the 16
higher bits are not stored next to the lower 32 bits, there should be
many other values and flags set to all ones, which should result in
other things to notice -- for example, the file should be considered to
be part of the file system structure and compressed, which fsck should
loudly complain about (not verified).
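
To make the bit pattern easy to reproduce, here is a minimal sketch that prints a reported value as 48 bits and splits it into a 16-bit high part and a 32-bit low part. The 32/16 split follows the description above. The function name is hypothetical, and the code says nothing about where those fields live on disk.

```python
I_BLOCKS_BITS = 48  # width referred to above: 32 low bits plus 16 high bits


def describe(i_blocks):
    """Return (48-bit binary string, high 16 bits, low 32 bits) for a block count."""
    bits = format(i_blocks, f"0{I_BLOCKS_BITS}b")
    low32 = i_blocks & 0xFFFFFFFF
    high16 = i_blocks >> 32
    return bits, high16, low32


for value in (281474976710631, 281474976710647):
    bits, high16, low32 = describe(value)
    print(f"{value}: {bits} high=0x{high16:04X} low=0x{low32:08X}")
```

For both values from the fsck log the high part is 0xFFFF; the low parts are 0xFFFFFFE7 and 0xFFFFFFF7.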

The large values with peculiar bit patterns do look like some flag
values to me. It might be worth asking among the ext4 developers if
those values could be introduced by some particular condition, I think.

Cheers,

Arno
--
Arno Lehmann

IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück
Jesper Dybdal
2024-09-24 09:10:01 UTC
Post by Arno Lehmann
...
Post by Jesper Dybdal
root: Inode 10748715, i_blocks is 281474976710631, should be 5. FIXED.
root: Inode 10751288, i_blocks is 281474976710647, should be 3. FIXED.
root: 223986/32759808 files (6.5% non-contiguous), 6827061/131038976 blocks
I do not think it is likely to be a hardware problem. The bit patterns
for the number of blocks you quote actually have all the high bits of
the potentially used 48 bits of the block count set
...
The large values with peculiar bit patterns do look like some flag
values to me. It might be worth asking among the ext4 developers if
those values could be introduced by some particular condition, I think.

I think I'll do that.

Post by Stefan Monnier
Post by Jesper Dybdal
root has been mounted 13 times without being checked, check forced.
root: Inode 10748715, i_blocks is 281474976710631, should be 5. FIXED.
^^^^^^^^^^^^^^^
AKA -25
Post by Jesper Dybdal
root: Inode 10751288, i_blocks is 281474976710647, should be 3. FIXED.
^^^^^^^^^^^^^^^
AKA -9
It's odd that it looks like -N². Maybe it's just happenstance, but I'd
be curious to see if you have other "data points".

Unfortunately, I have no other "data points".  The data from the first
occurrence of this are lost.
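
One way to collect further data points before fsck repairs them would be to scan for files with implausible block counts. The following is only a sketch: the threshold and the function names are assumptions chosen for illustration, and it relies on the st_blocks value returned by lstat(), which is what stat(1) displayed as "Blocks" above.

```python
import os
import stat

# Hypothetical threshold: 2**40 units of 512 bytes is 512 TiB, far beyond any
# real file on this system but well below the corrupted values seen so far.
SUSPICIOUS_SECTORS = 2**40


def is_suspicious(st_blocks):
    """Return True if a block count is implausibly large."""
    return st_blocks >= SUSPICIOUS_SECTORS


def find_suspicious(root):
    """Yield (path, inode, st_blocks) for regular files under root with huge block counts."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode) and is_suspicious(info.st_blocks):
                yield path, info.st_ino, info.st_blocks
```

Running something like `for hit in find_suspicious("/etc"): print(hit)` as root would list affected files together with their inode numbers. Note that os.walk also descends into other file systems mounted below the starting directory.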

Many thanks to Arno and Stefan - I probably would not have noticed the
interesting values without their help.

Jesper
--
Jesper Dybdal
https://www.dybdal.dk
Stefan Monnier
2024-09-23 14:20:01 UTC
Post by Jesper Dybdal
root has been mounted 13 times without being checked, check forced.
root: Inode 10748715, i_blocks is 281474976710631, should be 5. FIXED.
^^^^^^^^^^^^^^^
AKA -25
Post by Jesper Dybdal
root: Inode 10751288, i_blocks is 281474976710647, should be 3. FIXED.
^^^^^^^^^^^^^^^
AKA -9

It's odd that it looks like -N². Maybe it's just happenstance, but I'd
be curious to see if you have other "data points".
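
The "AKA" values above come from reading the block count as a signed 48-bit number. A minimal sketch of that conversion, assuming two's complement over 48 bits (the function name is made up for illustration):

```python
def as_signed(value, bits=48):
    """Interpret an unsigned integer as a two's-complement number of the given width."""
    if value >= 1 << (bits - 1):
        return value - (1 << bits)
    return value


for raw in (281474976710631, 281474976710647):
    print(raw, "->", as_signed(raw))
```

This yields -25 and -9 for the two values in the fsck log.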


Stefan
Jesper Dybdal
2024-09-26 13:30:01 UTC
Post by Stefan Monnier
Post by Jesper Dybdal
root has been mounted 13 times without being checked, check forced.
root: Inode 10748715, i_blocks is 281474976710631, should be 5. FIXED.
^^^^^^^^^^^^^^^
AKA -25
Post by Jesper Dybdal
root: Inode 10751288, i_blocks is 281474976710647, should be 3. FIXED.
^^^^^^^^^^^^^^^
AKA -9
It's odd that it looks like -N². Maybe it's just happenstance, but I'd
be curious to see if you have other "data points".
One more "data point" just appeared today.  I had edited /etc/fstab and
rebooted, and then fstab suddenly had the problem:

root: Inode 10748542, i_blocks is 281474976710653, should be 1. FIXED.
AKA -3, so not of the -N² form.
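
The three data points so far can be checked against the -N² idea mechanically. This sketch uses only the values from the fsck logs; the helper names are hypothetical, and comparing against the square of the "should be" value is just one possible reading of N.

```python
import math

WIDTH = 48
# (raw i_blocks reported by fsck, value fsck said it should be)
DATA_POINTS = [
    (281474976710631, 5),
    (281474976710647, 3),
    (281474976710653, 1),
]


def magnitude(raw):
    """Distance of raw below 2**48, i.e. the N in 'AKA -N'."""
    return (1 << WIDTH) - raw


def is_perfect_square(n):
    """Return True if n is the square of an integer."""
    root = math.isqrt(n)
    return root * root == n


for raw, expected in DATA_POINTS:
    m = magnitude(raw)
    print(f"-{m}: perfect square={is_perfect_square(m)}, "
          f"square of expected value={m == expected ** 2}")
```

The first two values (-25 and -9) are negated squares of the expected counts 5 and 3; the fstab value (-3) is neither a negated square nor related to its expected count 1 in that way.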

From a stat() (after the fsck that repaired the file):
  File: fstab
  Size: 1855          Blocks: 8          IO Block: 4096   regular file

I haven't yet had time to describe the problem on the ext4 mailing
list.  I have remembered that there is one thing I do differently from
the default: I use the ext4 option "nodelalloc" (because several years
ago, there was a discussion about "delalloc or not" from which it seemed
that nodelalloc was probably slightly safer - if the associated
performance reduction is not a problem, which it is not for me).

I've now turned nodelalloc off, just in case that changes something.
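
To confirm which setting is actually in effect after a reboot, the mount options can be read back from /proc/mounts. This is a sketch with hypothetical helper names; it assumes the kernel lists "nodelalloc" among the options when it is active, which is worth verifying on the machine in question.

```python
def mount_options(mount_point, mounts_text):
    """Return the option list for mount_point from /proc/mounts-style text, or None."""
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            result = fields[3].split(",")  # the last matching entry wins
    return result


def uses_nodelalloc(mount_point="/", mounts_path="/proc/mounts"):
    """Return True if mount_point is currently mounted with nodelalloc (assumed spelling)."""
    with open(mounts_path) as handle:
        options = mount_options(mount_point, handle.read())
    return options is not None and "nodelalloc" in options
```

Calling `uses_nodelalloc("/")` before and after the change should show whether the option was really dropped.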
--
Jesper Dybdal
https://www.dybdal.dk