Discussion:
Matching grub data in the MBR with the installed grub-pc package
Andy Smith
2024-09-09 20:10:01 UTC
Hi,

I've come into possession of a machine running Debian 10 with two
drives in it; sda and sdb. These have been labelled with a DOS MBR
and partitioned. The first partition starts at sector 2048 of both
drives (512 byte sectors). It appears that GRUB has been installed
on both sda and sdb:

$ sudo dd if=/dev/sda bs=1 count=512 2>/dev/null | xxd
00000000: eb63 9010 8ed0 bc00 b0b8 0000 8ed8 8ec0 .c..............
00000010: fbbe 007c bf00 06b9 0002 f3a4 ea21 0600 ...|.........!..
00000020: 00be be07 3804 750b 83c6 1081 fefe 0775 ....8.u........u
00000030: f3eb 16b4 02b0 01bb 007c b280 8a74 018b .........|...t..
00000040: 4c02 cd13 ea00 7c00 00eb fe00 0000 0000 L.....|.........
00000050: 0000 0000 0000 0000 0000 0080 0100 0000 ................
00000060: 0000 0000 fffa 9090 f6c2 8074 05f6 c270 ...........t...p
00000070: 7402 b280 ea79 7c00 0031 c08e d88e d0bc t....y|..1......
00000080: 0020 fba0 647c 3cff 7402 88c2 52bb 1704 . ..d|<.t...R...
00000090: f607 0374 06be 887d e817 01be 057c b441 ...t...}.....|.A
000000a0: bbaa 55cd 135a 5272 3d81 fb55 aa75 3783 ..U..ZRr=..U.u7.
000000b0: e101 7432 31c0 8944 0440 8844 ff89 4402 ..t21..D.@.D..D.
000000c0: c704 1000 668b 1e5c 7c66 895c 0866 8b1e ....f..\|f.\.f..
000000d0: 607c 6689 5c0c c744 0600 70b4 42cd 1372 `|f.\..D..p.B..r
000000e0: 05bb 0070 eb76 b408 cd13 730d 5a84 d20f ...p.v....s.Z...
000000f0: 83d0 00be 937d e982 0066 0fb6 c688 64ff .....}...f....d.
00000100: 4066 8944 040f b6d1 c1e2 0288 e888 f440 @f.D...........@
00000110: 8944 080f b6c2 c0e8 0266 8904 66a1 607c .D.......f..f.`|
00000120: 6609 c075 4e66 a15c 7c66 31d2 66f7 3488 f..uNf.\|f1.f.4.
00000130: d131 d266 f774 043b 4408 7d37 fec1 88c5 .1.f.t.;D.}7....
00000140: 30c0 c1e8 0208 c188 d05a 88c6 bb00 708e 0........Z....p.
00000150: c331 dbb8 0102 cd13 721e 8cc3 601e b900 .1......r...`...
00000160: 018e db31 f6bf 0080 8ec6 fcf3 a51f 61ff ...1..........a.
00000170: 265a 7cbe 8e7d eb03 be9d 7de8 3400 bea2 &Z|..}....}.4...
00000180: 7de8 2e00 cd18 ebfe 4752 5542 2000 4765 }.......GRUB .Ge
00000190: 6f6d 0048 6172 6420 4469 736b 0052 6561 om.Hard Disk.Rea
000001a0: 6400 2045 7272 6f72 0d0a 00bb 0100 b40e d. Error........
000001b0: cd10 ac3c 0075 f4c3 f2b8 530e 0000 8020 ...<.u....S....
000001c0: 2100 fd35 373e 0008 0000 0038 0f00 0035 !..57>.....8...5
000001d0: 383e fd51 6031 0040 0f00 0098 3b00 0051 8>.Q`1.@....;..Q
000001e0: 6131 fdef 45aa 00d8 4a00 00d0 1d00 0010 a1..E...J.......
000001f0: 64ab 05fe ffff feaf 6800 0230 27df 55aa d.......h..0'.U.
$ sudo dd if=/dev/sdb bs=1 count=512 2>/dev/null | xxd
00000000: eb63 9010 8ed0 bc00 b0b8 0000 8ed8 8ec0 .c..............
00000010: fbbe 007c bf00 06b9 0002 f3a4 ea21 0600 ...|.........!..
00000020: 00be be07 3804 750b 83c6 1081 fefe 0775 ....8.u........u
00000030: f3eb 16b4 02b0 01bb 007c b280 8a74 018b .........|...t..
00000040: 4c02 cd13 ea00 7c00 00eb fe00 0000 0000 L.....|.........
00000050: 0000 0000 0000 0000 0000 0080 0100 0000 ................
00000060: 0000 0000 fffa 9090 f6c2 8074 05f6 c270 ...........t...p
00000070: 7402 b280 ea79 7c00 0031 c08e d88e d0bc t....y|..1......
00000080: 0020 fba0 647c 3cff 7402 88c2 52be 807d . ..d|<.t...R..}
00000090: e817 01be 057c b441 bbaa 55cd 135a 5272 .....|.A..U..ZRr
000000a0: 3d81 fb55 aa75 3783 e101 7432 31c0 8944 =..U.u7...t21..D
000000b0: 0440 8844 ff89 4402 c704 1000 668b 1e5c .@.D..D.....f..\
000000c0: 7c66 895c 0866 8b1e 607c 6689 5c0c c744 |f.\.f..`|f.\..D
000000d0: 0600 70b4 42cd 1372 05bb 0070 eb76 b408 ..p.B..r...p.v..
000000e0: cd13 730d 5a84 d20f 83d8 00be 8b7d e982 ..s.Z........}..
000000f0: 0066 0fb6 c688 64ff 4066 8944 040f b6d1 .f....d.@f.D....
00000100: c1e2 0288 e888 f440 8944 080f b6c2 c0e8 .......@.D......
00000110: 0266 8904 66a1 607c 6609 c075 4e66 a15c .f..f.`|f..uNf.\
00000120: 7c66 31d2 66f7 3488 d131 d266 f774 043b |f1.f.4..1.f.t.;
00000130: 4408 7d37 fec1 88c5 30c0 c1e8 0208 c188 D.}7....0.......
00000140: d05a 88c6 bb00 708e c331 dbb8 0102 cd13 .Z....p..1......
00000150: 721e 8cc3 601e b900 018e db31 f6bf 0080 r...`......1....
00000160: 8ec6 fcf3 a51f 61ff 265a 7cbe 867d eb03 ......a.&Z|..}..
00000170: be95 7de8 3400 be9a 7de8 2e00 cd18 ebfe ..}.4...}.......
00000180: 4752 5542 2000 4765 6f6d 0048 6172 6420 GRUB .Geom.Hard
00000190: 4469 736b 0052 6561 6400 2045 7272 6f72 Disk.Read. Error
000001a0: 0d0a 00bb 0100 b40e cd10 ac3c 0075 f4c3 ...........<.u..
000001b0: 0000 0000 0000 0000 481f 78c6 0000 8020 ........H.x....
000001c0: 2100 fd35 373e 0008 0000 0038 0f00 0035 !..57>.....8...5
000001d0: 383e fd51 6031 0040 0f00 0098 3b00 0051 8>.Q`1.@....;..Q
000001e0: 6131 fdef 45aa 00d8 4a00 00d0 1d00 0010 a1..E...J.......
000001f0: 64ab 05fe ffff feaf 6800 0230 27df 55aa d.......h..0'.U.

This machine does not boot properly, going immediately to a grub>
prompt. If, during the boot process, I force the BIOS to boot from
sdb then it does boot properly.

This machine is doing something useful at the moment, so I am under
pressure to not have it out of service for extended periods of time
while I tinker with it.

The drives are partitioned and set up for MD RAID-1. The current
grub config loads the mdraid1x module and is set to consider its
root as array UUID aca790f8:3fcc9451:e65b1821:87ee8ab7:

$ grep root=\' /boot/grub/grub.cfg | sed 's/^\t*//' | uniq
set root='mduuid/aca790f83fcc9451e65b182187ee8ab7'
$ for dev in sda1 sdb1; do sudo mdadm -E "/dev/$dev" | grep 'Array UUID'; done
Array UUID : aca790f8:3fcc9451:e65b1821:87ee8ab7
Array UUID : aca790f8:3fcc9451:e65b1821:87ee8ab7

/proc/mdstat shows all the MD arrays are currently running fine with
paired partitions from sda and sdb.

So, it looks like this machine is redundant against drive failure
EXCEPT for during boot, and it's just something odd with the MBR of
sda.

What is the simplest way to make it work, and be redundant?

Normally for things of this era (i.e. not UEFI) I would be taking
care to grub-install on both drives after install. Clearly someone
did install grub to both, but sda's copy is no longer working.
Perhaps grub has been upgraded since then, causing another
grub-install against sda (only), and then the BIOS's idea of drive
order flipped around?

Can I simply copy the first 512 bytes of sdb to the start of sda?

I do not particularly want to run grub-install, as the MBR of sdb is
known good at the moment. Perhaps though I could run:

$ sudo grub-install /dev/sda

and then compare again the first 512 bytes of each drive?

Is there any way to sanity check what an MBR will do, grub-wise? My
searching found grub-emu but I couldn't find any useful
documentation and didn't want to just run it. Similarly grub-probe.

I was kind of hoping that there would be something I could run which
would say "yes, this MBR has grub v<whatever> and is set to find its
grub.cfg on (hdX)", then I might be able to see some difference in
what the MBR of sda wants to do. I'm particularly interested in
seeing if the binary grub data in the MBR actually comes from the
grub that is installed from the grub-pc package in the OS.

Thanks,
Andy

--
https://bitfolk.com/ -- No-nonsense VPS hosting
Andy Smith
2024-09-09 20:30:01 UTC
Post by Andy Smith
I was kind of hoping that there would be something I could run which
would say "yes, this MBR has grub v<whatever> and is set to find its
grub.cfg on (hdX)", then I might be able to see some difference in
what the MBR of sda wants to do. I'm particularly interested in
seeing if the binary grub data in the MBR actually comes from the
grub that is installed from the grub-pc package in the OS.
$ xxd /usr/lib/grub/i386-pc/boot.img > /tmp/img.hex
$ sudo dd if=/dev/sda bs=1 count=512 2>/dev/null | xxd > /tmp/sda.hex
$ sudo dd if=/dev/sdb bs=1 count=512 2>/dev/null | xxd > /tmp/sdb.hex
$ diff /tmp/sda.hex /tmp/img.hex | wc -l
66
$ diff /tmp/sdb.hex /tmp/img.hex | wc -l
28

Interesting.
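
A line count from diff over xxd output also counts the partition table
and the disk signature, so it overstates how much the boot code differs.
A minimal sketch that looks only at the first 446 bytes and prints the
offsets that differ, assuming `head -c` and `cmp -l` behave as in GNU
coreutils/diffutils (`mbr_code_diff` is a made-up helper name, not a
GRUB tool):

```shell
# List the byte offsets (hex, 0-based) at which two boot sectors differ,
# looking only at the first 446 bytes (the boot code area).
# Arguments are files or block devices; reading a device needs root.
mbr_code_diff() {
    a=$(mktemp) && b=$(mktemp) || return 1
    head -c 446 "$1" > "$a"
    head -c 446 "$2" > "$b"
    # cmp -l prints "<1-based decimal offset> <octal byte> <octal byte>"
    # for every differing byte; it exits non-zero when the inputs differ.
    { cmp -l "$a" "$b" || true; } | while read -r off x y; do
        printf '0x%03x\n' $((off - 1))
    done
    rm -f "$a" "$b"
}

# Hypothetical usage, run as root:
#   mbr_code_diff /dev/sda /usr/lib/grub/i386-pc/boot.img
```

cmp is byte-by-byte, so code that has merely moved by a few bytes
between GRUB versions still shows up as a long run of differing offsets.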

Thanks,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Florent Rougon
2024-09-10 14:00:01 UTC
Hi,

Not an expert on this matter, so take this with a grain of salt.
Post by Andy Smith
Can I simply copy the first 512 bytes of sdb to the start of sda?
I would not do this, one of the reasons being that AFAICT, the start
offsets of the (up to 4) primary partitions of each drive are among
these bytes.
Post by Andy Smith
I do not particularly want to run grub-install, as the MBR of sdb is
known good at the moment. Perhaps though I could run:
$ sudo grub-install /dev/sda
I believe so, although by habit I'd rather 'dpkg-reconfigure grub-pc'
where you can IIRC select the drives to act on (it will remember your
selection, so in case you don't select sdb in the debconf dialog for
fear of breaking it, next time GRUB is updated on that system, your sdb
GRUB installation would become out-of-date). Of course, I am assuming
the computer boots with BIOS, not UEFI.

(Keep a rescue disk, installation medium or Debian Live around, in case
there is a problem booting afterwards.)
Post by Andy Smith
and then compare again the first 512 bytes of each drive?
Out of curiosity, I skimmed through [1] and computed the offsets of your
"GRUB " strings as they would be found in memory when the code is run at
boot (adding 7C00h, AFAIUI). I found 7D88h for your sda and 7D80h for
your sdb, neither of which matches the values at [1] under heading
“Location of the GRUB ID String and Error Messages in Memory”. Thus, my
understanding is that both of your MBRs were probably written by GRUB 2
(I wanted to check if maybe one had been written by GRUB 1 and the other
by GRUB 2).
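
The offset arithmetic above can be sketched as follows. This assumes GNU
grep (for `-a -o -b`) and that the BIOS loads the sector at 7C00h;
`grub_string_addr` is a made-up helper name:

```shell
# Print the in-memory address (hex) of the first "GRUB " string found in
# a boot sector, assuming the sector is loaded at 0x7C00.
grub_string_addr() {
    # -a: treat binary as text; -o -b: print the byte offset of the match
    off=$(head -c 512 "$1" | LC_ALL=C grep -a -o -b 'GRUB ' | head -n 1 | cut -d: -f1)
    [ -n "$off" ] || return 1
    printf '%Xh\n' $((off + 0x7C00))
}

# Hypothetical usage, run as root:
#   grub_string_addr /dev/sda
```

With the string at file offset 188h, as in the sda dump, this prints
7D88h.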

Regards

[1] https://thestarman.pcministry.com/asm/mbr/GRUB.htm
--
Florent
Andy Smith
2024-09-10 16:50:01 UTC
Hi,
Post by Florent Rougon
Post by Andy Smith
Can I simply copy the first 512 bytes of sdb to the start of sda?
I would not do this, one of the reasons being that AFAICT, the start
offsets of the (up to 4) primary partitions of each drive are among
these bytes.
Good point. I understand the bootloader is actually the first 446
bytes so maybe I should only be looking at these.

https://unix.stackexchange.com/a/254668/36243
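
To see what sits after those 446 bytes, here is a minimal sketch that
reads a partition's start sector out of the table. It assumes the
classic MBR layout (four 16-byte entries from offset 446, start LBA as a
little-endian 32-bit value at offset 8 of each entry); `byte_at` and
`mbr_part_start` are made-up helper names:

```shell
# Read one byte (printed as a decimal number) at a given offset of a
# file or block device.
byte_at() {
    od -An -tu1 -j "$2" -N 1 "$1" | tr -d ' \n'
}

# Print the starting LBA of primary partition N (1-4) from an MBR.
mbr_part_start() {
    base=$((446 + ($2 - 1) * 16 + 8))
    b0=$(byte_at "$1" "$base")
    b1=$(byte_at "$1" $((base + 1)))
    b2=$(byte_at "$1" $((base + 2)))
    b3=$(byte_at "$1" $((base + 3)))
    # Assemble the little-endian 32-bit value by hand so the result does
    # not depend on the host byte order.
    echo $((b0 + b1 * 256 + b2 * 65536 + b3 * 16777216))
}

# Hypothetical usage, run as root:
#   mbr_part_start /dev/sda 1
```

For the first entry in the dumps at the top of the thread (bytes
`00 08 00 00` at offset 8 of the entry) this gives 2048.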
Post by Florent Rougon
I'd rather 'dpkg-reconfigure grub-pc' where you can IIRC select
the drives to act on (it will remember your selection, so in case
you don't select sdb in the debconf dialog for fear of breaking
it, next time GRUB is updated on that system, your sdb GRUB
installation would become out-of-date). Of course, I am assuming
the computer boots with BIOS, not UEFI.
Yes, this machine boots with BIOS and MBR.

To keep such machines (BIOS boot, multiple boot drives, MD RAID for
redundancy once booted) in good booting health are people doing
anything more sophisticated than remembering to run "dpkg-reconfigure
grub-pc" and install grub to all boot drives any time grub-pc is
updated?
Post by Florent Rougon
Out of curiosity, I skimmed through [1] and computed the offsets of your
"GRUB " strings as they would be found in memory when the code is run at
boot (adding 7C00h, AFAIUI). I found 7D88h for your sda and 7D80h for
your sdb, neither of which matches the values at [1] under heading
“Location of the GRUB ID String and Error Messages in Memory”. Thus, my
understanding is that both of your MBRs were probably written by GRUB 2
(I wanted to check if maybe one had been written by GRUB 1 and the other
by GRUB 2).
This machine dates from 2016 and whatever was Debian stable at that
time. It will have been dist-upgraded as far as 10 (buster) after
that. As far as I'm aware the drives are the same ones it was first
installed with.

Thanks for the info!
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Florent Rougon
2024-09-10 22:50:02 UTC
Post by Andy Smith
Good point. I understand the bootloader is actually the first 446
bytes so maybe I should only be looking at these.
https://unix.stackexchange.com/a/254668/36243
The partition table indeed starts at offset 446 (decimal), however I'd
still rather run grub-install or “dpkg-reconfigure grub-pc” than copy
the first 446 bytes from one drive to another drive. The reason is that,
AFAIUI, what GRUB writes in this area when installed is likely to
contain disc-specific info. More specifically, according to [1]:

There isn't room for much function in the 446 bytes available for
executable code in the boot sector. The sole function of this stage1
code is to load the much larger stage2 boot program. When stage1 is
installed in the MBR, it is configured with the BIOS drive number and
the absolute LBA of the first sector of the stage2 file in the boot
partition. It loads that one sector into a fixed location in memory
and transfers control to it. (...)
Post by Andy Smith
Yes, this machine boots with BIOS and MBR.
To keep such machines (BIOS boot, multiple boot drives, MD RAID for
redundancy once booted) in good booting health are people doing
anything more sophisticated than remembering to run "dpkg-reconfigure
grub-pc" and install grub to all boot drives any time grub-pc is
updated?
That's what I've been doing for a bit more than 20 years (before
switching to UEFI), but that was only my home machine.
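
Since the debconf dialog remembers the selection, one way to check what
a later grub-pc upgrade would act on is to read it back. A minimal
sketch, assuming `debconf-show grub-pc` prints a line of the form
`* grub-pc/install_devices: dev1, dev2` (that output format is my
assumption, and `grub_install_devices` is a made-up helper name):

```shell
# Print the devices grub-pc is configured to install to, one per line,
# given the output of `debconf-show grub-pc` on stdin.
grub_install_devices() {
    # Keep only the install_devices line (not install_devices_empty),
    # strip the key, then split the comma-separated list.
    sed -n 's/^[* ] grub-pc\/install_devices: *//p' \
        | tr ',' '\n' \
        | sed 's/^ *//; /^$/d'
}

# Hypothetical usage (root is needed to read the debconf database):
#   sudo debconf-show grub-pc | grub_install_devices
```

If only one drive is listed, the other drive's boot code would be left
behind on the next upgrade.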
Post by Andy Smith
This machine dates from 2016 and whatever was Debian stable at that
time. It will have been dist-upgraded as far as 10 (buster) after
that. As far as I'm aware the drives are the same ones it was first
installed with.
These dates seem consistent with my guess that this is probably GRUB 2
that was installed to your MBRs.
Post by Andy Smith
Thanks for the info!
You're welcome. Hope someone with more experience chimes in. Good luck
in any case. :)

Regards

[1] https://www.linuxquestions.org/questions/linux-general-1/help-understand-446-bytes-of-boot-code-in-mbr-4175500398/#post5146305
--
Florent
Andy Smith
2024-09-11 00:50:01 UTC
Hi,
Post by Florent Rougon
The partition table indeed starts at offset 446 (decimal), however I'd
still rather run grub-install or “dpkg-reconfigure grub-pc” than copy
the first 446 bytes from one drive to another drive. The reason is that,
AFAIUI, what GRUB writes in this area when installed is likely to
contain disc-specific info. More specifically, according to [1]:
There isn't room for much function in the 446 bytes available for
executable code in the boot sector. The sole function of this stage1
code is to load the much larger stage2 boot program. When stage1 is
installed in the MBR, it is configured with the BIOS drive number and
the absolute LBA of the first sector of the stage2 file in the boot
partition. It loads that one sector into a fixed location in memory
and transfers control to it. (...)
Since booting from sdb wasn't working in any case, I thought I'd
experiment a bit. I copied the first 446 bytes of sda to sdb. This
made matters worse! Instead of a "grub> " prompt, I just got a blank
screen.

I then rebooted from sda and did:

$ sudo dpkg-reconfigure grub-pc

selecting both sda and sdb.

I was then able to boot from sdb.

This does leave me wondering however, if the boot code in the MBR of
sdb is now set to believe that this is "the second drive", I suppose
(hd1) in grub terms? With the implication that should sda fail or be
removed, this machine may still not boot because its boot code looks
for something on a drive that no longer exists (sdb now being (hd0))?

The grub.cfg itself (and later, the fstab) finds its drives by UUID
so I'm not worried about that part.

I just have dim memories about having to do grub-install to sdb but
trick it somehow that this was (hd0)…

I do also wonder why my simple dd of the first 446 bytes did not
work, as the /boot partition is at the same position on both drives
and is an MDADM RAID1 so should have its stage2 at the same LBA.
After doing the "dpkg-reconfigure grub-pc" the first 446 bytes of
both sda and sdb are (still) identical so something else somewhere
else must have been changed.

$ sudo dd if=/dev/sda bs=446 count=1 2>/dev/null | sha256sum
b7ccacdeb89b1fd8c272549c69ff07570b033747c0a84a73febc7851c7cf4f2e -
$ sudo dd if=/dev/sdb bs=446 count=1 2>/dev/null | sha256sum
b7ccacdeb89b1fd8c272549c69ff07570b033747c0a84a73febc7851c7cf4f2e -

Not understanding quite what is going on is worrying to me, even if
things do now work. 🙁

Thanks,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting
Roy J. Tellason, Sr.
2024-09-11 23:30:01 UTC
Post by Andy Smith
This does leave me wondering however, if the boot code in the MBR of
sdb is now set to believe that this is "the second drive", I suppose
(hd1) in grub terms? With the implication that should sda fail or be
removed, this machine may still not boot because its boot code looks
for something on a drive that no longer exists (sdb now being (hd0))?
Simple enough to test by unplugging a cable...
--
Member of the toughest, meanest, deadliest, most unrelenting -- and
ablest -- form of life in this section of space,  a critter that can
be killed but can't be tamed.  --Robert A. Heinlein, "The Puppet Masters"
-
Information is more dangerous than cannon to a society ruled by lies. --James
M Dakin
Florent Rougon
2024-09-11 23:30:01 UTC
Hi,
Post by Andy Smith
Since booting from sdb wasn't working in any case, I thought I'd
experiment a bit. I copied the first 446 bytes of sda to sdb. This
made matters worse! Instead of a "grub> " prompt, I just got a blank
screen.
I believe “sda” and “sdb” are swapped with respect to your first
message. Of course, it's expected that these are not stable across
reboots, however it's a bit confusing for me here.

(...)
Post by Andy Smith
This does leave me wondering however, if the boot code in the MBR of
sdb is now set to believe that this is "the second drive", I suppose
(hd1) in grub terms? With the implication that should sda fail or be
removed, this machine may still not boot because its boot code looks
for something on a drive that no longer exists (sdb now being (hd0))?
I believe this is not necessarily the case. I've tried to read some of the
GRUB 2 stage 1 code from the grub2 2.12-5 package. I'm far from being
able to claim I understand everything, but... let's see.

My impression is that the “drive number” that is written to the MBR can
be of two kinds:

(a) an actual number, typically 0x80, 0x81, etc. for hard disks
(it is the BIOS drive number for INT 13h, cf. [1]);

(b) or the special value 0xFF (thus, the 128th hard disk is not
available for case (a)—too bad if you have that many disks!).

The special value 0xFF is the one you had on both of your drives and
means “use the boot drive” (the one the BIOS booted from, whose number
is in register DL when the BIOS transfers control to the MBR code loaded
at physical address 0x7C00):

(From grub2-2.12/grub-core/boot/i386/pc/boot.S)

    .org GRUB_BOOT_MACHINE_BOOT_DRIVE
    boot_drive:
        .byte 0xff  /* the disk to load kernel from */
                    /* 0xff means use the boot drive */

    (...)

    .org GRUB_BOOT_MACHINE_DRIVE_CHECK
    (...) ← fixup of DL in case it was incorrectly set by the BIOS

        /*
         * Check if we have a forced disk reference here
         */
        movb boot_drive, %al
        cmpb $0xff, %al
        je 1f
        movb %al, %dl
    1:
        /* save drive reference first thing! */
        pushw %dx

[ One may find it “interesting” that the “jmp 3f” from line 216 of
boot.S may be overwritten by internal “grub-setup” code (cf.
“grub-bios-setup” in grub-install.c) from grub2-2.12/util/setup.c:

    boot_drive_check = (grub_uint8_t *) (boot_img
                                         + GRUB_BOOT_MACHINE_DRIVE_CHECK);

    (...)

    /* If DEST_DRIVE is a hard disk, enable the workaround, which is
       for buggy BIOSes which don't pass boot drive correctly. Instead,
       they pass 0x00 or 0x01 even when booted from 0x80. */
    if (!allow_floppy && !grub_util_biosdisk_is_floppy (dest_dev->disk))
      {
        /* Replace the jmp (2 bytes) with double nop's. */
        boot_drive_check[0] = 0x90;
        boot_drive_check[1] = 0x90;
      }
]

In your case, I'd say your MBR-stored drive config (from your first
message for both drives) was “use the boot drive” for both sda and sdb,
because from grub2-2.12/include/grub/i386/pc/boot.h:

/* The offset of BOOT_DRIVE. */
#define GRUB_BOOT_MACHINE_BOOT_DRIVE 0x64

and both of your MBRs had 0xff at this offset:

00000060: 0000 0000 fffa 9090 f6c2 8074 05f6 c270 ...........t...p

You can also see right here the two NOPs (9090) at offset
GRUB_BOOT_MACHINE_DRIVE_CHECK (i.e. 0x66) which override the
aforementioned “jmp 3f” from boot.S line 216, because this stage1 code
was written to hard disks.

Conclusion: in your case, the option was “load the next stage from the
drive the BIOS booted from” for both MBRs. Therefore, AFAIUI, assuming
everything else was good (incl. the offset for finding the next stage),
it should still have been able to boot with only one of the drives
present in the machine.
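
The two fields discussed above can be read straight off a drive. A
minimal sketch, using only the offsets quoted in this message (0x64 for
boot_drive, 0x66 for the drive check); `grub_mbr_fields` is a made-up
helper name, and it assumes the sector really is a GRUB 2 boot.img:

```shell
# Report the boot_drive byte and the drive-check bytes of a GRUB 2 boot
# sector (file or block device; reading a device needs root).
grub_mbr_fields() {
    drive=$(od -An -tx1 -j $((0x64)) -N 1 "$1" | tr -d ' \n')
    check=$(od -An -tx1 -j $((0x66)) -N 2 "$1" | tr -d ' \n')
    if [ "$drive" = "ff" ]; then
        echo "boot_drive=0xff (use the drive the BIOS booted from)"
    else
        echo "boot_drive=0x$drive (forced BIOS drive number)"
    fi
    if [ "$check" = "9090" ]; then
        echo "drive_check=nop nop (hard-disk workaround enabled)"
    else
        echo "drive_check=$check (original jump left in place)"
    fi
}

# Hypothetical usage, run as root:
#   grub_mbr_fields /dev/sda
#   grub_mbr_fields /dev/sdb
```

On both dumps from the first message this would report 0xff and the two
NOPs.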
Post by Andy Smith
The grub.cfg itself (and later, the fstab) finds its drives by UUID
so I'm not worried about that part.
I just have dim memories about having to do grub-install to sdb but
trick it somehow that this was (hd0)…
Yep... AFAIUI, hd0 is for times when GRUB talks to the BIOS (at boot)
and corresponds to 0x80 (on x86 machines), but when running grub-install
or the internal grub-bios-setup, GRUB attempts to guess how the BIOS is
going to number the devices you gave it in Linux-speak (/dev/sda,
/dev/sdb, etc.), which may be unreliable. At least the GRUB 1.x
documentation clearly said so according to my recollection, and
therefore indicated (in the 2000s) as the bullet-proof recipe, to
perform GRUB installation to hard disk *from GRUB itself* using the
(hd0), (hd1), etc. notations, e.g. after booting from a GRUB floppy
disk.
Post by Andy Smith
I do also wonder why my simple dd of the first 446 bytes did not
work, as the /boot partition is at the same position on both drives
and is an MDADM RAID1 so should have its stage2 at the same LBA.
After doing the "dpkg-reconfigure grub-pc" the first 446 bytes of
both sda and sdb are (still) identical so something else somewhere
else must have been changed.
GRUB is a complex beast; available documentation may be a bit confusing
when it comes to stage 1.5 and stage 2, e.g.[3]:

Version 0 (GRUB Legacy)
~~~~~~~~~~~~~~~~~~~~~~~

Stage 1 can load stage 2 directly, but it is normally set up to load the
stage 1.5., located in the first 30 KiB of hard disk immediately
following the MBR and before the first partition. (...) The stage 1.5
image contains file system drivers, enabling it to directly load stage 2
from any known location in the filesystem, for example from /boot/grub.

Version 2 (GRUB 2)
~~~~~~~~~~~~~~~~~~

[Different description]

In any case, stage 1 can load some “stage 1.5” from “empty sectors (if
available) between the MBR and the first partition”. These sectors
wouldn't be synchronized by MD RAID, unless you're using it on the whole
drives—as opposed to partition by partition. I don't claim that “this is
it”, but this might explain some difference between your drives' booting
behavior, even with identical:
- stage1 code+data in the MBR;
- boot partitions' start offset and contents.
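
Whether those unsynchronized sectors actually differ can be checked
directly. A minimal sketch, assuming the first partition starts at
sector 2048 with 512-byte sectors as stated in the first message;
`gap_sha256` is a made-up helper name:

```shell
# Hash the sectors between the MBR and the first partition (sectors
# 1..2047 by default). MD RAID built on partitions does not mirror them.
# $1: file or block device; $2: first partition's start sector (optional)
gap_sha256() {
    first_part=${2:-2048}
    dd if="$1" bs=512 skip=1 count=$((first_part - 1)) 2>/dev/null \
        | sha256sum | cut -d' ' -f1
}

# Hypothetical usage, run as root:
#   [ "$(gap_sha256 /dev/sda)" = "$(gap_sha256 /dev/sdb)" ] \
#       && echo "gap identical" || echo "gap differs"
```

A mismatch only shows that the two areas are not byte-identical; it
does not say which drive holds the working copy.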
Post by Andy Smith
Not understanding quite what is going on is worrying to me, even if
things do now work. 🙁
I just hope I didn't confuse you more. :-)

Regards

[1] https://en.wikipedia.org/wiki/INT_13H#List_of_INT_13h_services
[2] https://wiki.osdev.org/MBR_(x86)#MBR_Bootstrap
[3] https://en.wikipedia.org/wiki/GNU_GRUB
--
Florent
Andy Smith
2024-09-11 23:50:01 UTC
Post by Florent Rougon
Hi,
Post by Andy Smith
Since booting from sdb wasn't working in any case, I thought I'd
experiment a bit. I copied the first 446 bytes of sda to sdb. This
made matters worse! Instead of a "grub> " prompt, I just got a blank
screen.
I believe “sda” and “sdb” are swapped with respect to your first
message. Of course, it's expected that these are not stable across
reboots, however it's a bit confusing for me here.
Yes, sorry. I actually have two machines like this I am looking at,
where one of them seems to have an older (non-working) grub on sda
and current grub on sdb, while the other has current grub on sda and
no grub at all on sdb. I tried to simplify by only talking about one
of these here, but ended up confusing myself several times over
which one I was getting info from.

Anyway. Copying the 446 bytes so as to make sda and sdb identical
did not work, as described. Then doing "dpkg-reconfigure grub-pc"
did result in working boot from either drive.
Post by Florent Rougon
The special value 0xFF is the one you had on both of your drives and
means “use the boot drive”
Okay, that is good to know, thanks!
Post by Florent Rougon
In any case, stage 1 can load some “stage 1.5” from “empty sectors (if
available) between the MBR and the first partition”. These sectors
wouldn't be synchronized by MD RAID, unless you're using it on the whole
drives—as opposed to partition by partition. I don't claim that “this is
it”, but this might explain some difference between your drives' booting
behavior, even with identical:
- stage1 code+data in the MBR;
- boot partitions' start offset and contents.
Sounds very plausible. The MD arrays are just made of partitions,
and the first partitions for /boot start at 2048 (512 byte) sectors
in.

So, there's more grub data at different places in that first 1MiB of
each boot disk. As some of it could be copied and some not, it
sounds like I should not try to fix this again with dd and instead
stick to reconfiguring grub-pc.
Post by Florent Rougon
Post by Andy Smith
Not understanding quite what is going on is worrying to me, even if
things do now work. 🙁
I just hope I didn't confuse you more. :-)
It was very helpful, thanks again!

Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting