
Debian Wheezy daily build from 2012-03-31, in VirtualBox 4.1.2, with six disk devices.

My steps to reproduce so far (a rough command sketch follows the list):

  1. On each disk, set up one partition spanning the entire disk as a physical volume for RAID
  2. Set up a single RAID6 mdraid array out of all of those partitions
  3. Use the resulting md0 as the only physical volume for the volume group
  4. Set up your logical volumes, filesystems and mount points as you wish
  5. Install your system
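
Roughly, in commands; the device and volume names here (sda..sdf, vg0, root) are just placeholders, not my actual names:

 # 1. One full-disk partition per disk, flagged for RAID use
 for d in a b c d e f; do
   parted -s /dev/sd$d mklabel msdos
   parted -s /dev/sd$d mkpart primary 1MiB 100%
   parted -s /dev/sd$d set 1 raid on
 done

 # 2. A single RAID6 mdraid array across all six partitions
 mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[a-f]1

 # 3. md0 as the only physical volume of the volume group
 pvcreate /dev/md0
 vgcreate vg0 /dev/md0

 # 4. Logical volumes and filesystems as desired (ext4 here)
 lvcreate -L 10G -n root vg0
 mkfs.ext4 /dev/vg0/root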

Both / and /boot will be in this stack. I've chosen ext4 as the filesystem for this setup.

I can get as far as the GRUB2 rescue console, which can see the mdraid array, the volume group and the LVM logical volumes on it (all named appropriately on every level), but I cannot ls the filesystem contents of any of them and I cannot boot from them.

As far as I can tell from the documentation, the version of GRUB2 shipped there should handle all of this gracefully.

http://packages.debian.org/wheezy/grub-pc (1.99-17 at the time of writing.)

According to the generated grub.cfg, it loads the ext2, raid, raid6rec, dosmbr (this one appears in the module list once per disk) and lvm modules. The generated grub.cfg also defines the list of modules to be loaded twice; from some quick searching, this seems to be the norm and fine for GRUB2.
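
For reference, the relevant preamble of a boot entry in a grub.cfg generated by that version looks roughly like this (a from-memory sketch, not a copy of my config; the UUID and volume names are placeholders):

 insmod raid
 insmod raid6rec
 insmod lvm
 insmod part_msdos
 insmod ext2
 set root='(vg0-root)'
 search --no-floppy --fs-uuid --set=root <filesystem-uuid>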

How do I get further, i.e. get GRUB2 to actually read the contents of the filesystems and boot the system?

Which of my assumptions about the functionality here are wrong?

EDIT (2012-04-01): My generated grub.cfg:

http://pastie.org/3708436

It seems to first make my /usr logical volume the root, and that might be the source of the failure? A grub-mkconfig bug? Or is it supposed to access things on /usr before / and /boot? /boot is on / for me; there is no separate boot logical volume.
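
For anyone comparing against their own setup, the order of root assignments in the generated file can be listed with a quick grep:

 grep -n "set root" /boot/grub/grub.cfg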


3 Answers


In the end, it was a GRUB2 bug/limitation with a degraded software RAID array.

GRUB2 1.9x has issues booting from a degraded array. Booting into the system in rescue mode and letting the RAID recover itself fixed the issue for the original setup in question.
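
To check whether the array is still degraded and to watch the recovery complete before rebooting, something along these lines works (md0 assumed as the array name):

 cat /proc/mdstat             # degraded members show as _ in the [UUUUU_] status
 mdadm --detail /dev/md0      # look at the State / Rebuild Status lines
 watch -n 5 cat /proc/mdstat  # wait here until the resync finishes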

Incidentally, the setup works (as of 2012-06-26) straight out of the box on Fedora 17, Arch (stable) and Gentoo (stable + the latest GRUB2 bzr via Portage): GRUB2 2.0+ has fixed the issue. With the Wheezy freeze coming soon, I'm very much hoping the issue will be resolved, either by jumping to 2.0 or by backporting the fix.

For me this still affects Debian 6 and 7, and Ubuntu 8.04, 10.04 and 12.04.

Letting the RAID sync in a single-user recovery setup is an acceptable workaround for a home system, but a potential extra hitch when rebooting a production server (even a small office file server) makes one think twice.


Very good post, thanks a lot; this helped me out quite a bit when installing LVM-over-RAID on Debian Wheezy. Here are the steps I took to overcome the problem.

Update GRUB2 to version 2+. Add these lines to /etc/apt/sources.list:

deb http://http.debian.net/debian unstable main
deb-src http://http.debian.net/debian unstable main

Then update and install:

apt-get update
apt-get install grub2


Perhaps you made the single partition too large and did not leave enough space for the GRUB2 installation, so it overwrote part of the LVM space. Something of a long shot. Try your steps to recreate the problem, except this time use a single disk (skip the RAID), create the single partition exactly as you did before, and then the rest of it. If I am right, you should see the same behavior.

UPDATE: So, this answer is wrong. I was looking through the GRUB2 manual and found this section, which states:

If, instead, you only get a rescue shell, this usually means that GRUB failed to load the ‘normal’ module for some reason. It may be possible to work around this temporarily: for instance, if the reason for the failure is that ‘prefix’ is wrong (perhaps it refers to the wrong device, or perhaps the path to /boot/grub was not correctly made relative to the device), then you can correct this and enter normal mode manually:

 # Inspect the current prefix (and other preset variables):
 set
 # Find out which devices are available:
 ls
 # Set to the correct value, which might be something like this:
 set prefix=(hd0,1)/grub
 set root=(hd0,1)
 insmod normal
 normal

However, any problem that leaves you in the rescue shell probably means that GRUB was not correctly installed. It may be more useful to try to reinstall it properly using grub-install device (see Invoking grub-install). When doing this, there are a few things to remember:

  1. Drive ordering in your operating system may not be the same as the boot drive ordering used by your firmware. Do not assume that your first hard drive (e.g. ‘/dev/sda’) is the one that your firmware will boot from. device.map (see Device map) can be used to override this, but it is usually better to use UUIDs or file system labels and avoid depending on drive ordering entirely.
  2. At least on BIOS systems, if you tell grub-install to install GRUB to a partition but GRUB has already been installed in the master boot record, then the GRUB installation in the partition will be ignored.
  3. If possible, it is generally best to avoid installing GRUB to a partition (unless it is a special partition for the use of GRUB alone, such as the BIOS Boot Partition used on GPT). Doing this means that GRUB may stop being able to read its core image due to a file system moving blocks around, such as while defragmenting, running checks, or even during normal operation. Installing to the whole disk device is normally more robust.
  4. Check that GRUB actually knows how to read from the device and file system containing /boot/grub. It will not be able to read from encrypted devices, nor from file systems for which support has not yet been added to GRUB.
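
Following that advice for this kind of all-disks-in-one-array setup, reinstalling GRUB to the MBR of every member disk (from the installed system or a chroot) would look something like this; the device names are illustrative:

 # Put GRUB on each disk's MBR so the machine can boot from any of them:
 for d in /dev/sd[a-f]; do
   grub-install "$d"
 done
 update-grub  # regenerate /boot/grub/grub.cfg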