
What is BCP for making the EFI System partition redundant without using hardware RAID?

If I create 3x EFI System partitions on different devices and then back up any changes made to the primary (mounted at /boot/efi) to the backup devices (mounted at /boot/efi-[bc]):

  • Will the system still boot if the primary device fails, i.e. will it select one of the backup EFI system partitions?
  • Will the system select an EFI System partition deterministically when it boots, i.e. must changes to the primary be replicated on the backups before the next reboot?

Is there a better approach such that the system will still boot if the primary device fails?

3 Answers

  1. The UEFI specification has no notion of software RAID. This is a known deficiency.

I'd speculate this is because it was largely influenced by Microsoft, whose engineers never managed to create a reliable software RAID in Windows and don't realize that an array can be built out of partitions with a simple superblock and no special internal structure (Windows can only build arrays out of whole disks converted to the "dynamic" Logical Disk Manager or Storage Spaces formats).

  2. You can make several ESPs on different devices and sync them manually.

For example, if you install Proxmox VE on ZFS "software RAID", it will create several ESPs and install a special hook which runs after kernel, bootloader and other boot-related updates and makes sure all ESPs are kept in sync.

You can also add a GRUB hook to install it on both ESPs, as outlined in this answer:

Create the file /etc/grub.d/90_copy_to_boot_efi2 with the following contents and make it executable:

#!/bin/sh
set -e

if mountpoint -q /boot/efi && mountpoint -q /boot/efi2 ; then
    rsync -t --recursive --delete /boot/efi/ /boot/efi2/
fi

exit 0

Then, whenever you or the system runs update-grub, this script will run as well. Note that the script is somewhat dangerous when disaster strikes: if the primary device containing /boot/efi has failed and is being replaced, the script will erase the valid contents of /boot/efi2. In that case you need to disable it (remove the executable bit), or have the hook check the state of the source before running rsync.
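One way to make the hook safer is sketched below: before running rsync, check that the source still looks like a populated ESP. This is only an illustration, assuming your ESP normally contains an EFI/ directory (it usually does):

#!/bin/sh
set -e

if mountpoint -q /boot/efi && mountpoint -q /boot/efi2 ; then
    # Refuse to sync from an ESP that no longer has the usual EFI/ tree,
    # e.g. a freshly replaced, still-empty disk.
    if [ -d /boot/efi/EFI ]; then
        rsync -t --recursive --delete /boot/efi/ /boot/efi2/
    else
        echo "skipping ESP sync: /boot/efi has no EFI/ directory" >&2
    fi
fi

exit 0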

  3. For a backup ESP to take over if the primary device fails, you should set up UEFI boot entries for all of your ESPs. On Linux it's done like this:
efibootmgr -c -d /dev/sdb -l \\EFI\\DEBIAN\\GRUBX64.EFI -L debian-sdb
efibootmgr -c -d /dev/sdc -l \\EFI\\DEBIAN\\GRUBX64.EFI -L debian-sdc
efibootmgr -c -d /dev/sdd -l \\EFI\\DEBIAN\\GRUBX64.EFI -L debian-sdd
efibootmgr -c -d /dev/sda -l \\EFI\\DEBIAN\\GRUBX64.EFI -L debian-sda

This is a real example from one of the systems I manage. It assumes the ESP is the first partition of each disk. This should be done after you have synced the contents of your ESPs. efibootmgr -v will confirm that all the boot entries created this way point to different devices.
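If you want the fallback order to be explicit, efibootmgr can also rewrite BootOrder. A small sketch, where 0001–0004 are placeholders for whatever entry numbers efibootmgr actually reports on your system:

# Show every entry with its device path, plus the current BootOrder.
efibootmgr -v

# Put the preferred entry first and the backups behind it
# (substitute the real entry numbers from efibootmgr -v).
efibootmgr -o 0001,0002,0003,0004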

See also: https://askubuntu.com/questions/66637/can-the-efi-system-partition-be-raided


The contents of the EFI System Partition should be relatively stable, so manually cloning changes to the copies on the other disks after updates should be fine. Even if changes are not cloned, an old copy may still work as long as it is not too many revisions behind.
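With the mount points from the question, the manual clone after an update could be as simple as this sketch (rsync -t --recursive avoids the ownership and permission operations that FAT32 does not support):

# Mirror the primary ESP onto both backups after a bootloader or kernel update.
rsync -t --recursive --delete /boot/efi/ /boot/efi-b/
rsync -t --recursive --delete /boot/efi/ /boot/efi-c/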

Will the system boot off an alternate ESP? That's a harder question. Most modern UEFI firmware does support multiple boot entries and may try them all in sequence until one works, so you just have to make sure the entries are all present and in the correct order. You may need to run efibootmgr manually to update the boot entry list and order.

However, it might be better not to have it boot automatically on failure. If the primary EFI disk fails, you may want to boot manually and attempt repairs anyway. But having the backup ESP available, even if it isn't in the boot order, should make recovery a lot easier.

An alternate viewpoint: if a disk in a RAID system is going to fail, it is likely to fail while the system is up. If you detect this condition before the next boot, you can easily activate one of your backup EFI copies (and maybe even make it the primary) until the failed disk is replaced.
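Assuming UEFI boot entries already exist for the backup ESPs, "activating" a backup before the next reboot is a one-liner; a sketch, where 0002 stands in for whatever entry number your backup actually has:

# Boot the backup entry on the next reboot only (BootNext) ...
efibootmgr -n 0002

# ... or move it to the front of BootOrder to make it the new primary.
efibootmgr -o 0002,0001,0003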

user10489
  • 669

The firmware searches for an EFI System Partition on the specified boot device. As long as these partitions are updated correctly, the system should still be able to boot. Apart from that, a distributed boot manager is of course a different story.

I have created a gist showing a setup for a systemd environment which syncs all these partitions at system shutdown:

https://gist.github.com/thhart/35f6e4e715c70c2cbe7c5846311d1f9f
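A minimal sketch of that idea (unit name and paths are illustrative; the gist contains the complete setup): a oneshot service with no ExecStart whose ExecStop runs during shutdown, while RequiresMountsFor keeps the ESPs mounted until it has finished.

# /etc/systemd/system/sync-esp.service  (name is illustrative)
[Unit]
Description=Sync backup EFI System Partitions at shutdown
RequiresMountsFor=/boot/efi /boot/efi-b /boot/efi-c

[Service]
Type=oneshot
# With no ExecStart and RemainAfterExit=yes, the unit stays "active" after boot
# and systemd runs ExecStop when it stops the unit during shutdown.
RemainAfterExit=yes
ExecStop=/bin/sh -c 'rsync -t --recursive --delete /boot/efi/ /boot/efi-b/ && rsync -t --recursive --delete /boot/efi/ /boot/efi-c/'

[Install]
WantedBy=multi-user.target

Enable it with systemctl enable --now sync-esp.service; the rsync commands then run each time the system shuts down.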

Thomas
  • 189