59

I want to interrupt a running resync operation on a Debian Squeeze software RAID. (This is the regularly scheduled compare resync; the RAID array is still clean in such a case. Do not confuse this with a rebuild after a disk failed and was replaced.)

How can I stop this scheduled resync operation while it is running? Another RAID array is "resync pending", because they all get checked on the same day (Sunday night), one after another. I want to stop this Sunday-night resyncing completely.

[Edit: sudo kill -9 1010 doesn't stop it; 1010 is the PID of the md2_resync process.]

I would also like to know how I can control the interval between resyncs and see the remaining time until the next one.

[Edit2: What I did for now was to make the resync run very slowly, so that it no longer disturbs anything:

sudo sysctl -w dev.raid.speed_limit_max=1000

taken from http://www.cyberciti.biz/tips/linux-raid-increase-resync-rebuild-speed.html

During the night I will set it back to a high value, so the resync can terminate.

This workaround is fine for most situations; nonetheless, it would be interesting to know whether what I asked is possible. For example, it does not seem to be possible to grow an array while it is resyncing or while a resync is "pending".]
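The throttle-now, restore-tonight workaround described above could be wrapped up roughly as follows. This is only a sketch: the function names and the save-file location are made up for illustration, and it assumes the limit is read and written through /proc/sys/dev/raid/speed_limit_max (the same tunable that sysctl dev.raid.speed_limit_max sets).

```shell
#!/bin/sh
# Sketch only: remember the current resync speed limit, lower it, restore it later.
# LIMIT_FILE is the kernel tunable; SAVE_FILE is a hypothetical scratch location.
LIMIT_FILE="${LIMIT_FILE:-/proc/sys/dev/raid/speed_limit_max}"
SAVE_FILE="${SAVE_FILE:-/tmp/md_speed_limit_max.saved}"

throttle_resync() {
    # $1 = temporary limit (KB/s), e.g. 1000 as in the question
    cat "$LIMIT_FILE" > "$SAVE_FILE" || return 1
    echo "$1" > "$LIMIT_FILE"
}

restore_resync() {
    # Write back the value recorded by throttle_resync, then drop the record.
    [ -r "$SAVE_FILE" ] || return 1
    cat "$SAVE_FILE" > "$LIMIT_FILE" && rm -f "$SAVE_FILE"
}
```

Run as root, throttle_resync 1000 would slow the resync down during the day and restore_resync would put the earlier value back at night, instead of having to remember the old number.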

Adam5
  • 591

10 Answers

55

If your array is md0 then echo "idle" > /sys/block/md0/md/sync_action

'idle' will stop an active resync/recovery etc. There is no guarantee that another resync/recovery may not be automatically started again, though some event will be needed to trigger this.

http://www.mjmwired.net/kernel/Documentation/md.txt#477
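A slightly more defensive variant of the one-liner above might first look at what the array is currently doing. This is a sketch under assumptions: stop_sync and SYSFS_ROOT are names invented here, and only the sync_action file mentioned in the answer is used.

```shell
#!/bin/sh
# Sketch only: write "idle" to sync_action, but report what was running first.
# SYSFS_ROOT is overridable so the function can be tried against a scratch directory.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/block}"

stop_sync() {
    # $1 = array name, e.g. md0
    action_file="$SYSFS_ROOT/$1/md/sync_action"
    [ -w "$action_file" ] || { echo "$1: no writable sync_action" >&2; return 1; }
    current=$(cat "$action_file")
    if [ "$current" = "idle" ]; then
        echo "$1: already idle"
    else
        echo "$1: stopping $current"
        echo idle > "$action_file"
    fi
}
```

Called as stop_sync md0 (as root), it prints the action it interrupted, which helps to tell a scheduled check apart from a resync or recovery before stopping it.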

Mark Wagner
  • 18,428
44

I wanted to slow down or pause the resync process to free up some I/O for backing up some data to another computer. This thread helped me, but I found another solution.

On my Debian Lenny system:

  • echo "idle" > /sys/block/md0/md/sync_action works but the resync process is immediately restarted.

  • checkarray -x --all : works, but same result: the resync process is immediately restarted.

So I use this method: echo 0 > /proc/sys/dev/raid/speed_limit_max

small
  • 541
27

You can cancel an array resync in progress using the following sequence of commands (as root):

echo frozen > /sys/block/md0/md/sync_action
echo none > /sys/block/md0/md/resync_start
echo idle > /sys/block/md0/md/sync_action

Note that this may leave your array in an inconsistent state. Don't do this unless you're sure the array is in good shape, and rerun the sync later.

(Credit where credit's due: found this incantation in this thread.)

11

As mentioned above, on Debian/Ubuntu systems the /etc/cron.d/mdadm script invokes the /usr/share/mdadm/checkarray script to initiate re-sync checks.

This script has an option for cancelling all running sync checks:

/usr/share/mdadm/checkarray -x --all
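
For systems without the checkarray script, a rough hand-rolled equivalent is sketched below. It is not checkarray's actual implementation; cancel_all_checks and SYSFS_ROOT are hypothetical names, and the loop deliberately touches only arrays whose sync_action reads "check", so a rebuild is left alone.

```shell
#!/bin/sh
# Sketch only: cancel running checks on every md array found in sysfs.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/block}"

cancel_all_checks() {
    for f in "$SYSFS_ROOT"/md*/md/sync_action; do
        [ -e "$f" ] || continue          # glob matched nothing
        array=${f#"$SYSFS_ROOT"/}        # md0/md/sync_action
        array=${array%%/*}               # md0
        if [ "$(cat "$f")" = "check" ]; then
            echo idle > "$f"
            echo "$array: check cancelled"
        fi
    done
}
```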
sanmai
  • 561
9

Possible solution for this, took a bit to get into the details.

My system: CentOS 6.5 mdadm v3.3.2

The array is checked every week and I wanted to pause one of those checks. The RAID is clean; the check is started by the /etc/cron.d/raid-check script, which runs weekly.

To cancel the check, you use the --misc --action function. Assuming the RAID device is /dev/md0 and this is just the weekly consistency check and not a device failure, you would, as root:

mdadm --misc --action=idle /dev/md0

Likewise, to start the consistency check:

mdadm --misc --action=check /dev/md0

3

Not sure about how to cancel a re-sync, but the schedule is controlled by /etc/cron.d/mdadm on Debian/Ubuntu systems.

The script /usr/share/mdadm/checkarray may shed some light on the other part of your question, since that is what is being called by cron.
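
To see when the check is scheduled, the active entry in that cron file can be listed. A minimal sketch, assuming the file is at /etc/cron.d/mdadm as stated above; show_check_schedule is a hypothetical helper name. The first five fields of the printed line are the usual cron time fields (minute, hour, day of month, month, day of week), so editing them changes the interval. On Debian, the AUTOCHECK setting in /etc/default/mdadm is, as far as I know, the switch for turning the scheduled check off entirely, but verify that against your installed checkarray script.

```shell
#!/bin/sh
# Sketch only: print the non-comment cron entries that invoke checkarray.
CRON_FILE="${CRON_FILE:-/etc/cron.d/mdadm}"

show_check_schedule() {
    grep -v '^[[:space:]]*#' "$CRON_FILE" | grep checkarray
}
```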

Zoredache
  • 133,737
3
echo "idle" > /sys/block/md0/md/sync_action

Does not work when /sys/block/md*/md/sync_action is "resync" (unlike when its state is "check" or "repair"). You can echo "idle" into the sync_action file, but it does not affect the progress. The kernel documentation file incorrectly states that it will work, but it has never worked for me:

'idle' will stop an active resync/recovery etc. There is no guarantee that another resync/recovery may not be automatically started again, though some event will be needed to trigger this.

Sven
  • 100,763
brian
  • 31
2

If your md device is md0 and you want to stop the resync, write:

echo "idle" > /sys/block/md0/md/sync_action
mgorven
  • 31,399
Victor
  • 29
1

I tried the answer from "@bill.rookard", "mdadm --misc --action=idle /dev/md0", to stop the running recovery process, but the recovery process did not stop (or maybe it stopped and immediately restarted).

Then I checked the "mdadm" manpage:

--action=... : Set the "sync_action" for all md devices given to one of idle, frozen, check, repair. Setting to idle will abort any currently running action though some actions will automatically restart. Setting to frozen will abort any current action and ensure no other action starts automatically.

Finally, running "mdadm --misc --action=frozen /dev/md0" stopped the recovery. I could then reboot and perform server maintenance. Once back online, running "mdadm --misc --action=check /dev/md0" continued the recovery process where it had left off. All fine.
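
To tell "did not stop" apart from "stopped and immediately restarted", the progress line in /proc/mdstat can be inspected before and after the command. The sketch below is an assumption-laden parser (sync_progress and MDSTAT are names invented here) that relies on the usual progress line layout, for example "check =  1.3% (...) finish=123.4min speed=...".

```shell
#!/bin/sh
# Sketch only: print "<array> <action> <percent> <estimated finish>" for each
# array that currently shows a progress line in /proc/mdstat.
MDSTAT="${MDSTAT:-/proc/mdstat}"

sync_progress() {
    awk '
        /^md[0-9]+ :/ { array = $1 }
        / (resync|recovery|check|repair|reshape) =/ {
            action = ""; pct = ""; fin = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^(resync|recovery|check|repair|reshape)$/) action = $i
                if ($i ~ /%$/) { sub(/^=/, "", $i); pct = $i }
                if ($i ~ /^finish=/) { sub(/^finish=/, "", $i); fin = $i }
            }
            print array, action, pct, fin
        }
    ' "$MDSTAT"
}
```

If the percentage keeps rising after --action=idle, the action was not interrupted; if it drops back to a small value, it was restarted.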

becke-ch
  • 111
0

I know this is a 4-year-old post, but you can also do this (assuming md0 is the array and sdb4 is the resyncing "disk"):

    mdadm /dev/md0 --fail /dev/sdb4 && mdadm /dev/md0 --remove /dev/sdb4

This command marks sdb4 as a failed disk and therefore kicks it out of the array, stopping the resync. If there was no error during the resync-stop action, the command will also remove sdb4 from the md0 array. If there was an error, the disk stays in the failed state but remains in the array.

If you fail a disk anywhere in mdadm, you mark it as logically failed. If the array was clean (not degraded), the disk stays consistent and can be re-added with the --add << disk >> --assume-clean option without any fear. If there was any activity after it was detached (e.g. a resync, a rebuild, or even a write), --assume-clean will probably fail and a resync will start immediately.

Changing raid.speed_limit_min and raid.speed_limit_max is a somewhat bad idea, because it affects not only resync/rebuild speeds but also normal operation speeds, and you will probably lose much of the performance gained by using RAID arrays.

eth
  • 25
  • 1