The mdadm backup file: when it's needed, and when it isn't (2026)
Chunks and stripes explained
Reshaping: under the hood
sync_speed_max
Reshaping from RAID5 to RAID6
Things I'd do differently in production
What did we learn?

I'm planning a home storage build, and one thing I want to understand before I commit is how to grow a RAID5 array safely. Here's what I've learned. For demonstration purposes I'm going to do it in VirtualBox. You can say the setup is intentionally bottlenecked for demonstration purposes, and that's at least partially true.

This is how I'm going to reach my starting point from zero. All the affected virtual hard drives are the same capacity, and since 1MiB partition alignment matters for RAID performance, make sure your partitions align to a mebibyte boundary.

All right. Now that I have my RAID array and my precious data, and I need more space for more data, I'm going to expand the array. First the new hdd is added, at which point it shows up as a spare drive and the current state can be checked; then I issue the actual grow command and check its progress. The fs was mounted and usable all along, but it didn't get resized automatically: it still reports the old capacity (256MiB in my case). Resizing works while the fs is still mounted, and afterwards df /mnt reports the increased capacity.

Chunks and stripes explained

A chunk is the smallest amount of data RAID operates on. Chunks on the disks are aligned at multiples of the chunk size. A stripe is all the data represented by the chunks at the same offset on all the participating drives; with a 512KiB chunk and three data-bearing drives, for example, one stripe carries 1.5MiB of data.

Reshaping: under the hood

First of all, let's remember: we are reshaping a RAID array while the underlying fs is still mounted. Applications can still read and write, and the operation happening below should be invisible (apart from latency). Now, when we're in the middle of the reshaping, there is a portion of the array where the reshaping has already happened, and which can therefore be read and written with the new geometry in mind. This area is before suspend_lo, as per the official documentation.
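Before digging further into the internals, here's a minimal sketch of the grow workflow from the beginning of this section. The original command listings are gone, so the device names (/dev/md0, /dev/sdd1), the device count, and the mount point are my assumptions:

```shell
# Add the new disk; it joins the array as a spare at first
mdadm --add /dev/md0 /dev/sdd1

# Inspect the current state: the new member shows up as a spare
mdadm --detail /dev/md0

# Actually grow the array onto the new member (3 -> 4 devices assumed)
mdadm --grow /dev/md0 --raid-devices=4

# Watch the reshape progress
cat /proc/mdstat

# The mounted fs still reports the old capacity; grow it online
resize2fs /dev/md0
df /mnt
```

These commands need root and a real (or virtual) md array, so treat them as a template rather than something to paste blindly.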
Also, there's a portion which hasn't been reshaped yet, and this portion of the disk (above suspend_hi) is safe to read and write as per the old geometry. If the array is accessed in the in-between range, the operation blocks until the reshaping is done for the relevant portion of the array; all that costs is some IO latency. So overall, you're free to use your array during the reshaping.

Technically there's a third value at play here as well: the reshape position, telling exactly up to which point in the array the new geometry should be considered. This value obviously cannot lie outside the range discussed above. (RAID0 uses slightly different mechanics, and the suspend_ values are omitted completely.) The aforementioned three values can be observed by reading the appropriate sysfs file, e.g. /sys/block/md0/md/suspend_lo.

So we're happy users of a RAID array currently being reshaped, and confidently so. As the reshape moves forward in time, the gap on the old disks between the ready-to-use new region and the yet-to-be-reshaped region widens, and the space in between is fair game for anything that needs to be stored temporarily. Early in the reshaping process, however, the chunks to be written and the chunks to be read overlap. In this critical section it's essential to have some space for backup data; it goes either into a separate file or onto the newly added disk. That's why backup files were invented (they are not always files). We want to store this backup data because of the possibility of a system crash, and because we don't want to burn a considerable portion of the IO bandwidth on updating metadata. In essence: early in the reshape, after a system crash there would be uncertainty over whether the old data or the new data is in place. The backup file clears up this uncertainty: after a crash we can restore to a known state and resume reshaping from there.
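The three values mentioned above can be watched straight from sysfs while the reshape runs; a minimal sketch, assuming the array is md0:

```shell
# Below suspend_lo the new geometry already applies
cat /sys/block/md0/md/suspend_lo
# Above suspend_hi the old geometry still applies
cat /sys/block/md0/md/suspend_hi
# The exact point the reshape has reached
cat /sys/block/md0/md/reshape_position
```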
That's what the backup data is for. But in the scenario I'm in, something even more clever happened. Mid-reshape, all the disks reported the same data offset, and it had moved: the actual data got shifted by 1.5MiB on every disk. So even early in the reshaping there was no critical section, in the sense that no portion of the RAID array was claimed by both the old and the new geometry at the same time.

After this first grow, the data offset had shifted from 4096 to 1024 sectors, and there wasn't enough remaining headroom at the start of the disks to pull the same trick again. So when I added another drive to see what would happen, mdadm said "Need to backup 6144K of critical section". Still, no backup file was necessary; after all, we have a completely empty drive at hand, so no surprises here. It's worth noting that adding a new drive like this increases the probability of a drive failure, simply because there are more disks.

sync_speed_max

One small thing I stumbled upon while writing this article: echoing 10 into /sys/block/md0/md/sync_speed_max limits the syncing to 10 kibibytes per second. Choosing such a small number only makes sense if you want to have enough time to inspect a small array during reshaping.

Reshaping from RAID5 to RAID6

Losing two disks is less probable than losing one, and RAID6 protects against two disk failures, so now we're reshaping from RAID5 to RAID6 by adding a new drive. We're going from RAID5 with four drives to RAID6 with five drives: more redundancy, while the usable capacity won't change. I was surprised that this works without requiring a separate backup file, but it makes sense: I'm adding a new, currently empty drive, and, as we've seen, the data offset can give us some free space here and there as well. Mid-reshape I saw the same thing again: the unused space before (or after) the real data can be used for the reshaping too.
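A sketch of the level change described above, assuming the new disk's partition is /dev/sde1:

```shell
# Add the fifth drive, then convert RAID5 -> RAID6 in a single reshape
mdadm --add /dev/md0 /dev/sde1
mdadm --grow /dev/md0 --level=6 --raid-devices=5
```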
But I'm not satisfied: I want to see the backup file. Hopefully changing the chunk size will require me to use one, so let's try it. The backup file contains a safe backup/resume point from which the system is able to continue reshaping. That means the file should be available after a reboot without md0 being online. My /root directory here survives reboots and is outside md0, so it's a safe place for the backup.
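A sketch of the chunk-size change with the backup file kept outside md0; the new chunk size and the backup path are my choices:

```shell
# Reshape to a new chunk size; the critical-section data goes into the backup
# file, which lives on the root fs and so survives a reboot with md0 offline
mdadm --grow /dev/md0 --chunk=128 --backup-file=/root/md0-grow.backup
```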
Examining the drives mid-reshape shows the reshape state recorded in the metadata; once the reshape is done, the output is back to normal. I wanted to see an error message saying the backup file is required, so I issued the chunk-size change without one. Surprisingly it was accepted, but it was very slow compared to the previous run, because of the lack of the backup file. I redid the whole experiment with and without a backup file. Without one, the reshape stalled while the array was online, but completed offline: mdadm --assemble --scan took ages, and the array was unavailable during that period, but mdstat reported progress. Even so, the no-backup-file path was significantly slower than the with-backup-file path.

But I still want to push a bit further: maybe removing a device will make it require a backup file. (In theory, if you remove a drive, the corresponding space at the end of md0 becomes empty, meaning the space at the end of the removed drive corresponding to that empty portion of md0 would be free to use. I'm going to figure out what happens as I write this.) At this step the ext4 fs was larger than the theoretical capacity of the reduced array, which I don't understand, given we started with 2+1 drives and we're reducing to 2+2 drives; maybe resize2fs grew the metadata and now doesn't want to shrink it back. Anyhow, I just recreated the test data with a smaller size. The error message is not particularly user friendly, but it reflects the internal workings: mdadm tries to make room by applying an offset of 4608, which is then deemed invalid. The issue can be fixed by providing a backup file, so in this case we know for sure the backup file is used, and we couldn't have gotten away without it. The backup file is safe to delete once the reshaping is done (as reported by cat /proc/mdstat).

Things I'd do differently in production

Given that I worked with virtual HDDs there was no point in doing SMART tests, but with real physical hardware it would be wise to run a full SMART test on each drive before doing anything.
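Recapping the shrink experiment above, end to end; the sizes and device counts are my assumptions, and this is the one case where the backup file turned out to be strictly necessary:

```shell
# resize2fs cannot shrink a mounted ext4, so take the fs offline first
umount /mnt
e2fsck -f /dev/md0
resize2fs /dev/md0 200M          # shrink the fs below the target array size

# Clamp the array size, then drop the device count (one fewer member assumed);
# without the backup file this reshape is refused
mdadm --grow /dev/md0 --array-size=256M
mdadm --grow /dev/md0 --raid-devices=4 --backup-file=/root/md0-shrink.backup
```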
Also, given that the reshaping took place minutes after the array's creation, I was confident the array had no latent issues yet; in production, you'd rarely reshape an array minutes after its creation. So I'd also want to scrub after the SMART tests are done. If anything surfaces during the SMART tests or the scrub, the array is still in a known good state, and it's much easier to deal with it now than during (or after) reshaping.

What did we learn?

We learned that in modern Linux the backup file is superfluous in most cases. We've also seen a case where the lack of a backup file was accepted, but reshaping stalled while the array was online; so even if it might be unnecessary, having the file won't hurt. We learned that mdadm in some cases can pull off clever tricks to avoid the need for backing up at all, and in other cases hides the backup data away in unused portions of the underlying partitions. And we've seen that during shrinking the presence of a backup file is strictly necessary. Overall, mdadm makes reshaping crash-safe across a variety of cases: sometimes via the backup file, sometimes by cleverer mechanisms that avoid needing one.