Hi all.
Today I noticed various files I'd saved in the last four or five days have vanished from my RAID1 /home.
Searching /var/log/messages for the device names reveals:
Code:
Jun 10 08:26:55 gareth-desktop kernel: [ 21.679114] md: bind<sda6>
Jun 10 08:26:55 gareth-desktop kernel: [ 21.679768] md: kicking non-fresh sdb4 from array!
This has happened on several previous boots, so new files saved to sda6 weren't synced to sdb4. Nice of Fedora to bother telling me that... However, /proc/mdstat and /var/log/messages indicate that since this afternoon only sdb4 is in use, so everything saved in the last few days is missing. I could re-add sda6 and let the array rebuild, but I want to make sure sda6 is used as the source rather than sdb4, which is currently live, or else mount sda6 separately and copy the newer files off it first.
Anyone have any ideas? I'm not used to using mdadm directly, so I want to tread carefully. I had already removed sda6 from the array, in preparation for re-adding it, before I realized what re-adding it would do.
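Before re-adding anything, it may help to see what each member's md superblock says. `mdadm --examine` prints an `Events` counter and an `Update Time` for a member device, and the kernel's "non-fresh" message means that member's event count lagged behind the rest of the array when it was assembled. The helpers below are a minimal sketch (the names `md_events` and `pick_fresher` are hypothetical, not part of mdadm) that pull out the counter and report which member was updated most recently. One caveat: once both halves have been used on their own, each can hold files the other lacks, so a higher count only tells you which superblock was written last, not which copy has everything you want.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. They only parse text; nothing here writes to a disk.
# Assumes 1.x metadata, where "Events" is a plain integer.

# Print the "Events" counter from `mdadm --examine` output read on stdin.
md_events() {
    awk -F' : ' '/^ *Events :/ { gsub(/ /, "", $2); print $2; exit }'
}

# pick_fresher DEV_A EVENTS_A DEV_B EVENTS_B
# Print the device whose superblock has the higher event count,
# or "equal" when the counts match.
pick_fresher() {
    if (( $2 > $4 )); then
        printf '%s\n' "$1"
    elif (( $2 < $4 )); then
        printf '%s\n' "$3"
    else
        printf 'equal\n'
    fi
}

# Usage, as root (mdadm --examine only reads the superblock):
#   ea=$(mdadm --examine /dev/sda6 | md_events)
#   eb=$(mdadm --examine /dev/sdb4 | md_events)
#   pick_fresher /dev/sda6 "$ea" /dev/sdb4 "$eb"
```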
---------- Post added at 10:04 PM ---------- Previous post was at 10:03 PM ----------
Just to add, the disks are both in good condition according to SMART, and my RAID0 / is working just fine. This is the kind of thing I wanted to avoid by using btrfs-RAID instead, but Anaconda let me down badly...
---------- Post added at 10:56 PM ---------- Previous post was at 10:04 PM ----------
Also, the array is encrypted, with the encryption layered on top of the RAID. I've tried cryptsetup luksOpen on sda6, but it says it's not a valid LUKS device (it gives the same error for sdb4, though, so I assume it's an md-raid header block that's confusing it).
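That guess is close. With md's 1.2 metadata the superblock sits 4 KiB from the start of each member, and the array's contents (here, the LUKS header) begin further in, at the "Data Offset" that `mdadm --examine` reports in 512-byte sectors. Sector 0 of sda6 is therefore not where the LUKS header lives. One read-only way to reach a lone member, sketched below on the assumption that this is a 1.x-metadata RAID1 (so each member holds a full copy of the data starting at that offset), is to map a loop device at the data offset and open that instead. The function name `md_data_offset_bytes` is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert the "Data Offset : N sectors" line of
# `mdadm --examine` output (read on stdin) into a byte offset.
# md reports this offset in 512-byte sectors.
md_data_offset_bytes() {
    awk -F' : ' '/^ *Data Offset :/ { split($2, f, " "); print f[1] * 512; exit }'
}

# Usage sketch, as root. Each step is read-only and the md superblock
# on /dev/sda6 is left untouched:
#   off=$(mdadm --examine /dev/sda6 | md_data_offset_bytes)
#   loop=$(losetup --find --show --read-only --offset "$off" /dev/sda6)
#   cryptsetup luksOpen --readonly "$loop" oldhome
#   mount -o ro /dev/mapper/oldhome /mnt
# If an ext3/ext4 journal needs replaying, a read-only device may refuse
# the mount; "-o ro,noload" skips journal replay.
# Afterwards: umount /mnt; cryptsetup luksClose oldhome; losetup -d "$loop"
```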
---------- Post added at 11:48 PM ---------- Previous post was at 10:56 PM ----------
For anyone else who finds their RAID1 array silently split into two competing copies of /home, here's how to merge them before rebuilding a single /home array:
Code:
# Create a new array with the currently unused device.
[root@gareth-desktop tmp]# mdadm --create /dev/md2 --level=1 --raid-devices=2 --assume-clean /dev/sda6 missing
mdadm: /dev/sda6 appears to be part of a raid array:
level=raid1 devices=2 ctime=Mon Jun 4 21:51:32 2012
mdadm: Note: this array has metadata at the start and
may not be suitable as a boot device. If you plan to
store '/boot' on this device please ensure that
your boot-loader understands md/v1.x metadata, or use
--metadata=0.90
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md2 started.
# Decrypt (if necessary) and mount the broken mirror.
[root@gareth-desktop tmp]# cryptsetup luksOpen /dev/md2 oldhome
Enter passphrase for /dev/md2:
[root@gareth-desktop tmp]# mount /dev/mapper/oldhome /mnt
# Find where things have diverged. This is currently running.
# I suppose something like rsync could make this faster, but I'd rather see what's going on first.
[root@gareth-desktop tmp]# diff -r --exclude '.[!.]*' /home /mnt > diff.txt
Next question: why do these kinds of things only become visible on a Friday night, when you're leaving for a week away the next day and need to take the missing files with you?
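On the rsync idea in the comments above: `diff -r` reads every file on both sides. If modification times can be trusted, a quicker first pass is to list only the files on the recovered mirror that are missing from /home or newer than the copy there. Below is a minimal bash sketch; `newer_in` is a hypothetical name, it compares mtimes only (not contents), and it skips hidden entries in the same spirit as the `--exclude '.[!.]*'` given to diff.

```shell
#!/usr/bin/env bash
# newer_in SRC DST (hypothetical helper)
# Print paths, relative to SRC, of regular files that are missing from DST
# or have a newer modification time in SRC than in DST.
# Entries whose names start with a dot are pruned, like the diff --exclude.
# Pass SRC without a trailing slash.
newer_in() {
    local src=$1 dst=$2 f rel
    while IFS= read -r -d '' f; do
        rel=${f#"$src"/}
        # bash's -nt is also true when the second file does not exist.
        if [[ $f -nt $dst/$rel ]]; then
            printf '%s\n' "$rel"
        fi
    done < <(find "$src" -name '.?*' -prune -o -type f -print0)
}

# Usage: newer_in /mnt /home > candidates.txt
```

rsync can give a similar preview without touching anything: `rsync -a --update --dry-run --itemize-changes /mnt/ /home/` lists what it would copy, skipping files that are newer in /home.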

---------- Post added 16th June 2012 at 02:10 AM ---------- Previous post was 15th June 2012 at 11:48 PM ----------
Finally, once you've merged the files so that all the versions that you want to keep are in /home:
Code:
# Check RAID status.
[root@gareth-desktop mapper]# cat /proc/mdstat
Personalities : [raid0] [raid1]
md2 : active raid1 sda6[0]
955788863 blocks super 1.2 [2/1] [U_]
md1 : active raid1 sdb4[1]
955788863 blocks super 1.2 [2/1] [_U]
bitmap: 6/8 pages [24KB], 65536KB chunk
md0 : active raid0 sda3[0] sdb1[1]
20969472 blocks super 1.2 512k chunks
unused devices: <none>
# Unmount, close and stop the temporary recovery array (md2).
[root@gareth-desktop mapper]# umount /mnt
[root@gareth-desktop mapper]# cryptsetup luksClose /dev/mapper/oldhome
[root@gareth-desktop mapper]# mdadm --stop /dev/md2
mdadm: stopped /dev/md2
# Restore the unused device into the /home RAID.
[root@gareth-desktop tmp]# mdadm /dev/md1 --add /dev/sda6
mdadm: added /dev/sda6
[root@gareth-desktop tmp]# cat /proc/mdstat
Personalities : [raid0] [raid1]
md1 : active raid1 sda6[2] sdb4[1]
955788863 blocks super 1.2 [2/1] [_U]
[>....................] recovery = 0.0% (347072/955788863) finish=229.3min speed=69414K/sec
bitmap: 6/8 pages [24KB], 65536KB chunk
md0 : active raid0 sda3[0] sdb1[1]
20969472 blocks super 1.2 512k chunks
unused devices: <none>
Now it will resync so that the current /home file system on sdb4 is mirrored onto sda6. I really hope this doesn't happen again, or that Fedora at least bothers to tell me when RAID decides to randomly switch devices...
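On the notification wish: mdadm has a monitor mode (`mdadm --monitor`) that can mail the address given on a `MAILADDR` line in /etc/mdadm.conf when an array degrades; on Fedora this is normally run by the `mdmonitor` service, which may need enabling. To check by hand, the degraded state is also visible in /proc/mdstat: each array's status field shows one `U` per working member and `_` for each missing one, as in `[U_]` above. The sketch below (the name `degraded_arrays` is hypothetical) prints every array whose status contains an `_`, and could be run from cron or a login script.

```shell
#!/usr/bin/env bash
# degraded_arrays (hypothetical helper)
# Read /proc/mdstat-format text on stdin and print the name of every
# array whose member-status field (e.g. [UU], [U_]) shows a missing device.
degraded_arrays() {
    awk '
        /^md[^ ]* : / { name = $1; next }
        match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            if (name != "" && index(status, "_") > 0) print name
            name = ""
        }
    '
}

# Usage, e.g. from cron:
#   bad=$(degraded_arrays < /proc/mdstat)
#   [ -n "$bad" ] && echo "Degraded md arrays: $bad"
```

An array that is rebuilding, like md1 right after the `--add` above, still shows `[_U]` and will be reported until recovery finishes.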