I have a "stable" Fedora Core 10 server that is hosting multiple virtualizations:
Setup details...
The host is on a dedicated non-RAID drive.
The virtualization for each system was created in one of two ways, mainly because I was still learning when I set this up a year ago.
The first way: I created the filesystem as a .img on the boot drive, then added access to a raid5 array as an additional mount point inside the booted .img guest. This has worked fine for a year.
The second way was introduced when I added a raid1 array for additional "testing" systems. I partitioned the two drives identically and built the raid1 arrays from the matching partitions:
Disk /dev/sdg: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1               1       95296   765465088+  fd  Linux raid autodetect
/dev/sdg2           95297      100780    44050230   fd  Linux raid autodetect
/dev/sdg3          100781      106264    44050230   fd  Linux raid autodetect
/dev/sdg4          106265      121601   123194452+   5  Extended
/dev/sdg5          106265      110573    34612011   fd  Linux raid autodetect
/dev/sdg6          110574      114882    34612011   fd  Linux raid autodetect
/dev/sdg7          114883      121601    53970336   fd  Linux raid autodetect
Disk /dev/sdh: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
   Device Boot      Start         End      Blocks   Id  System
/dev/sdh1               1       95296   765465088+  fd  Linux raid autodetect
/dev/sdh2           95297      100780    44050230   fd  Linux raid autodetect
/dev/sdh3          100781      106264    44050230   fd  Linux raid autodetect
/dev/sdh4          106265      121601   123194452+   5  Extended
/dev/sdh5          106265      110573    34612011   fd  Linux raid autodetect
/dev/sdh6          110574      114882    34612011   fd  Linux raid autodetect
/dev/sdh7          114883      121601    53970336   fd  Linux raid autodetect
md7 : active raid1 sdg7[0] sdh7[1]
      53970240 blocks [2/2] [UU]
md6 : active raid1 sdg6[0] sdh6[1]
      34611904 blocks [2/2] [UU]
md5 : active raid1 sdg5[0] sdh5[1]
      34611904 blocks [2/2] [UU]
md4 : active raid1 sdg3[0] sdh3[1]
      44050112 blocks [2/2] [UU]
md2 : active raid1 sdg2[0] sdh2[1]
      44050112 blocks [2/2] [UU]
md1 : active raid1 sdg1[0] sdh1[1]
      765465024 blocks [2/2] [UU]
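For anyone who wants to check an array listing like the one above without eyeballing it, here is a minimal Python sketch that parses /proc/mdstat-style text and flags any array whose member status is not all `U`. The name `parse_mdstat` and the sample string are my own; the sample is copied from the listing above, and the regex only handles the simple two-line layout shown here.

```python
import re

# Sample copied from the /proc/mdstat listing in this post.
MDSTAT = """\
md7 : active raid1 sdg7[0] sdh7[1]
      53970240 blocks [2/2] [UU]
md6 : active raid1 sdg6[0] sdh6[1]
      34611904 blocks [2/2] [UU]
md5 : active raid1 sdg5[0] sdh5[1]
      34611904 blocks [2/2] [UU]
md4 : active raid1 sdg3[0] sdh3[1]
      44050112 blocks [2/2] [UU]
md2 : active raid1 sdg2[0] sdh2[1]
      44050112 blocks [2/2] [UU]
md1 : active raid1 sdg1[0] sdh1[1]
      765465024 blocks [2/2] [UU]
"""

# First line: name, level, members. Second line: size, [configured/active], status.
PATTERN = re.compile(
    r"^(md\d+) : active (\w+) (.+)\n\s*(\d+) blocks.*?\[(\d+)/(\d+)\] \[([U_]+)\]",
    re.MULTILINE,
)


def parse_mdstat(text):
    """Return one dict per active array found in mdstat-style text."""
    arrays = []
    for m in PATTERN.finditer(text):
        name, level, members, blocks, configured, active, status = m.groups()
        arrays.append({
            "name": name,
            "level": level,
            "members": members.split(),
            "blocks": int(blocks),
            # Healthy here means every slot is up: no "_" and active == configured.
            "healthy": "_" not in status and configured == active,
        })
    return arrays


for array in parse_mdstat(MDSTAT):
    state = "OK" if array["healthy"] else "DEGRADED"
    print(array["name"], array["level"], array["blocks"], state)
```

On the listing above every array comes back healthy, which matches the `[UU]` markers. It says nothing about the filesystems stored on the arrays.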
I then added /dev/md1 as a mount point in the original .img guest for additional storage. That works fine.
I then created a new system using /dev/md2. This is where things change: I installed a Fedora Core 12 system directly onto /dev/md2.
I then created systems on /dev/md5 and /dev/md6, installing Windows XP Pro on each. I use these as testing environments that I access via remote console, and they appear to work fine.
Then I created a system on /dev/md7 and installed Windows 7. Again, all is well.
The issue shows up after 6-120 hours, and it is with the /dev/md2 system (the Fedora Core 12 install): it becomes unstable. Originally it crashed the raid, which I attributed to an incompatibility between the Adaptec raid card and the Western Digital drives. I confirmed this with Adaptec support. I replaced the card with a StarTech card and recreated the raid from scratch to ensure there were no vestiges of the original problem. I thought all was resolved, but today I tried to ssh into the /dev/md2 system and ssh had apparently gone down. The system was still running, but I could not get in via the console to restart the services. I could, however, shut the system down using the virtual machine manager. On restart all is fine for a bit, then the same thing happens.
So I assume it is a corrupted system. The first thing I tried was to run fsck -fy on /dev/md2 from the host system's console. This starts a flurry of corrections that runs for hours and looks something like this:
Inode 67551 is in use, but has dtime set. Fix? yes
Inode 67551 has imagic flag set. Clear? yes
Inode 67551 has a extra size (8226) which is invalid
Fix? yes
Inode 67552 is in use, but has dtime set. Fix? yes
Inode 67552 has imagic flag set. Clear? yes
Inode 67552 has a extra size (24938) which is invalid
Fix? yes
Special (device/socket/fifo) inode 67552 has non-zero size. Fix? yes
Inode 67553 is in use, but has dtime set. Fix? yes
Inode 67553 has a extra size (24938) which is invalid
Fix? yes
Inode 67554 is in use, but has dtime set. Fix? yes
Inode 67554 has imagic flag set. Clear? yes
Inode 67554 has a extra size (8226) which is invalid
Fix? yes
Inode 67555 has EXTENTS_FL flag set on filesystem without extents support.
Clear? yes
Inode 67726 has illegal block(s). Clear? yes
Illegal block #0 (791555631) in inode 67726. CLEARED.
Illegal block #1 (774843950) in inode 67726. CLEARED.
Illegal block #2 (1634348846) in inode 67726. CLEARED.
Illegal block #3 (796418422) in inode 67726. CLEARED.
Illegal block #4 (1852405619) in inode 67726. CLEARED.
Illegal block #5 (1665216359) in inode 67726. CLEARED.
Illegal block #6 (1852795252) in inode 67726. CLEARED.
Illegal block #7 (1836345390) in inode 67726. CLEARED.
Illegal block #8 (1297294188) in inode 67726. CLEARED.
Illegal block #9 (1598835777) in inode 67726. CLEARED.
Illegal block #10 (1313817417) in inode 67726. CLEARED.
Too many illegal blocks in inode 67726.
This goes on and on. The outcome is always terminal: the filesystem ends up hosed. The raid device, however, does not crash as it did before, so I am fairly sure the original compatibility issue is resolved.
I suspect it is one or more issues of my own making:
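A quick check on the numbers in that output: the "illegal block" values reported for inode 67726 all fall in the range you get when four printable ASCII characters are read as one 32-bit little-endian integer (the byte order ext filesystems use on disk). The Python sketch below packs them back into bytes and decodes them. The names `ILLEGAL_BLOCKS` and `blocks_to_text` are my own; the numbers are copied from the fsck output above.

```python
import struct

# Block numbers copied from the fsck output for inode 67726.
ILLEGAL_BLOCKS = [
    791555631, 774843950, 1634348846, 796418422, 1852405619, 1665216359,
    1852795252, 1836345390, 1297294188, 1598835777, 1313817417,
]


def blocks_to_text(blocks):
    """Pack each number as a little-endian 32-bit value and decode as ASCII."""
    raw = b"".join(struct.pack("<I", n) for n in blocks)
    return raw.decode("ascii", errors="replace")


print(blocks_to_text(ILLEGAL_BLOCKS))
# prints: /../../../javax/swing/Action.html#SMALL_ICON
```

That string reads like a link from a Javadoc HTML page, i.e. ordinary file contents sitting where fsck expected inode block pointers. It would be consistent with fsck reading the device at the wrong offset, or with the data changing underneath it, but that is an inference from eleven numbers and not a confirmed diagnosis.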
1. You can't run a newer version of Fedora as a guest on an older host system. This seems unlikely to me, but I would like to confirm it with someone who knows.
2. Running fsck on a raid device is a boneheaded move, because the virtual file system is laid out in a way that fsck is not expecting to find. This seems probable to me. If so, how should I check the file system? Remember that my other virtualizations were .img files, so when I ran fsck on the running raid mount points I was only checking a simple mount point and not a virtually partitioned file system.
3. Having a mixed OS environment (Windows, Linux, and a simple mount point) is a big mistake?
I could really benefit from some sage advice from an experienced server ninja. Thank you.
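On point 2, one way to see what fsck would actually be looking at is to inspect the first couple of kilobytes of the device, read-only, with the guest shut down. The Python sketch below is an illustration under assumptions, not a tested recovery procedure: `probe_device_head` is a hypothetical helper, and it checks only two well-known markers, the 0x55AA boot signature at byte 510 (present on an MBR-partitioned disk, though a bare filesystem with a boot sector can carry it too) and the ext2/3/4 superblock magic 0xEF53 at byte 1080.

```python
import struct

MBR_SIGNATURE_OFFSET = 510   # last two bytes of the first 512-byte sector
EXT_MAGIC_OFFSET = 1080      # superblock starts at 1024, s_magic is at +56
EXT_MAGIC = 0xEF53


def probe_device_head(head):
    """Report which markers appear in the first bytes of a device or image."""
    has_mbr = len(head) >= 512 and head[510:512] == b"\x55\xaa"
    has_ext = (
        len(head) >= EXT_MAGIC_OFFSET + 2
        and struct.unpack_from("<H", head, EXT_MAGIC_OFFSET)[0] == EXT_MAGIC
    )
    return {"mbr_signature": has_mbr, "ext_superblock": has_ext}


# Synthetic demo data, not read from any real device.
partitioned = bytearray(2048)
partitioned[MBR_SIGNATURE_OFFSET:MBR_SIGNATURE_OFFSET + 2] = b"\x55\xaa"

bare_ext = bytearray(2048)
struct.pack_into("<H", bare_ext, EXT_MAGIC_OFFSET, EXT_MAGIC)

print(probe_device_head(bytes(partitioned)))
print(probe_device_head(bytes(bare_ext)))
```

To try it on the real device, the `head` argument would come from something like reading the first 2048 bytes of /dev/md2 opened in `"rb"` mode as root. If only the MBR signature shows up, the guest installer most likely treated /dev/md2 as a whole disk and partitioned it, in which case fsck from the host would have to target the partition inside it (for example through a mapping created by a tool such as kpartx) rather than /dev/md2 itself, and only while the guest is not running.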
---------- Post added at 12:36 PM CDT ---------- Previous post was at 12:17 PM CDT ----------
The host system:
# cat /proc/version
Linux version 2.6.27.41-170.2.117.fc10.x86_64 (mockbuild@x86-4.fedora.phx.redhat.com) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #1 SMP Thu Dec 10 10:36:29 EST 2009