You did do that backup… didn't you?

It seems just a few weeks ago that I posted about making regular backups of your computer. That’s because it was just a few weeks ago! Which makes me wonder whether the server at my office read the post too, and decided to test me.

On top of the regular backups, the server has two hard disks. They mirror each other in what’s known as a RAID array (Redundant Array of Inexpensive Disks). If one drive should ever fail, the other drive will keep going and users won’t notice a thing. Just as well I took the time to set it up, because I had an alert email from my server saying:

The following warning/error was logged by the smartd daemon:
Device: /dev/hda, 1 Currently unreadable (pending) sectors

Hmmm, not good. Checking the server over the network revealed the following horrible image:
[Image: drive-failed.jpg]
OK, that may not look like a horror story to you, but believe me when I say it’s worse than anything a movie company could create on a multi-million-pound budget. The key is the MD2… line, which has an (F) in it, meaning a failed disk. The next line really pushes the dagger home: [2/1] means only one of the two disks is working, and the _U is there to make sure you see it. Each U represents a working disk (one for each disk in my server), and the _ (underscore) means that disk isn’t working. It’s a little more complicated than that (the two disks are set up as three RAID arrays, each array uses two physical partitions, and only one of the partitions on one of the disks has an error), but we’ll skip the detail here and do this instead:

Arghhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh!

Don’t panic, Mr Mainwaring! All is not lost, because Steve did his homework on this stuff.
Firstly, the server is quite happy to run on one disk. The users wouldn’t even know there was a problem.
Secondly, this gives me time to order a new hard disk and plan an evening to install it without interrupting anyone’s work.
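
Going back to that status screen for a moment: the markers described earlier, the (F) flag, the [2/1] count and the _U map, are regular enough that a script can check them for you. What follows is only a rough sketch in Python, assuming the text looks like the usual Linux /proc/mdstat layout. The sample text, the device names (hda3, hdc3) and the block count are invented for illustration and are not copied from my screenshot, and the function names are my own.

```python
import re

# Hypothetical sample in the usual /proc/mdstat layout. The device names and
# block count are made up for illustration, not taken from the real server.
SAMPLE = """\
md2 : active raid1 hda3[2](F) hdc3[1]
      8000000 blocks [2/1] [_U]
"""


def parse_mdstat(text):
    """Return one dict per md array found in mdstat-style text."""
    arrays = []
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if header:
            # Member devices are the tokens with a [slot] suffix, e.g. hda3[2](F).
            members = [tok for tok in header.group(2).split() if "[" in tok]
            failed = [tok.split("[")[0] for tok in members if tok.endswith("(F)")]
            current = {
                "name": header.group(1),
                "failed": failed,
                "expected": None,
                "working": None,
                "slots": None,
            }
            arrays.append(current)
            continue
        # The following line carries [expected/working] and the U/_ map.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current is not None:
            current["expected"] = int(status.group(1))
            current["working"] = int(status.group(2))
            current["slots"] = status.group(3)
    return arrays


def is_degraded(array):
    """True if any member is flagged (F) or any slot shows an underscore."""
    slots = array["slots"] or ""
    return bool(array["failed"]) or "_" in slots


if __name__ == "__main__":
    for array in parse_mdstat(SAMPLE):
        state = "DEGRADED" if is_degraded(array) else "ok"
        print(array["name"], state, array["failed"], array["slots"])
```

Fed the invented sample, this reports md2 as degraded with hda3 as the failed member. On a real server you would pass it the contents of /proc/mdstat instead, and you would want to test it against your own output first, since this sketch only handles the simple mirrored case.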

It would be nice for that swap-over to be a small job, taking just 30 minutes. However, the last time the server disk failed it took 6 hours. This time it has taken me 4 hours, although that includes a few breaks to drink more coffee and have a sandwich. I started at 9pm, just after my PHP course finished, and I’m sitting here with my laptop in the server cupboard watching an image from heaven:
[Image: raid-recovering.jpg]
One part of the disk is back to normal, another part is rebuilding while I type, and in a few hours the rebuild process will be complete and once again the business will be protected from a hard disk failure. What a lovely way to start a morning. I think I’ll go to bed now.