So, we have this server. Actually, we have lots and lots of them. But I don't care about them right now. I care about this one.
This one is a Linux box. And this Linux box has a mirrored RAID array.
So one day, one of our techs walks by the server rack and notices a little yellow light. Like a dedicated employee, he reports it. Which is good, because that little yellow light means something important. It means one of the drives has gone bad and has been removed from the array.
YAY for technology seeing a fault and isolating it!
So, we call up the support company and say "Hey you. Yellow light bad, RMA me a new drive so we can replace it."
You'd think they'd ship one out. But you'd be wrong. What they told us is that we needed to run a diagnostic program on the system to verify that the little yellow light was accurate. We rolled our eyes, but we tried to do as they said.
Of course, they sent us the instructions for a different model of server, so our first attempt failed. But eventually we got it done and sent the logs off, thinking that now they would give us a new drive.
No. Of course not. I should have known. They told us that the next step for diagnosing the problem would be to rebuild the drive array from scratch and re-image the server.
No, no, you read that right. In order to test whether my RAID drive is broken, I have to blow away the server. I ask you... what is the point of having RAID if I have to blow away the server in order to test a RAID failure? At that point, RAID has ceased to be useful. I mean, I suppose it kept the server up until we could schedule an outage, but the failure still caused an outage.
So, we roll our eyes some more and schedule some downtime to rebuild the server.
First step: reboot and log into the BIOS to reset it to factory defaults. And that's where we hit the problem. The server no longer recognizes the bad drive. The RAID controller craps out and says that it has an unrecoverable drive error. Server won't boot. Remove the drive with the little yellow light and the server boots up fine.
So, we call up the support company. They're going to ship out a replacement drive.
It took us 10-15 man-hours, 2 separate change approvals (each involving dozens of groups signing off on our tests), and a rather large conference call. It took about 3 weeks just to get everything set up. All to tell us exactly what the little yellow light told us in the first damn place.