RAID 0+1 failure!!

edited August 2004 in Hardware
First, the history:

I built this computer last year in May. My older machine had a Fujitsu drive which failed and I had been meaning to upgrade for some time, so I built a new one. I based it on an Asus A7N8X-deluxe mainboard. Since SATA drives were pretty expensive at the time, I went with two Maxtor DiamondMax 9 80GB drives in a RAID 0 (striped) array using a Silicon Image 680RAID controller.

The first week in November, one drive had an "event" which made me think it had failed. After churning through ChkDsk for 12 hours, the error was cleared, but not before I was frightened into buying two more drives and setting up a mirror. Unfortunately, the Silicon Image software did not allow me to reconfigure my existing array to accept a mirror, so I reformatted and reinstalled everything.

In January, the slave drive on the secondary channel started acting dodgy. It would drop from the array every few hours, then spend a long time rebuilding, only to drop again in an hour or two. After a few days, it disappeared from the system completely and the secondary master drive started acting dodgy. I pulled the secondary slave drive and everything seemed to be fine (except for only having three drives in a four-drive RAID 0+1 array).

The dodgy drive went into a second system in a non-RAID configuration, where it lives to this day. It has had no hard or soft errors in three months. It is powered up 24/7.

In March, I received a new drive to replace the secondary slave. I installed it on the controller card, and found that the software would not allow me to add this new drive to the existing array. (This seemed kind of odd to me, since that is exactly why one would HAVE a mirrored set...).

After a few days of running with the fourth drive configured as an independent drive, I received a call from my girlfriend and was told that my computer was "making a clicking sound." I had her do a soft reboot, but it did not respond. So I told her to hit the reset button. When I came home, I was greeted by a notice:
SYSTEM DISK NOT FOUND, PRESS ANY KEY TO REBOOT

I disconnected the (new) fourth drive, hoping that putting the system back into the original (working) state would allow the array to be found. Instead of finding a working RAID, the controller detected all three drives, but would not put them together correctly. I reconfigured the array hoping that it would "pick up" the old data, but it did not. I reinstalled the fourth drive and did a complete rebuild. That was three days ago.

Today, the array is not working again. The secondary channel is "gone" (no drives detected) and the two primary drives are an "invalid RAID set" (even though they -should- make up a complete striped set).
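
For anyone following along, here is a rough Python sketch of why I expected the two primary drives to be enough on their own. It assumes (and this is only an assumption about how the controller lays things out) that the two primary-channel drives form one striped set and the two secondary-channel drives form its mirror. The names STRIPE_SETS, array_survives and surviving_combinations are made up for illustration and have nothing to do with the Silicon Image software.

```python
from itertools import combinations

# Assumed layout: each IDE channel holds one complete striped set,
# and the two striped sets mirror each other (RAID 0+1).
DRIVES = [
    "primary master",
    "primary slave",
    "secondary master",
    "secondary slave",
]
STRIPE_SETS = [
    {"primary master", "primary slave"},
    {"secondary master", "secondary slave"},
]


def array_survives(failed):
    """Return True if at least one striped set has no failed drive in it."""
    failed = set(failed)
    return any(not (stripe & failed) for stripe in STRIPE_SETS)


def surviving_combinations(n_failed):
    """List every combination of n_failed drives the array can lose and still be readable."""
    return [
        combo
        for combo in combinations(DRIVES, n_failed)
        if array_survives(combo)
    ]


if __name__ == "__main__":
    for n in range(1, 4):
        print(n, "failed:", surviving_combinations(n))
```

Under that assumed layout, losing the whole secondary channel still leaves one complete striped set, which is why the "invalid RAID set" message on the primary pair does not make sense to me.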

I'm guessing (guessing mind you) that the Silicon Image controller simply can't handle having four drives attached without exploding. I have only had problems when all four drives are in service, and always the secondary slave drive (even when it is a different -physical- drive).

Now, the questions:

1. Is anyone aware of any "issues" with the Silicon Image 680 chipset similar to what I'm seeing?

2. Can anyone recommend a good RAID 0+1 controller? I'm interested in a fast controller, but if I have to give up a little bit of speed to gain some reliability, I consider that a worthwhile exchange. Cost is not really a concern, within reason; I've already got $500-600 invested in drives, so as long as the controller is less than it would cost to replace the drives, I'm still ahead.

3. Is there such a thing as a slow-speed, external backup unit that could read the RAID array when I am not using the machine (for example, in the middle of the day) so that should the redundant array "go away" I will at the very least retain my data? If so, how costly are these?

4. Am I being silly for looking at double redundancy?

Thanks for any input from you all.

Comments

  • edited August 2004
    Hey Stupid... not sure how you resolved this, but I'd love to know.

    I set up a simple mirrored set using the PM & SM channels using the Sil 680 RAID card. Same thing that happened to you happened to me. The SM drive started acting dodgy and then one day upon bootup, the set appeared as an "invalid raid set". WTF? The set contained the OS. So, hoping I could get the system to boot, I plugged the PM (working drive) into the regular motherboard's IDE port and it wouldn't boot from that.

    How did you resolve the "invalid raid set" failure?

    Thanks, slappy.
  • edited August 2004
    Slappy:

    First off, let me address your problem.

    It sounds as if you had a striped set with two drives. If this was your setup and one drive has failed, I'm sorry, but the data is unrecoverable.

    Some people claim that if you restart the system and reset the two drives to an identical configuration, the controller will "pick up" the old striped data and possibly allow you to read it to a new drive before replacing the failed parts. My experience with the SiI-680R controller was that this will not work. When you re-configure, it does something to make the data impossible to recover, even with a cluster scan for deleted files. I tried.


    As for what I did to resolve my own issue...

    After several calls to the Silicon Image tech people, it turns out that the SiI BIOS is not capable of:
    1. Taking an existing RAID 0 (striped) set and adding an additional mirror, to create a RAID 0+1 set. The only way to add new drives to the set is to re-configure, which means that the drives are gone and all of the data has to be re-copied.
    2. Recovering from a "hard" error that requires replacing a single drive in a RAID 0+1 (striped/mirrored) set. The only way to get the new drive to be part of the set is to reconfigure all four drives, which means that the drives are gone and all of the data has to be re-copied.

    So, overall, the SiI-680R controller is your basic non-fault-tolerant RAID controller. As long as things are running smoothly, everything runs smoothly. But the -instant- that something goes wrong, you lose everything in a completely unrecoverable way. For a home system where the RAID is probably the primary (and possibly the only) fixed storage in the system, this is unacceptable. After all, the whole point of RAID is to add redundancy and increase reliability. The SiI-680R seems to go completely contrary to that paradigm.

    The "failed" drive is still working fine in the second system. I suspect that it was overheating in the RAID system. I replaced the drive with an identical unit, but moved it to a different drive bay and it has been working happily ever since.

    I pulled the SiI-680R controller and threw it away. I replaced it with a Promise FastTrak 4400. In the interim, I have had two "soft" drive failures due to power source "events" and one "hard" drive failure due to an actual physical crash. In all cases (even when one drive was physically replaced with a new different drive of the same size) the controller was able to quickly rebuild the array without any user intervention.

    My advice is to return the SiI-680R (if you can) and get a Promise controller. They're more expensive, but the money is well spent.
  • edited August 2004
    Thanks for your reply,
    My set was definitely a mirrored set, not striped. For some reason, the card stopped recognizing the set, perhaps triggered by a fault in the secondary drive, but I can't be sure.

    So, by the sound of your reply, my data on either of the existing once-mirrored drives is lost due to this "invalid raid set" problem.

    * Is there any way to recover the data? Is there someone that I can pay to yank the data off of the drive?
    * If I put a new HDD in the machine running a new install of the OS, will the new OS recognize the existing mirrored drives? Did this RAID fault only mess up the boot sector, or what?

    Thanks
  • edited August 2004
    I can't provide a definitive answer since I'm not a Silicon Image technician. I would suggest you give them a call and ask them how to recover from a hard failure of a single drive in a mirrored set.

    My personal experience was that a failure of one drive in the mirror led to an unrecoverable error. To me, that runs contrary to the entire concept of running a mirrored set, so I dumped the Silicon Image controller for one that actually -did- support recovery of a damaged mirror set.

    Silicon Image can be reached at 408-616-4000. Ask for "engineering support" and they will get you to a technician. Since SiI manufactures chipsets, not boards, they don't provide support for controller cards.

    Good luck!