Raid5 Problem
Hi,
I had a RAID 5 setup with four 200GB Maxtor drives. Last week one of them broke and I'm still waiting for a replacement. Yesterday the computer got really slow and I ran chkdsk. Because it took so long I went to bed, and when I looked at it this morning it was stuck at "checking free space" at about 35%. I hit the reset switch and the Promise S-150 SX4 controller reported the array as off-line. Now there is an array defined, but it reports two missing drives and one as free. I don't think any data was lost on the drives, but I need the controller to recognize the third drive. What can I do? There was a lot of data on it and it would really suck to lose it all.
Thx for your help!
Jan
Rebooting while it's running in that mode was very bad. I personally would never run RAID-5 on a Promise controller, as they are not known for their robust RAID-5 solutions.
Go email Promise tech support RIGHT NOW and ask how to proceed. There are very few of those controllers in the field and probably fewer running RAID-5. Since the data is important, if you want GOOD advice and not guesses, go to Promise for the answers on how to proceed.
The RAID-5 systems I have messed with (I am not a fan, btw...) require the failed drive to be replaced to retain the array. If you were running a hot spare and those were big, full IDE drives, it could take 24 hours or more to rebuild the RAID-5 when you replace the drive. Until it's replaced, it's running in a waaaaayyyyy reduced performance mode. State-of-the-art SCSI is a different picture: the drives are smaller and rebuild faster. As far as I am concerned, you should not use RAID-5 without an extra hot-swap spare, whether it's SCSI, IDE or SATA. It's just wrong on many levels.
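For anyone wondering why a RAID-5 array survives one dead drive but goes off-line with two, here is a minimal sketch of the XOR parity idea in Python. It is a simplification, not what the Promise firmware actually does: it uses one tiny stripe, keeps parity on a fixed "drive" instead of rotating it, and the function names (xor_blocks, make_stripe, rebuild_missing) are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (one block per drive)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild_missing(stripe, missing):
    """Recompute the block of one missing drive from all surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != missing]
    return xor_blocks(survivors)

# Four "drives": three data blocks plus one parity block.
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])

# One drive gone (degraded mode): every block is still recoverable,
# but reading the missing drive's data means reading all the others.
assert rebuild_missing(stripe, 1) == b"BBBB"

# Two drives gone: the survivors only yield the XOR of the two lost
# blocks, which is not enough to recover either one on its own.
two_left = [stripe[0], stripe[3]]
assert xor_blocks(two_left) == xor_blocks([stripe[1], stripe[2]])
```

The extra reads in the one-drive case are the reason a degraded array feels so slow, and the two-drive case is why the controller gives up once it thinks a second drive is missing.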
Tex
I will try to contact Promise then.
Tex, you can probably give me the best advice on what to do now. Basically I have these four 200GB Maxtor SATA drives and a ****ty Promise controller. Since you're saying RAID 5 is not the best choice, what is? And what controller should I choose?
Thank you a lot!
Jan
While you're at it, if you have the cash (these aren't cheap), this would be a good time to get a nice caching RAID controller to improve performance.
I'm a fan of RAID5 if it's done right. The hot spare idea is essential. Waiting for another drive to arrive is bad and seriously runs the risk of the data loss you have experienced. I work in enterprise. If a RAID5 loses a disk, we have a hot spare and a cupboard full of spares!
What can I suggest/bring to the table? Ditch the IDEs, get some SATA drives and a 3ware controller card with a dedicated processor and cache memory. It offloads the parity calculations from the CPU onto the card. It also brings "hot swap" capability. Just make sure you have a local dealer with available kit in case one spins off its coil... or a spare in a drawer!
Incidentally, the cache memory also makes a monster difference. If your budget allows it, get SCSI! RAID-5 will still crush it hard, but SCSI can take the strain. I wish I could afford it!
I'm sure soothsayers will read this and say "oh that's rubbish, I run RAID-5 on 4 laptop drives blah blah blah". Maybe you do, but when the thread starter's situation hits you, it's not pretty.
I looked at the 3ware Escalade 8506-4LP controller, but it's like 300 Euros and I'm not really sure if I want to spend that kind of money yet...
I set up a RAID 0+1 now because I really need the redundancy, and I'll check my budget for the 3ware.
RAID 0+1, or even a RAID-0 that you back up to non-RAID drives, would be my options.
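To put rough numbers on those options for four 200GB drives, here is a small Python sketch of the nominal usable capacity per RAID level. The function name usable_gb is made up for illustration, and the figures ignore formatting overhead and the marketing-GB versus real-GB difference.

```python
def usable_gb(level, drives, size_gb):
    """Nominal usable capacity for a few common RAID levels."""
    if level == "0":
        # Pure striping: all space is usable, no redundancy at all.
        return drives * size_gb
    if level == "0+1":
        # Two striped sets mirrored against each other: half the space.
        if drives < 4 or drives % 2:
            raise ValueError("RAID 0+1 needs an even number of drives, at least 4")
        return (drives // 2) * size_gb
    if level == "5":
        # One drive's worth of space goes to parity.
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * size_gb
    raise ValueError(f"unknown RAID level: {level}")

for level in ("0", "0+1", "5"):
    print(f"RAID {level}: {usable_gb(level, 4, 200)} GB usable")
```

So going from RAID 5 to RAID 0+1 on the same four drives costs about 200GB of space, and a plain RAID-0 gets all 800GB but with no protection whatsoever.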
Remember, even redundant RAID like RAID-1, RAID 0+1 or RAID-5 does not mean you can escape from backing up (as you just learned). It only protects from a complete drive failure. A controller failure... or a corrupted FS... or you delete crap by mistake or... Well geeez, the list goes on forever...
None of those are protected with ANY level of raid.
You need to develop a BACKUP routine. Run RAID-0 if you want, but BACK IT UP to a non-RAID drive. Even another RAID-0 array on a different RAID controller would do, but a non-RAID drive is the safest bet.
My servers in the house here back themselves up to other servers across my gigabit LAN nightly, so my data always exists on at LEAST two separate computers, and crap like our Outlook mail folders back themselves up several times a day to multiple machines, as we would both sheet if we lost emails.
Run a pair in RAID-0 for the performance and BACK IT UP often to non-RAID drives is my recommendation.
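A backup routine does not have to be fancy. Here is a minimal Python sketch of the "copy whatever is new or changed to another drive" idea, using only the standard library. The function name backup_changed and the paths in the usage comment are made up for illustration; a real setup would add scheduling, logging and error handling.

```python
import shutil
from pathlib import Path

def backup_changed(source, dest):
    """Copy files that are new or changed in source into dest.

    Files deleted from source are deliberately left alone in dest,
    so an accidental delete does not wipe the backup copy too.
    Returns the list of files that were copied.
    """
    source, dest = Path(source), Path(dest)
    copied = []
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        target = dest / src.relative_to(source)
        if target.exists():
            s, t = src.stat(), target.stat()
            # Same size and not newer than the backup copy: skip it.
            if s.st_size == t.st_size and int(s.st_mtime) <= int(t.st_mtime):
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 keeps the modification time
        copied.append(target)
    return copied

# Hypothetical usage, e.g. from a nightly scheduled task:
# backup_changed("D:/data", "//otherserver/backups/data")
```

The destination should be a different physical drive, ideally in a different machine, which is the whole point of the advice above.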
I am not just talking out my butt. I do this for a living, and this is what I do to protect both my own data and my customers' data.
Tex
I'll echo every word of that. Backup, backup and backup again.
Get Acronis True Image. It supports a full backup plus incremental (point in time) appended backups. I use that to back up my server network at home. Each machine runs it and backs up over the network to another. I know it's a pain when you have masses of data to have to shift around but when you do a big buy of kit, prepare it in
The 3ware controllers are expensive but well worth it. They take the load off the CPU, and software RAID (especially RAID-5) is like jelly. That's why I don't use it at home (yet anyway). Investment is high when moving to a serious storage solution, but once it's in place, you can relax a little.
Tex
I can remember that the writes were pretty low, not higher than 40,000 in ATTO no matter what latency I tried, and the reads were about 80,000 max. In general not very fast, but acceptable to me if I get the safety of RAID 5. Can you estimate the performance of the 3ware controller in a RAID 5 with the 7200 rpm 200GB Maxtor drives?
I guess my array works now. It has been stable for the last couple of days, so I will keep this controller for now. I will get a PCI-Express one when I do a complete rebuild, once I have saved up some money and my current system won't run everything anymore.
3ware pwns. I have the same controller you're talking about. I also use a lot, and I mean a CASE, of 3ware 8006-2LP controllers a week.
Acronis pwns. I've restored 16 gig partitions to 8 gig partitions and vice versa, and Acronis does not even flinch while doing it. We use the enterprise version.
So you are using the 3ware in a Raid 5 config with 4 Drives?
Can you please post up an ATTO screen for me?
Thanks a lot.
They are much more robust and dependable, and that's really what RAID-5 is all about.
I wouldn't touch RAID-5 on a Promise controller for all the tea in China. For IDE RAID-5 I would only even consider it on a 3ware.
Consider the Promise cards low-end toys for the enthusiast, while the 3ware is a stable, mature, robust answer designed for mission-critical applications in the business world.
How can I put this?
You own the equivalent of a hopped up Kia. Gobbles uses a Mercedes.
Tex
http://www.lsilogic.com/products/megaraid/index.html
A little more money, but they're the real thing!!
Flint
As for backups, we're using a Dell LTO2 tape unit (200GB native) with Veritas Backup Exec. I was thinking of getting an autoloader, but I need to take the previous day's backup offsite.
Depending on the controller and the amount of RAM on the controller, as Tex reminded me, ATTO's I/O test won't even touch the disks. It sits in the onboard RAM on my controller and there really is no disk activity in the bench at all. When I had a slower controller with less RAM on a slower bus, the same thing occurred. My new controller is on a PCI-X bus and I still can't bury the controller in I/Os. YET!
Flint
The old Elite 1600s would hit about 140,000 on reads and 80,000 on writes with one or 100 drives in RAID-0 when testing with ATTO. That's just as fast as you could get stuff out of the cache. Flintstone just switched to the LSI 320-2x like I run, and it hits like 250,000 on writes and 300,000 on reads using ATTO because the onboard CPU and cache are so much faster. ATTO never kisses the disks at all. We have 512MB of PC3200 DDR and a much faster onboard CPU.
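A quick way to sanity-check whether a benchmark run is measuring the disks or just the controller cache is to compare the size of the test file with the amount of cache on the card. This tiny Python sketch does only that comparison. The 32MB test length is an assumption for illustration (check what your benchmark is actually set to), and the function name is made up.

```python
def likely_cache_bound(test_length_mb, cache_mb):
    """True if the whole test file fits in the controller cache.

    When that is the case, the benchmark can be served entirely from
    cache and the result says little about the drives themselves.
    """
    return test_length_mb <= cache_mb

# Assumed 32MB test length against the 512MB and 256MB caches
# mentioned in this thread, plus a controller with no cache at all.
for cache_mb in (512, 256, 0):
    verdict = "cache" if likely_cache_bound(32, cache_mb) else "disks"
    print(f"{cache_mb}MB cache: result mostly reflects the {verdict}")
```

To see what the drives really do on a caching controller, the test data has to be a good deal bigger than the cache.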
Your Perc4 should be in between the old Elite and our 320-2x. They have made several Perc4 versions and the later ones are much faster. I would bet that with 256MB cache it's a 320-2, and it's still using the older SDRAM cache and not DDR like ours. You may have a newer model, but I bet that's what it is.
I bet your ATTO runs hit like 200,000 on reads and 150,000 on writes due to the slower CPU/memory. It's faster than the older Perc3s but not as fast as the new 320-2x's or Perc4e's.
Tex
I'm surprised it's as fast as it is. It should have been slightly faster than a 320-2x based on the memory subsystem and CPU. Do you have open PCI-E slots in that Dell server? If I can't find a board to run this thing in, I would like to at least test it to know it works before I sell it.
Tex
The RAM is 256MB of PC2700 CAS2. As I said... I have one sitting on my desk two feet from my face.
Tex