Raid5 Problem


JanKir
8 Mar 2005, 5:22pm
Hi,
I had a RAID 5 setup with four 200GB Maxtor drives. Last week one of them broke and I'm still waiting for a replacement. Yesterday the computer got really slow and I ran chkdsk. Because it took so long I went to bed, and when I looked at it this morning it was stuck at "checking free space" at about 35%. I hit the reset switch and the Promise S-150 SX4 controller reported the array as offline. Now there is an array defined, but it reports two missing drives and one as free. I don't think any data was lost on the drives, but I need the controller to recognize the third drive. What can I do? There was a lot of data on it and it would really suck to lose it all.
Thx for your help!

Jan

Tex
8 Mar 2005, 5:33pm
I wouldn't touch a thing till you get a replacement. Unless you were running raid-5 with three drives and a hot spare as the fourth (which would explain the slowness, as it was rebuilding the array onto the hot spare, which would be very slow and take a long time), it needs a new drive to function properly and to rebuild the redundancy.

Rebooting while it's running in that mode was very bad. I personally would never run raid-5 on a Promise controller, as they are not known for their robust raid-5 solutions.

Go email Promise tech support RIGHT NOW and ask how to proceed. There are very few of those controllers in the field and probably fewer running raid-5. If you want GOOD advice and not guesses, considering that the data is important, go to Promise for the answers on how to proceed.

Raid-5 systems I have messed with (I am not a fan, btw...) require the failed drive to be replaced to retain the array. If you were running a hot spare and those were big, full IDE drives, it could take 24 hours or more to rebuild the raid-5 when you replace the drive. Until it's replaced it's running in a waaaaayyyyy reduced performance mode. State-of-the-art SCSI is a different picture. The drives are smaller and rebuild faster. As far as I am concerned you should not use raid-5 without an extra hot-swap spare, whether with SCSI, IDE or SATA. It's just wrong on many levels.

Tex

JanKir
8 Mar 2005, 5:38pm
The problem is that I was running it with 3 drives in critical status, because the replacement isn't here yet. There was no rebuilding going on at that time.
I will try to contact Promise then.

JanKir
14 Apr 2005, 7:58pm
Ok, I got the new drive today, put it in, and of course everything is gone from the array. So I reformatted and installed XP Pro. While restarting the first time, the array went into critical mode again! I am getting really pissed off with this controller now, but what can I do? Get one from another company?
Tex, you can probably give me the best advice on what to do now. Basically I have these 4 200GB Maxtor SATA drives and a ****ty Promise controller. Since you're saying raid 5 is not the best choice, what is? And what controller should I choose?

Thank you a lot!
Jan

Gargoyle
14 Apr 2005, 8:14pm
Wait to hear from Tex, but I'd get a controller from another manufacturer. I've had bad experiences with Promise, and I know I'm not the only one.

While you're at it, if you have the cash (these aren't cheap), this would be a good time to get a nice caching RAID controller to improve performance.

Shorty
14 Apr 2005, 8:36pm
Heed Tex's words with regard to IDE RAID-5. Just don't do it. I'm surprised your PC wasn't running like a dog. RAID-5 on IDE without a dedicated XOR processor & cache memory runs darn slow :(

I'm a fan of RAID5 if it's done right. The hot spare idea is essential. Waiting for another drive to arrive is bad and seriously runs the risk of the data loss you have experienced. I work in enterprise. If a RAID5 loses a disk, we have a hot spare and a cupboard full of spares!

What can I suggest/bring to the table? Ditch the IDEs, get some SATA drives and a 3ware controller card with a dedicated processor & cache memory. It offloads the parity calculations from the CPU and onto the card. It also brings "hot swap" capability. Just make sure you have a local dealer with available kit in case one spins off its coil.. or a spare in a drawer!

Incidentally, the cache memory also makes a monster difference. If your budget allows it, get SCSI! RAID-5 will still crush it hard but SCSI can take the strain :) I wish I could afford it!

I'm sure soothsayers will read this and say "oh that's rubbish, I run RAID-5 on 4 laptop drives blah blah blah". Maybe you do, but when the thread starter's situation hits you, it's not pretty.
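
For anyone wondering what the "parity calculations" mentioned above look like, here is a minimal Python sketch of the XOR idea behind RAID-5. It is only an illustration under simplifying assumptions: the function name `xor_blocks` and the tiny 4-byte "stripes" are made up for the example, and a real controller works on much larger stripes and rotates the parity block across the drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as on a 4-drive RAID-5 (hypothetical contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the drive holding data[1] died: XOR the survivors with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Every write has to update the parity block as well as the data, which is the work a dedicated XOR processor takes off the host CPU. The rebuild line also shows why a second failure before the replacement arrives is fatal: with two blocks missing there is nothing left to XOR against.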

JanKir
14 Apr 2005, 9:50pm
Thanks for the input, Shorty, but I already have 4 SATA drives.
I looked at the 3ware Escalade 8506-4LP controller, but it's like 300 Euros and I'm not really sure if I want to spend that kind of money yet...
I set up a Raid 0+1 now because I really need the redundancy, and I'll check my budget for the 3ware :)

Tex
15 Apr 2005, 1:09am
Raid 0+1, or even a raid-0 you back up to non-raided drives, would be my options.

Remember, even redundant raid like raid-1 or raid 0+1 or raid-5 does not mean you can escape from backing up (as you just learned). It only protects from a complete drive failure. A controller failure... or a corrupted FS... or you delete crap by mistake or... Well geeez, the list goes on forever...

None of those are protected against by ANY level of raid.

You need to develop a BACKUP routine. Run raid-0 if you want, but BACK IT UP. A non-raided drive, or even another raid-0 array on a different raid controller, is the safest bet.

My servers in the house here back themselves up to other servers across my gigabit LAN nightly, so my data always exists on at LEAST two separate computers, and crap like our Outlook mail folders back themselves up several times a day to multiple machines, as we would both sheet if we lost emails.

Run a pair in raid-0 for the performance and BACK IT UP often to non-raided drives is my recommendation.

I am not just talking out my butt. I do this for a living, and this is what I do to protect both my own data and my customers' data.

Tex
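
As a rough idea of what a backup routine like the one described above could look like, here is a small Python sketch that copies new or changed files from one folder tree to another (for example, from the array to a non-raided drive). This is an assumption-laden illustration, not what Tex actually runs: `backup_tree` is a made-up helper, the demo uses throwaway temp folders instead of real drives, and it never deletes anything from the backup side.

```python
import shutil
import tempfile
from pathlib import Path


def backup_tree(source: Path, destination: Path) -> list[Path]:
    """Copy files that are missing or older in destination; return what was copied."""
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        dst_file = destination / src_file.relative_to(source)
        # Skip files whose backup copy is at least as new as the original.
        if dst_file.exists() and dst_file.stat().st_mtime >= src_file.stat().st_mtime:
            continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also preserves the timestamp
        copied.append(dst_file)
    return copied


if __name__ == "__main__":
    # Demo with temporary folders standing in for "the array" and "the backup drive".
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        (Path(src) / "mail").mkdir()
        (Path(src) / "mail" / "inbox.pst").write_text("important emails")
        print(len(backup_tree(Path(src), Path(dst))), "file(s) copied on first run")
        print(len(backup_tree(Path(src), Path(dst))), "file(s) copied on second run")
```

Scheduled nightly (Task Scheduler, cron, whatever you have), something this simple already covers the cases raid cannot: a dead controller, a corrupted file system, or a file deleted by mistake, provided the destination sits on different hardware.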

Shorty
15 Apr 2005, 6:11am
^^^^^^^^^^^^ Speaks the truth.

I'll echo every word of that. Backup, backup and backup again.

Get Acronis True Image (http://www.acronis.com/). It supports a full backup & incremental (point in time) appended backups. I use that to back up my server network at home. Each machine runs it and backs up over the network to another. I know it's a pain when you have masses of data to shift around, but when you do a big buy of kit, plan it in :)

The 3ware controllers are expensive but well worth it. They take the load off the CPU, and software RAID (especially RAID-5) is like jelly. That's why I don't use it at home (yet anyway). Investment is high when moving to a serious storage solution, but once it's in place, you can relax a little :)

JanKir
15 Apr 2005, 11:27am
First of all, I think my problem is one of the SATA cables... The 0+1 array that I set up yesterday went into critical again after a couple of reboots, and it's always the same channel that just won't see the drive. It's not the channel or the drive, so I will buy a new set of cables tomorrow. I have a spare 80GB Seagate drive, so I will use that as a backup, but I think a Raid 0 with 4 drives will be too risky, and a Raid 0+1 will lose too much storage (50%). So will the performance be good on a 3ware with raid5? If it is, I might as well get the 3ware and leave the storage like that for a couple of years (maybe change the drives sometime for larger ones) ;)
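
To put numbers on the storage trade-off above, here is a quick Python sketch of usable capacity for the three layouts being discussed, using the four 200GB drives from this thread. The function name `usable_capacity_gb` is made up for the example, and the figures ignore formatting overhead and the gap between marketing gigabytes and what the OS reports.

```python
def usable_capacity_gb(level: str, drives: int, drive_gb: int) -> int:
    """Usable capacity for a few common RAID levels (identical drives assumed)."""
    if level == "raid0":
        # Pure striping: all space is usable, no redundancy at all.
        return drives * drive_gb
    if level == "raid0+1":
        # Two striped sets mirroring each other: half the raw space.
        if drives < 4 or drives % 2:
            raise ValueError("raid0+1 needs an even number of drives, at least 4")
        return (drives // 2) * drive_gb
    if level == "raid5":
        # One drive's worth of space goes to parity.
        if drives < 3:
            raise ValueError("raid5 needs at least 3 drives")
        return (drives - 1) * drive_gb
    raise ValueError(f"unknown RAID level: {level}")


raw = 4 * 200
for level in ("raid0", "raid0+1", "raid5"):
    usable = usable_capacity_gb(level, drives=4, drive_gb=200)
    print(f"{level}: {usable} GB usable ({100 * usable / raw:.0f}% of raw)")
# raid0: 800 GB usable (100% of raw)
# raid0+1: 400 GB usable (50% of raw)
# raid5: 600 GB usable (75% of raw)
```

So raid5 gives back 200GB compared with 0+1 on these drives, at the cost of the parity overhead on writes discussed earlier in the thread.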

Tex
15 Apr 2005, 3:29pm
When you get your raid-5 back up again post some benchmarks for us.

Tex

JanKir
15 Apr 2005, 8:33pm
With the Promise controller?
I can remember that the writes were pretty low, not higher than 40,000 in ATTO no matter what latency I tried, and the reads were about 80,000 max. In general not very fast, but acceptable to me if I get the safety of raid 5. Can you estimate the performance of the 3ware controller in a Raid 5 with the 7200 rpm 200GB Maxtor drives?

Tex
16 Apr 2005, 11:45pm
I think 40,000 for writes was awesome with that setup.

JanKir
17 Apr 2005, 6:28pm
Ok I set it up to a Raid 5 array again, and here is an Atto Screen.

Tex
18 Apr 2005, 5:15pm
fine atto for raid-5

JanKir
19 Apr 2005, 10:01pm
Wow, Tex said my ATTO is good!
I guess my array works now. It has been stable for the last couple of days, so I will keep this controller for now. I will get a PCI-Express one when I do a complete rebuild, once I've saved up some money and my current system won't run everything anymore.

Gobbles
19 Apr 2005, 11:27pm
We use 3ware and we use Acronis.

3ware pwns. I have the same controller you're talking about. I also use a lot, and I mean a CASE, of 3ware 8006-2lp controllers a week.

Acronis pwns. I've restored partitions that were 16 gig to 8 gig partitions and vice versa; Acronis does not even flinch while doing it. We use the enterprise version.

JanKir
19 Apr 2005, 11:52pm
Hi Gobbles!
So you are using the 3ware in a Raid 5 config with 4 Drives?
Can you please post up an ATTO screen for me?

Thanks a lot.

Tex
20 Apr 2005, 12:31am
It really doesn't matter what his ATTO is. The 3ware is a whole different level of raid controller. It's one of the few IDE raid controllers that are used in corporate America.

They are much more robust and dependable, and that's really what raid-5 is all about.

I wouldn't touch raid-5 on a Promise controller for all the tea in China. I would only even consider a 3ware for IDE raid-5.

Consider the Promise as a low-end toy for the enthusiast, and the 3ware as a stable, mature, robust answer designed for mission-critical applications used by the business world.

How can I put this?

You own the equivalent of a hopped up Kia. Gobbles uses a Mercedes.

Tex

Flintstone
20 Apr 2005, 3:43am
Then, when we start talking serious, there are these:

http://www.lsilogic.com/products/megaraid/index.html

A little more money, but they're the real thing!!

Flint

Kwitko
20 Apr 2005, 4:05am
After talking with Tex, I've become a big fan of RAID 10, especially for a DB server. I'm going to bench our new server with Atto. Ultra320 SCSI RAID 10 should put up some nice numbers.

As for backups, we're using a Dell LTO2 tape unit (200GB native) with Veritas Backup Exec. I was thinking of getting an autoloader, but I need to take the previous day's backup offsite.

Flintstone
20 Apr 2005, 1:55pm
Kwitko,
Depending on the controller and the amount of ram on the controller, as Tex reminded me, Atto's I/O test won't even touch the disks. It sits in the onboard ram on my controller and there really is no disk activity in the bench at all. When I had a slower controller with less ram on a slower bus, the same thing occurred. My new controller is on a PCI-X bus and I still can't bury the controller in I/O's. YET!

Flint

Kwitko
20 Apr 2005, 2:01pm
It's a Dell PERC controller with 256MB RAM. Hopefully that will contribute to good I/O scores.

Tex
20 Apr 2005, 2:16pm
As Flintstone mentioned, with 256MB cache all you're really testing is the CPU/memory subsystem on the controller.

The old Elite 1600's would hit about 140,000 on reads and 80,000 on writes with one or 100 drives in raid-0 when testing with ATTO. That's just as fast as you could get stuff out of the cache. Flintstone just switched to the LSI 320-2x like I run, and it hits like 250,000 on writes and 300,000 on reads using ATTO because the onboard CPU and cache are so much faster. ATTO never kisses the disks at all. We have 512MB of PC3200 DDR and a much faster onboard CPU.

Your Perc4 should be in between the old Elite and our 320-2x. They have made several Perc4 versions and the later ones are much faster. I would bet that with 256MB cache it's a 320-2. And it's still using the older SDRAM cache and not DDR like ours. You may have a newer model, but I bet that's what it is.

I bet your ATTO hits like 200,000 on reads and 150,000 on writes due to the slower CPU/memory. It's faster than the older Perc3's but not as fast as the new 320-2x's or Perc4e's.

Tex

Kwitko
20 Apr 2005, 8:31pm
How should I set up the benchmark test?

Tex
20 Apr 2005, 8:38pm
If you're using ATTO, change only the total length to 32MB.

Tex
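
The point made a few posts up about ATTO never touching the disks comes down to simple arithmetic: if the whole test fits in the controller's onboard cache, the benchmark is mostly timing the cache. A tiny Python sketch of that check, using the cache sizes quoted in this thread (the function name `fits_in_cache` is made up, and this ignores cache policy, read-ahead and everything else a real controller does):

```python
def fits_in_cache(test_length_mb: int, cache_mb: int) -> bool:
    """True if the whole benchmark working set fits in the controller cache."""
    return test_length_mb <= cache_mb


# Cache sizes as mentioned in the thread; test length as suggested for ATTO.
controllers = {
    "Dell PERC (256MB cache)": 256,
    "LSI 320-2x (512MB cache)": 512,
}
test_length_mb = 32

for name, cache_mb in controllers.items():
    verdict = "cache only" if fits_in_cache(test_length_mb, cache_mb) else "hits the disks"
    print(f"{name}: {test_length_mb}MB test -> {verdict}")
```

With a 32MB test length both cards land in the "cache only" case, which is why the scores track the controller's CPU and memory rather than the number of drives behind it.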

Kwitko
20 Apr 2005, 9:19pm
Benchies! Tex, the controller is a PERC 4e.

Tex
20 Apr 2005, 11:55pm
What's it in? I have a Perc4e sitting on the desk in front of me. Just don't have anything I can run it in. My DFI with PCI-e slots won't boot with it in. So many non-servers right now are really only geared to support video cards in their PCI-e slots.

I'm surprised it's as fast as it is. It should have been only slightly faster than a 320-2x based on the memory subsystem and CPU. Do you have open PCI-e slots in that Dell server? If I can't find a board to run this thing in, I would like to at least test it to know it works before I sell it.

Tex

Kwitko
21 Apr 2005, 1:08am
I don't know if it has PCI-e, I'll have to check tomorrow. I'll also check the controller's RAM.

Tex
21 Apr 2005, 12:19pm
The Perc4e is PCI-e. So it's got at least one. (grin) That's why it's faster than mine.

The RAM is 256MB of PC2700 CAS2. As I said... I have one sitting on my desk two feet from my face.

Tex