Random freezes


Stranger
9 Jun 2003, 10:28pm
I'm experiencing random lockups with a computer I've been working on for a while. Currently the components are:

Abit KT7 Raid, with the OS drive on the onboard IDE connectors in a mirrored array.
Win2K Server
1 Aha 2940UW SCSI controller
1 Aha 2190? SCSI controller
2 Maxtor 40GB drives
512MB Ram (not sure of the brand)
2 Intel Pro Server+ NICs
1 new HighPoint RocketRAID 133 IDE controller
2 new 200GB WD drives

The power supply and the motherboard seem fine. The computer locks up in the middle of the night during backups, and the HDD light stays on. I've run stress tests on the memory and it seems fine. The NICs seem fine as well. I can't find any diagnostic apps for the SCSI cards online anywhere, but I suspect one of them might be the problem (I haven't had a chance to pull them one at a time yet).

Any thoughts or ideas? Thanks.

EvilMathNinja
9 Jun 2003, 10:40pm
Well, if it locks up specifically during the backups, which are HDD-intensive, I'd suspect an issue there.

Another thing to consider is running disk benchmarks on the drives (like Sandra or ATTO) and seeing if that locks it up. Then you'd know for sure.

Also, SCSI drives get very, very hot, especially during intensive use (such as a backup), so I'd check the temps (usually done by touching the drive :P, and if you can't, then you need some cooling ;) ). A simple 80mm case fan across from the drive works wonders; it keeps my Cheetah (15k rpm SCSI) nice and cool.

Black Hawk
9 Jun 2003, 10:52pm
Maybe the PSU is crapping out when it's under full load...

Stranger
10 Jun 2003, 03:42am
Actually, the tape drives have been taken out of service and the problem still happens, even with nothing attached to the SCSI cards (there are no SCSI hard drives). I'm pretty sure the power supply is fine, and I've checked the temps and voltages and they are acceptable. I'll try benchmarking the drives the OS is on, since that's really the only component that hasn't changed throughout the entire process. Thanks for the suggestions.

Cool Canuck
10 Jun 2003, 04:12am
You say you've checked the temps. What kind of temps are you getting under load? While we're at it, which BIOS version are you running?

Stranger
10 Jun 2003, 05:19am
Temps are 40-45 C idle, 50-55 C load, which are on the high end, but acceptable to me.

I'm not certain, but I believe the BIOS is version 3R (with the accompanying HPT BIOS update), dated 7/5/2001.

The computer itself ran fine for 2 years; this problem has only popped up recently, with no HW changes and no power surges/sags.

Is it possible that the onboard raid controller has gone bad?

Rob
10 Jun 2003, 07:35am
I tracked a bad PSU for months one time. I could find NO reason for the lockups, yet it would randomly stop. Finally, one night the PSU literally blew up. Shot sparks, loud pop, the works. I replaced it, and the machine was stable.

High disk loads put a large, sustained draw on the power supply, I'd say. My first look would be the PSU, then driver/hardware conflicts in the disk subsystem. Probably because of my never-ending adventure tracking down a bad PSU.

Stranger
10 Jun 2003, 08:03am
I'm fairly positive it isn't the power supply (550W, I think), since it ran 2 tape drives and 6 hard drives for over two years without choking once, so it's got more than enough power to run just 2 tape drives and 2 hard drives. I'm going to pull the SCSI cards when I get the chance.

Rob
10 Jun 2003, 09:10am
Really, it could be any of them, or even something simple we're overlooking. I'm just saying don't rule it out; it is a possibility.

Mine ran for over a year on that PSU before it died. I said the same things you're saying: it can't be, it can't be, it checks out fine... well......

Enverex
10 Jun 2003, 09:24am
If it only does it when you're making backups, maybe it's the software you're using, rather than the hardware, that's causing the problems....

NS

Omega65
10 Jun 2003, 12:34pm
Originally posted by Stranger
I'm experiencing random lockups with a computer I've been working on for a while.
Any thoughts or ideas? Thanks.

Try this: take the heatsink off the CPU, remove the fan, and check for dust buildup in the fins. You'll be surprised (shocked) at how much is there.

Flintstone
10 Jun 2003, 02:32pm
Wasn't this one of the boards with the "bad caps" syndrome? Have you checked all of the capacitors for leaks?

Flint

Cool Canuck
10 Jun 2003, 06:56pm
Originally posted by Stranger
Temps are 40-45 C idle, 50-55 C load, which are on the high end, but acceptable to me.

I'm not certain, but I believe the BIOS is version 3R (with the accompanying HPT BIOS update), dated 7/5/2001.

The computer itself ran fine for 2 years; this problem has only popped up recently, with no HW changes and no power surges/sags.

Is it possible that the onboard raid controller has gone bad?

I doubt that the RAID controller has anything to do with it. That BIOS is a stable version. Got the FSB cranked up on this machine at all?

Those temps get into the realm of instability. They may be acceptable to you, but the CPU might not like them, and component tolerances change over time. Here's a cheap trick: knock the side off your machine and see if the temps drop. If they do, run it with the side off for a while and see if it still locks up.

Let us know what happens.

Stranger
11 Jun 2003, 06:02am
Nah, no overclocking or anything. The case cover has been off for a while now, so that's not part of the problem. It hasn't crashed in the 2 days since I removed one of the SCSI cards, but now I'm doing backups at half speed, and I'm not convinced the card was the problem, so I'll keep looking. I'll probably tear it down and rebuild it.

Flint - I looked over the board pretty thoroughly and didn't find any physical problems.

CCW
11 Jun 2003, 09:53am
Tear it down and rebuild it: add one component at a time, then take it out and add another, then try combinations of components to make sure they are all compatible. It's time-consuming. You might also want to get some airflow (about 30 CFM) going over the drives if they are going to be used intensively.

Craig

Stranger
11 Jun 2003, 09:43pm
I've finished rebuilding it and everything seems fine, but now I have a new problem I haven't encountered before. I'm trying to format the 200GB WD drives (which are already set up as a mirrored array on the RocketRAID 133 card), but the format slows down and eventually stalls at 70% (after 4 or 5 hours) and never finishes. The card supports drives larger than 137GB, so I'm not sure what's going on.
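The 137GB figure comes from 28-bit LBA addressing in older ATA: 2^28 sectors of 512 bytes each. Drives beyond that need 48-bit LBA support along the whole path (controller, BIOS, and driver). The sketch below just works the arithmetic. It assumes 512-byte sectors and treats "200GB" as a nominal decimal 200,000,000,000 bytes, which may differ slightly from the drive's exact capacity.

```python
# Worked arithmetic for the 28-bit ATA LBA capacity ceiling.
SECTOR_BYTES = 512        # assumption: standard 512-byte sectors
LBA28_SECTORS = 2 ** 28   # sectors addressable with a 28-bit LBA


def lba28_limit_bytes() -> int:
    """Largest capacity, in bytes, reachable with 28-bit LBA."""
    return LBA28_SECTORS * SECTOR_BYTES


def fraction_addressable(drive_bytes: int) -> float:
    """Share of a drive that lies below the 28-bit LBA ceiling (capped at 1.0)."""
    return min(1.0, lba28_limit_bytes() / drive_bytes)


if __name__ == "__main__":
    limit = lba28_limit_bytes()
    print(f"28-bit LBA limit: {limit:,} bytes (~{limit / 1e9:.1f} GB)")
    drive = 200 * 10**9   # assumption: nominal decimal size of a "200GB" drive
    print(f"Share of a 200GB drive below the limit: {fraction_addressable(drive):.1%}")
```

This prints a limit of 137,438,953,472 bytes (about 137.4 GB) and shows that the ceiling sits at roughly 68.7% of a nominal 200GB drive. That is just arithmetic, not a diagnosis of the stalled format.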