Random freezes

edited June 2003 in Hardware
I'm experiencing random lockups with a computer I've been working on for a while. Currently the components are:

Abit KT7 Raid, with the OS drive on the onboard IDE connectors in a mirrored array.
Win2K Server
1 Aha 2940UW SCSI controller
1 Aha 2190? SCSI controller
2 Maxtor 40GB drives
512MB Ram (not sure of the brand)
2 intel pro server+ nics
1 new HighPoint RocketRAID 133 IDE controller
2 new 200GB WD drives

The power supply seems fine and the motherboard seems fine. The computer locks up in the middle of the night during backups, and the HDD light stays on. I've run stress tests on the memory and it seems fine. The NICs seem fine as well. I can't find any diagnostic apps for the SCSI cards online anywhere, but I suspect one of them might be problematic (I haven't had a chance to pull them one at a time yet).

Any thoughts or ideas? Thanks.

Comments

  • EMN
    edited June 2003
    Well, if it locks up specifically during the backups, which are HDD-intensive, I'd suspect an issue there.

    Another thing to consider is running disk benchmarks on the drives (like Sandra or ATTO) and seeing if that locks it up. Then you'd know for sure.

    Also, SCSI drives get very, very hot, especially during intensive use (such as a backup), so I'd check the temps (usually done by touching the drive :P and if you can't, then you need some cooling ;) ). A simple 80mm case fan across from the drive works wonders; it keeps my Cheetah (15k rpm SCSI) nice and cool.
  • BlackHawk Bible music connoisseur There's no place like 127.0.0.1 Icrontian
    edited June 2003
    Maybe the PSU is crapping out when it's under full load...
  • edited June 2003
    Actually, the tape drives have been removed from action and the problem still happens, even if nothing is attached to the SCSI cards (there are no SCSI hard drives). I'm pretty sure the power supply is fine, and I've checked the temps and voltages and they are acceptable. I'll try benchmarking the drives the OS is on since that's really the only component that hasn't changed throughout the entire process. Thanks for the suggestions.
  • edited June 2003
    You say you've checked the temps. What kind of temps are you getting under load? While we're at it, which BIOS version are you running?
  • edited June 2003
    Temps are 40-45 C idle, 50-55 C load, which are on the high end, but acceptable to me.

    I'm not certain, but I believe the bios is version 3R (with accompanying HPT bios update), dated 7/5/2001.

    The computer itself was running fine for 2 years, this problem has just popped up recently, no HW changes, no power surges/sags.

    Is it possible that the onboard raid controller has gone bad?
  • Rob Detroit, MI
    edited June 2003
    I tracked a bad PSU for months one time. Could find NO reason for the lockups, yet randomly it would stop. Finally, one night the PSU literally blew up. Shot sparks, loud pop, the works. Replaced it, and the machine was stable.

    During high disk loads, I would say there would be a large sustained power draw. My first look would be the PSU, then a driver/hardware conflict in the disk subsystem. That's probably because of my never-ending adventure tracking down a bad PSU.
  • edited June 2003
    I'm fairly positive it isn't the power supply (550W I think) since it was running 2 tape drives and 6 hard drives for over two years without choking once, so it's got more than enough power to run just 2 tape drives and 2 hard drives. I'm going to pull the SCSI cards when I get the chance.
  • Rob Detroit, MI
    edited June 2003
    Really, it could be any of them, or even something simple we're overlooking. I'm just saying don't rule it out; it is a possibility.

    Mine ran for over a year on that PSU before it died. I said the same things you are: it can't be, it can't be, it checks fine, well......
  • Enverex Worcester, UK Icrontian
    edited June 2003
    Maybe it's the software you are using, as opposed to the hardware, causing problems if it only does it when you are making backups....

    NS
  • Omega65 Philadelphia, Pa
    edited June 2003
    Originally posted by Stranger
    I'm experiencing random lockups with a computer I've been working on for a while.
    Any thoughts or ideas? Thanks.

    Try this - take the heatsink off the CPU, remove the fan and check for dust buildup in the fins. You'll be surprised (shocked) how much is there.
  • Flintstone SE Florida
    edited June 2003
    Wasn't this one of the boards with the "bad caps" syndrome? Have you checked all of the capacitors for leaks?

    Flint
  • edited June 2003
    Originally posted by Stranger
    Temps are 40-45 C idle, 50-55 C load, which are on the high end, but acceptable to me.

    I'm not certain, but I believe the bios is version 3R (with accompanying HPT bios update), dated 7/5/2001.

    The computer itself was running fine for 2 years, this problem has just popped up recently, no HW changes, no power surges/sags.

    Is it possible that the onboard raid controller has gone bad?
    I doubt that the RAID controller has anything to do with it. That BIOS is a stable version. You got the FSB cranked up on this machine at all?

    The high temps get into the realm of instability. They may be acceptable to you, but the CPU might not like them. Component tolerances change over time. Here's a cheap trick. Knock the side off your machine and see if the temps drop. If so, run it with the side off for a while and see if it still locks up.

    Let us know what happens.
  • edited June 2003
    Nah, no overclocking or anything. The case cover has been off for a while now, so that's not part of the problem. It hasn't crashed for 2 days since I removed one of the SCSI cards, but now I'm doing backups at half speed and I'm not satisfied that the card was the problem, so I'll keep looking. I'll probably tear it down and rebuild it.

    Flint - I looked over the board pretty thoroughly and didn't find any physical problems.
  • CCW Suffolk, UK
    edited June 2003
    Tear it down, rebuild it, add one component at a time, then take it out, add another, then do combinations of components to make sure they are all compatible. Time consuming. You might want to get a bit of airflow (about 30 cfm) going over the drives if they are going to be used intensively.

    Craig
  • edited June 2003
    I've finished rebuilding it and everything seems fine, but now I have a new problem which I haven't encountered before. I'm trying to format the 200GB WD drives (which have already been set up in a mirrored array via the RocketRAID 133 card), but the formatting slows down and eventually stops at 70% (and this is after 4 or 5 hours) and it never finishes. The card is compatible with 137+GB drives, so I'm not sure what's going on.
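
For context on the "137+GB" figure in the last post: that number comes from the 28-bit LBA addressing used by older ATA controllers, BIOSes and drivers. Below is a minimal sketch of the arithmetic, assuming 512-byte sectors and a nominal (decimal) 200GB drive. The function name is made up for illustration and is not part of any tool mentioned in this thread.

```python
# Sketch of the 28-bit LBA capacity limit (assumes 512-byte sectors).
SECTOR_BYTES = 512


def max_addressable_bytes(lba_bits, sector_bytes=SECTOR_BYTES):
    """Largest capacity reachable with an LBA field of the given width."""
    return (2 ** lba_bits) * sector_bytes


limit_28 = max_addressable_bytes(28)
drive_bytes = 200 * 10**9  # nominal 200GB drive, decimal gigabytes (assumption)

print(limit_28)                          # 137438953472 bytes, i.e. about 137.4 GB
print(round(limit_28 / drive_bytes, 3))  # 0.687
```

The 28-bit boundary falls at roughly 69% of a nominal 200GB drive, which is close to the 70% mark where the format stalls. That is only a hint and not a diagnosis, but it may be worth confirming that every layer (card BIOS, driver, and OS) is actually using 48-bit addressing.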