Intermittent system failure (Tyan Tiger MP)
Last Christmas I built a dual-processor computer for my sister, thoroughly checked it out, and ran burn-in tests before it left the door. Last month, it developed a problem in which the machine randomly and abruptly reboots itself for no apparent reason. It doesn't go through the Windows XP shutdown sequence; it just shoots straight to a black screen followed by the POST display. The next time Windows boots, it displays the "Windows has recovered from a serious error" message after she logs in. It does not ask to run chkdsk during boot. I'm working on getting the contents of the "Details..." tab from her, so that's coming.
The machine in question is a Tyan Tiger MP with dual Athlon MP2000+'s. It has 1024MB of RAM in 4 ECC PC2100 DIMMs. ECC on the RAM is set to ECC Scrub. The machine is housed in an Antec full tower with all the fan brackets populated and a 480W TruePower PSU. A couple of weeks ago, I took the machine in for an overhaul because it had begun to display this problem. I replaced the stock AMD fans with CoolerMaster HHC-001's, reseated the heat spreaders on her RAM, reseated the RAM, formatted the hard drive, reloaded Windows XP SP1, and installed the newest drivers for her Audigy, Linksys LAN card, and ATi FireGL 8800. Aside from the stock Windows software, it also has two instances of Folding@Home. I cannot duplicate her problem at my house, though I've seen it in action at hers.
I had her put it on a new surge suppressor, but that didn't work either. I have no idea what's wrong with it, though I feel like it's power-related. If anyone has a clue what's wrong with it, please share.
-drasnor
Comments
What does your sis need with a dualie?
I tried running it in several different memory configurations at home and at her house. Everything worked fine at my house, but the system still crashed regardless of the number and placement of DIMMs at her house.
Absolutely nothing. It's a 24/7 folding rig and it's loud, so she gets to house it and chat it up.
-drasnor
Does she really need a gig of ram to chat it up lol?
I hope you get it working so those dualies will turn in some WU's
Now reboot if the machine is set to boot off the floppy first. (If not, change that in the BIOS.)
The PC will boot off the floppy and she'll have to run the diagnostics. There are two choices via keyboard, if memory serves me correctly: basic and advanced tests. Have her run the advanced test, which will take some time.
See if it reports back bad memory errors.
If it does then you have your problem.
You may also just try setting BIOS to check errors instead of scrub. Scrubbing is really only useful in server environments.
Hope this helps.
If possible, see if she can temporarily move the computer to a different location (on a different circuit). If you have access to a line conditioner you could try that. See if you can determine what else is on that circuit, and move as many things as possible (moving all would be nice) to a different circuit.
You might also try one of those cheap line testers (any hardware store should have them - see picture) and see if she has an open or floating ground. Have her pay special attention to what else is happening at the moment it craps out, like an appliance coming on, etc. Also, check the voltage itself; I've seen 115VAC drop as low as 90V during summer months, when power companies often practice what is known as a "rolling brownout": they drop the voltage in different areas at different times to reduce the overall load on the entire system. When I was a building engineer in Washington, DC, we had to put phase protectors on all of our three-phase motors because of such trickery. The fact that your problem just started last month, when it got really hot in most areas, leads me to believe it might just be an abnormally heavy load on your power company, causing them to fiddle around with the voltage.
We live in Texas, so I'm inclined to believe you're right about the AC draw, plus her apartment complex is pretty decrepit and they probably skimped on the wiring. As far as temps go, my house is cooler than hers by about 10°F (72°F vs. 82°F), but the case/CPU temps are comparable after I installed those solid copper dual-heatpipe HHC-001's on her CPUs.
Thank you much for the info. Also, do you think getting a UPS would help if it really is lousy power?
-drasnor
Good Luck!:)