Intermittent bizarre behavior
My dual-processor Tyan Tiger MP + 2x MP 2000+ is acting up again. It's doing the number where it'll be on for a while and then just reboot by itself, without shutting down first. Changing the power supply rendered the machine unbootable (it powered up with a blank stare) until I reconnected the original, followed by a couple of hours of moving memory modules around, a CMOS reset, and trying to get it to boot sans cards. I got a 128MB memory module at Fry's out of desperation, and got the machine to POST halfway (before it just gave a blank screen), followed by replacing one of the original modules, and now it boots. No word yet on whether it's still doing the random power cycling. The machine requires ECC RAM to work properly, and the module we purchased wasn't ECC, which explains the halfway POST.
Any ideas?
-drasnor
Comments
I think I found the problem. As long as no RAM is in bank 0 on the board, it seems to be stable. Hmph, I hope the board isn't out of warranty.
It isn't heat (I thought so too at first): I replaced the AMD stock HSFs with CoolerMaster HHC-001 solid copper dual-heatpipe coolers a month ago. CPU temps average 40C for CPU0 and 50C for CPU1 (after running F@H for a few hours; it normally runs 24/7). The machine lives in an Antec mid-tower with 2 intake and 3 exhaust fans. The front air filter is clean.
-drasnor
Something in the back of my memory tells me that all four DIMMs (or all three DIMMs) have to be registered ECC RAM.
Heat issue?
hmmm....I doubt it.
1) Try the DocMem or Memtest route as suggested.
2) Could be a bad IDE cable.
3) Check the error log in Administrative Tools (Control Panel) and see if there are any consistent errors there.
4) I have known certain RAM in certain DIMM slots to cause this. A KT600 board I have does this with RAM in DIMM 1 and 2 but not in DIMM 2 & 3.
Hope this helps.
IDE cables are fine, I tested them recently.
I checked Tyan's website, and they have a three-year warranty for boards you get from them, but the board was bought from Newegg, which only offers a one-year warranty, and I think it may be past that. *sigh* Next time I see it I'll check for mechanical damage to the DIMM slot. That actually has a fair probability of being the problem, since the HSF clip on CPU1 partially obstructs bank 0, but that wouldn't account for this behavior back when I was using the AMD stock sinks, which didn't obstruct it. Note that the system wouldn't boot with *ANY* of the modules I had in bank 0.
-drasnor
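For what it's worth, the swap-testing logic behind "it fails with any module in bank 0" can be written down as a rough sketch. Everything here is made up for illustration: the function name `find_suspects`, the module labels, and the trial log are hypothetical, not real results from this board. The idea is just that a slot which fails with two or more different modules (and never passes) points at the slot, while a module which fails in two or more different slots (and never passes) points at the module.

```python
from collections import defaultdict


def find_suspects(trials):
    """Sort swap-test results into suspect slots and suspect modules.

    trials: iterable of (module, slot, booted) tuples.
    Returns (bad_slots, bad_modules). A slot is suspect if it was tried
    with at least two different modules and never booted; a module is
    suspect if it was tried in at least two different slots and never booted.
    """
    slot_results = defaultdict(dict)    # slot -> {module: booted}
    module_results = defaultdict(dict)  # module -> {slot: booted}
    for module, slot, booted in trials:
        slot_results[slot][module] = booted
        module_results[module][slot] = booted

    def always_fails(results):
        # Need at least two distinct partners before blaming anything.
        return sorted(
            name
            for name, outcomes in results.items()
            if len(outcomes) >= 2 and not any(outcomes.values())
        )

    return always_fails(slot_results), always_fails(module_results)


# Hypothetical log: modules "A", "B", "C" each tried in bank 0 and elsewhere.
trials = [
    ("A", 0, False), ("B", 0, False), ("C", 0, False),
    ("A", 1, True), ("B", 2, True), ("C", 3, True),
]
bad_slots, bad_modules = find_suspects(trials)
print("suspect slots:", bad_slots)
print("suspect modules:", bad_modules)
```

With a log like that, only bank 0 shows up as a suspect and none of the modules do, which is the pattern described above. It says nothing about why the slot fails (damage, dirty contacts, or the board itself).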
Other than that, some APPARENT bad DIMM sockets turned out to be sockets that had been stored in moisture or run in a humid environment, and needed contact cleaner applied to the socket finger contacts and then a few contacts pulled back out.
Lesson: when there, take a flashlight and inspect for discolored fingers, LARGE dust bunnies, and Persian cat fur (I had an AGP card in a box go APE when Persian cat fur got blown into the socket when someone "cleaned" the board with an aircan). Blow the socket out just in case, look for finger contacts obviously NOT in line with the others, and use a contact cleaner pen on anything in the DIMM socket that should be a finger contact and isn't nice and bright. Slight corrosion can throw RAM timing off badly in the affected module (it takes a while to overcome even a small amount of resistance), and the BIOS will reject a DIMM if modules respond at different rates.
How big are these DIMMs? Some motherboards have 4 sockets but an upper total RAM limit of 3 GB, too. THOSE will ignore anything over 3 GB, but should let you have a 1 GB module in each of any three sockets.
John.
It was run for most of the time in the ***-humid town of College Station, TX.
-drasnor
But if you have 'Root', the confusers can be overcome!