Intermittent bizarre behavior

drasnor · Starship Operator · Hawthorne, CA · Icrontian
edited September 2003 in Hardware
My dual-processor Tyan Tiger MP with 2x MP2000+ is acting up again. It's doing the number where it'll be on for a while and then just reboot by itself, without shutting down first. Changing the power supply rendered the machine unbootable (it powered up to a blank stare) until I reconnected the original, followed by a couple of hours of moving memory modules around, a CMOS reset, and trying to get it to boot without any cards. Out of desperation I bought a 128MB memory module at Fry's and got the machine halfway through POST (before it just gave a blank screen), then replaced it with one of the original modules, and now it boots. No word yet on whether it's still doing the random power cycling. The machine requires ECC RAM to work properly, and the module we bought wasn't ECC, which explains the halfway POST.

Any ideas?

-drasnor :fold:

Comments

  • Thrax 🐌 · Austin, TX · Icrontian
    edited September 2003
    Does it power down and turn back on, or simply reboot? If it's simply rebooting, sounds like a memory problem. This analysis is reinforced by the new memory seemingly repairing the issue. Run Memtest or DocMem to test your memory modules for errors.
  • Creep · Hell · Icrontian
    edited September 2003
    If it's just rebooting it could also be a heat issue.
  • drasnor · Starship Operator · Hawthorne, CA · Icrontian
    edited September 2003
    What happens is you'll be doing something, and then the screen locks up and the hard drive indicator light comes on solid. Thirty seconds later the hard disk light goes off, the three LEDs on the keyboard flash, and the screen goes black and comes up with a POST display. It beeps once at the end of POST. Windows XP then declares that the system has recovered from a serious error and offers to send a report to Microsoft.

    I think I found the problem. As long as no RAM is in bank 0 on the board, it seems to be stable. Hmph, I hope the board isn't out of warranty.

    It isn't heat (I thought so too at first): a month ago I replaced the stock AMD HSFs with CoolerMaster HHC-001 solid copper dual-heatpipe coolers. CPU temps average 40°C for CPU0 and 50°C for CPU1 after a few hours of F@H, which it normally runs 24/7. The machine lives in an Antec mid-tower with two intake and three exhaust fans, and the front air filter is clean.

    -drasnor :fold:
  • MediaMan · Powered by loose parts.
    edited September 2003
    Don't SMP boards require REGISTERED ECC RAM? To my knowledge only one board will work with regular RAM, and that's the Gigabyte board I have. But it appears that your board was working fine for a while (over a month?) and has now developed problems. What have you installed lately?

    Something in the back of my memory tells me that all four DIMMs (or all three) have to be registered ECC RAM.

    Heat issue?

    Hmm... I doubt it.

    1) Try the DocMem or Memtest route as suggested.
    2) It could be a bad IDE cable.
    3) Check the event logs under Administrative Tools (Control Panel) and see if any errors show up consistently.
    4) I have known certain RAM in certain DIMM slots to cause this. A KT600 board I have does this with RAM in DIMMs 1 and 2, but not in DIMMs 2 and 3.


    Hope this helps.
  • drasnor · Starship Operator · Hawthorne, CA · Icrontian
    edited September 2003
    All four DIMMs are registered ECC PC2100 DDR, with ECC enabled in the BIOS. It ran stable from November until about June, when it started doing this. I'm not hosting it (I live over 100 miles from where it is now), so it's been exciting driving over to fix it. Anyway, the hoster (my technologically inept sister) says it's been stable for the past 48 hours running FAH 24/7 without a DIMM in bank 0, so I now feel certain that bank 0 on the board has failed.

    IDE cables are fine, I tested them recently.

    I checked Tyan's website, and they offer a three-year warranty on boards you get from them, but this board was bought from Newegg, which only offers a one-year warranty, and I think it may be past that. *sigh* Next time I see it I'll check the DIMM slot for mechanical damage. That actually has a fair chance of being the problem, since the HSF clip on CPU1 partially obstructs bank 0, but it wouldn't account for the same behavior back when I was using the stock AMD sinks, which didn't obstruct it. Note that the system wouldn't boot with *ANY* of my modules in bank 0.

    -drasnor :fold:
  • MediaMan · Powered by loose parts.
    edited September 2003
    Sounds like that's the problem. Odd that a DIMM slot should fail, but confusers are like that... they can confuse you from time to time.
  • Straight_Man · Geeky, in my own way · Naples, FL · Icrontian
    edited September 2003
    I have had two boards with a "bad" DIMM slot 1 (typically bank 0 on a four-socket board of that kind). One had a flake of CARDBOARD down in the socket that was blocking the DIMM's contact with the pins; I blew it out with an air can and it worked fine after that. The other was a WARPED DIMM socket: not heat-warped, but seated unevenly to begin with and wave-soldered in WARPED.

    Other than that, some APPARENTLY bad DIMM sockets turned out to be sockets that had been stored in moisture or run in a humid environment; they needed contact cleaner applied to the socket's finger contacts, and then a few contacts pulled back out.

    Lesson: when you're there, take a flashlight and inspect for discolored fingers, LARGE dust bunnies, and Persian cat fur (I had an AGP card in a box go APE when Persian cat fur got blown into its socket while someone "cleaned" the board with an air can). Blow the socket out just in case, look for finger contacts obviously NOT in line with the others, and use a contact cleaner pen on anything in the DIMM socket that should be a bright finger contact but isn't. Slight corrosion can throw RAM timing off badly in the affected module (even small amounts add resistance that takes a while to overcome), and the BIOS will refuse to use a DIMM if the modules respond at different rates.

    How big are these DIMMs??? Some motherboards have four sockets but an upper total RAM limit of 3 GB. THOSE will ignore anything over 3 GB, but should let you put a 1 GB DIMM in any three of the sockets.

    John.
  • drasnor · Starship Operator · Hawthorne, CA · Icrontian
    edited September 2003
    I've got four 256MB DIMMs. You may be right about there being crap in the socket, since the last computer she had looked like the inside of a vacuum cleaner bag after I got it back, and she's got two cats and a tendency not to vacuum the house. I cleaned out a few dust bunnies the size of my fingernail the last time I was in the dualie, so it wouldn't surprise me in the least.

    It ran for most of that time in the ***-humid town of College Station, TX.

    -drasnor :fold:
  • Leonardo · Wake up and smell the glaciers · Eagle River, Alaska · Icrontian
    edited September 2003
    MediaMan wrote: "Odd that a DIMM slot should fail, but confusers are like that... they can confuse you from time to time."

    But if you have 'Root', the confusers can be overcome! :p