Memtest Errors - Advice Please
Trying to work out some strange problems on a system here at the office.
First it threw random error messages at bootup about one or more registry files being damaged, so I recovered them from a backup. I did some HD maintenance (chkdsk /f, etc.) and still had the trouble. Some googling pointed me to some registry cleaning tools, which did not fix the problem either.
I decided to reinstall the OS (XP Pro). After doing that, I had random problems with the video card (ATI Radeon 9200): it would lose the driver or fail to initialize the control panel. I tried changing video cards, then got an NTOSkernel error, which prevented the box from booting at all.
So I then decided to change the hard drive and the IDE cable, thinking that maybe the problems were hard drive related. Seeing as different items were getting corrupted, that seemed a logical conclusion.
Or not....on the new HD, XP Pro failed on the install twice, in different spots.
In went the Memtest bootable CD. In 4 passes, the 512 MB of DDR400 had 4 errors: 2 errors on pass 2, and the exact same 2 on pass 3. Passes 1 and 5 were clean.
So I swapped sticks, and in went an identical stick of 512 MB DDR400. 4 passes of Memtest...and on pass 3, I had the exact same 2 errors as above, at the exact same addresses.
This is where I need some input: if the exact same addresses show problems on 2 different sticks, am I seeing a motherboard or CPU based problem here? Is something in the L1/L2 cache, or the CPU, causing the same address to have a random error? Or, since the new stick is identical and was bought at the same time, and so is likely from the same manufacturing batch, did I perhaps get 2 sticks of RAM from a bad batch?
I am re-installing XP now on the new HD, new RAM, and original video card, and I'll see what happens. Any good input appreciated.
Dexter...
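The question above boils down to comparing the failing addresses from the two Memtest runs. Here is a minimal Python sketch of that comparison. The addresses are made up (the post does not list the real ones), and the function names `compare_runs` and `first_guess` are hypothetical. It is only a rule of thumb: two sticks from the same bad batch could in principle fail at the same place too.

```python
def compare_runs(errors_stick_a, errors_stick_b):
    """Split failing addresses into those seen on both sticks and those unique to one."""
    a, b = set(errors_stick_a), set(errors_stick_b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


def first_guess(errors_stick_a, errors_stick_b):
    """Rough rule of thumb, not a diagnosis."""
    if not errors_stick_a and not errors_stick_b:
        return "no errors logged"
    result = compare_runs(errors_stick_a, errors_stick_b)
    if result["shared"]:
        # The fault stayed put when the module changed.
        return "same addresses on both sticks: look past the module (slot, board, CPU, BIOS settings)"
    return "different addresses: the individual modules are the first suspects"


# Hypothetical addresses, for illustration only.
stick_a = [0x1A2B3C40, 0x1A2B3C44]
stick_b = [0x1A2B3C40, 0x1A2B3C44]
print(first_guess(stick_a, stick_b))
```

If the shared list is empty, the errors move with the module and the stick itself is the better first suspect.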
Also, think about this: look in BIOS and see if the RAM settings got lost. Bad timings can yield pattern errors in Memtest, though not likely just ONE of them. If BIOS has the wrong RAM settings, check the BIOS time, R&R the CMOS battery, then clear and reprogram CMOS as needed. If it is timing rather than something in the socket, or if BOTH problems exist, then the combo of cleaning the socket contacts and making sure BIOS has the right CPU and RAM timings might solve your issue. That happened to me on the SAME board that had the piece of cat hair in the RAM socket, in exactly the WRONG place, keeping the one finger/straight spring pin from contacting the DIMM. One part of one module showed errors in Memtest, but as with you, changing the DIMM did not fix it until the socket was cleaned AND the CMOS cell was R&R'd.
As for XP, check the CD for tiny scratches, fingerprints, etc. also, just in case, ok? Maybe also try a laser head cleaning CD if needed, or swap in another CD drive long enough to get XP loaded, THEN run a head cleaning CD in the drive that is messing things up, in case the laser lens is dirty.
The new install had the same Reg errors as above, so I am repeating it with a different install disk (forgot to mention I had tried that as well). The new CD has only ever been used once, with absolutely no scratches on it. And now that installation pass has failed during the XP install with an undefined error....
I checked and air-canned the DIMM socket, and it looks good to the naked eye. No magnifying glass here, but I can dig one up at home somewhere. All the BIOS timing and RAM settings are at default (motherboard is an Abit VI7, by the way).
I am switching to DIMM socket 2 now, just to see if there is any difference. But I'm starting to think I will be picking up a new motherboard tomorrow morning....this was supposed to be out the door to the customer today.
I have done 5 other systems in the past 2 weeks with identical hardware specs, and not had a problem with any of them. Very strange indeed.....
I'll update the thread when I know more.
Dexter...
Just for a test, I installed the RAM (the new stick) into DIMM 2 instead of DIMM 1, and so far I have not had any problems. Maybe slot 1 was flaky somehow? I had to leave the system overnight, and have not installed any hardware drivers yet, so I'll do that and see if it remains stable. I'll update the thread later.
Thanks guys.
Dexter...
We told MS, and they are looking into it.
I mean no offense when I say this, Dexter, but I figured you would be one of the last to jump to such a conclusion. Just because five previous systems exhibited similar characteristics doesn't mean the sixth computer is guaranteed to be free from flaws in design and manufacturing. It's very possible that the DIMM channel is flaky, and I think your testing even proved that.
For instance, on your above-mentioned K7S5A, if you built another K7S5A and put Crucial memory in it, would you expect it to work flawlessly? Or not, given the experience you had? So conversely, given that 5 of 6 systems worked perfectly with the motherboard/RAM combo I have, I am disinclined to think it is an incompatibility problem.
Today I installed everything, updated everything, and let it run, rebooted several times....no problems at all. And as changing the DIMM channel seems to have cured the trouble, I think that assumption is correct...I think that the combo of components is fine, but the DIMM channel has a fault. If the system exhibits problems again, then I'll be proven wrong....
Thanks all for your input.
Dexter...
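The swap testing in this thread can be written down as a small elimination table. This is a rough Python sketch of that reasoning, not a diagnostic tool: the labels `old`, `new`, `DIMM1`, `DIMM2` and the function `suspects` are made up for illustration, and "passed" here only means "ran without problems so far". A part stays a suspect if it was present in every failing trial and in no passing one.

```python
def suspects(trials):
    """trials: list of (stick, slot, passed) tuples.

    Returns, per component kind, the parts present in every failing trial
    that were never seen in a passing trial.
    """
    failing = [t for t in trials if not t[2]]
    passing = [t for t in trials if t[2]]
    out = {}
    for idx, kind in ((0, "stick"), (1, "slot")):
        if failing:
            # Parts common to all failing trials.
            common = set.intersection(*({t[idx]} for t in failing))
        else:
            common = set()
        # Parts that worked at least once are cleared.
        cleared = {t[idx] for t in passing}
        out[kind] = sorted(common - cleared)
    return out


# The three trials described in the thread, with hypothetical labels.
trials = [
    ("old", "DIMM1", False),  # original stick, slot 1: Memtest errors
    ("new", "DIMM1", False),  # replacement stick, slot 1: same errors
    ("new", "DIMM2", True),   # replacement stick, slot 2: stable so far
]
print(suspects(trials))
```

On these three trials only slot 1 is left as a suspect. The original stick was never tried in slot 2, so this table neither clears it nor blames it.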