Freezing with AMD and SSE?


mmonnin
8 Jan 2004, 4:54am
http://forum.folding-community.org/viewtopic.php?t=6682&highlight=

Hi,

We would like to gather information to send to AMD regarding problems people have had with FAH using AMD procs and SSE.

In particular, we need to get

1) Full s/n of the processor(s) that failed (both lines, please. This will allow us to identify the specific revision of the silicon)

2) BIOS manufacturer, and revision

3) Motherboard name and revision (printed on the board)

4) Operating system, which patches were installed (does it happen if Linux is run?)

Please post that info here. Please do not use this thread for a discussion about this topic, just info (feel free to open another thread). We would like to gather all the info from here without having to parse through other discussions.

Thanks!

Vijay & the FAH Team

PS Thanks to David for working to clear out this thread and keep it just with the info (discussions in other threads please).

Edit From Pythagoras: As this is such an important thread it will be much more strictly moderated than others. I.e. anything but the information requested will be edited out or deleted if necessary.

Any other comments or discussion on this topic can be done here.

http://forum.folding-community.org/viewtopic.php?t=6683

edcentric
8 Jan 2004, 1:40pm
I suspect that it is more than just SSE.
One of my old tbird boxes has locked twice recently with the new core and client.
I'll keep a closer eye on things now.

Straight_Man
8 Jan 2004, 2:26pm
IF they implemented full SSE2 tuning in the new Gromacs WUs, that could contribute to problems with a TBird. My Barton has no such issues, but I watch temps like a HAWK during the day and an Owl at night....

John.

Straight_Man
8 Jan 2004, 2:34pm
Also, dumb thing but in XP this can happen:

When my client encounters a full FAHlog at time of completion or close to completion, sometimes the FAHlog.txt file gets hung along with a machine lock. So every few days, with the client off, I rename FAHlog.txt to FAHlog-Prev09 or so, and a new FAHlog.txt is created. The box can do quite a few work units in 3 days.... And the client seems not to be able to save, rename the old log file, and open a new one on the fly. Since I started manually renaming log files over 3 days old, no issues. It's either file size or number of lines, not sure which, since when it happens I get no rename of the FAHlog to FAHlog-Prev. Look for large FAHlog.txt files, and expect faster boxes to do this more often than slower ones as they gen bigger logs faster.

Linux seems to do this less, for whatever reason.
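
The manual rename described above could be scripted roughly like this. This is only a sketch, not anything the FAH client does itself: the 1 MB size threshold, the function name rotate_log, and the FAHlog-PrevNN.txt numbering scheme are assumptions for illustration, and it should only be run while the client is stopped.

```python
from pathlib import Path


def rotate_log(folder, max_bytes=1_000_000):
    """Rename FAHlog.txt to the next free FAHlog-PrevNN.txt if it is too big.

    The size threshold is an assumed value, not a known client limit.
    Returns the new path, or None when no rename was needed.
    """
    folder = Path(folder)
    log = folder / "FAHlog.txt"
    # Nothing to do if the log is missing or still small.
    if not log.exists() or log.stat().st_size <= max_bytes:
        return None
    # Find the first unused backup number so older backups are kept.
    n = 1
    while (folder / f"FAHlog-Prev{n:02d}.txt").exists():
        n += 1
    target = folder / f"FAHlog-Prev{n:02d}.txt"
    log.rename(target)
    return target
```

Calling rotate_log on the FAH folder before restarting the client leaves the old log under a new name, and the client then starts a fresh FAHlog.txt.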

If you see this only on boxes with Norton SystemWorks on them, try this:

Find folder with FAH in it.

Right-Click the Norton protected Recycle Bin Icon.

Click Protection tab.

Click Exclusions button.

Enter the path for the FAH folder, stick a - sign in front, and ** after path with \ at end of folder spec, so if I had my FAH stuff in c:\..\program files\Folding @ Home I would stick in an exclusion line of:
-c:\..\program files\Folding @ Home\**
which would exclude from saving all files in the folder and any subfolder. I was getting huge amounts of "deleted" files kept in the recycle bin, and the Barton box seems not to like a Protected Recycle Bin with over 1500 files of the same name (FAHlog.txt) in it. So I excluded the whole Folding @ Home area from Protection.... FAH is not appending to the log, it is doing something that results in a new copy for every entry. Codicil: empty the normal recycle bin every once in a while also, please (every 2-3 days on a fast folding box if you do not exclude the FAH folder, or all txt files, from being saved). It also has limits on how many files it can list.... Overflowing will lock the box in part, usually at the time the file that overflows it is added. Then FAH locks, and the box locks for real.... Defrag the drive when you have time after emptying a recycle bin with 2K or so file ENTRIES in it...

John.

mmonnin
8 Jan 2004, 5:47pm
There have been WUs known to lock up on AMD SSE CPUs but not Durons or P4s that don't have SSE. It's not the WU that's bad either.

primesuspect
8 Jan 2004, 6:06pm
Hey. Maybe THAT's why my box has been frozen solid when I come in to work in the morning lately...... Curious.

mmonnin
8 Jan 2004, 6:23pm
Yes it could be prime.

This has happened to non-overclocked CPUs and underclocked CPUs. Stanford is waiting for a reply from AMD on this subject.

t1rhino
8 Jan 2004, 6:38pm
No issues here with any computers.

csimon
9 Jan 2004, 2:16am
Hey. Maybe THAT's why my box has been frozen solid when I come in to work in the morning lately...... Curious.
Stanford has been recommending using 3DNow! instead if you have freezing issues, until AMD responds to the problem.

mmonnin
19 Jan 2004, 2:56pm
A small reply from a guy at AMD. They are working on this guys. Please do report it to the community if you are having problems.

AMD thanks the Folding at Home users for providing information about the freezing problem. Thanks to Prof Pande, AMD has reproduced the problem in our Austin labs. Although not an official workaround, we have observed that if the console application is launched without the -forceasm switch, or if the GUI version advanced properties setting to enable advanced optimizations is not enabled, then the freeze does not occur.

We apologize for any inconvenience and will be working with Prof. Pande as soon as a better workaround is available.

AMD_Mike

keto
19 Jan 2004, 3:12pm
Not going to disassemble the unit to get the stepping, sorry. 2100+ running 11 X 200 which is otherwise totally stable 24/7 at that speed or higher (210) since last Feb. Locked up 3 times recently, no idea what core/wu it was working on at the time. Not much useful info I know but it did rather puzzle me at the time.

Spinner
19 Jan 2004, 8:07pm
My girlfriend's overclocked (TB) 1700+ running at 2.1GHz started locking up after installing the new client, after months with no problems folding with the v3.5 client. I increased the CPU core voltage slightly and the problem seems to have resolved. However it's hard to tell whether this is an SSE issue or just the new client working the CPU a little bit harder, pushing an overclocked CPU harder than it had been pushed before.

No problems on my Palomino or Barton systems currently.

qparadox
19 Jan 2004, 8:58pm
I had the same problems on a Barton 2500+. Even reducing the cpu usage to 50% didn't stop it. Unfortunately I sold the bugger off last week because it was annoying me to no end. I do know it was one of the new locked Bartons but that's it :/.

M/B was an EPOX 8RDAE (NFII Ultra) Rev 1.02. OS was XP Pro SP1 and had all patches installed. Never tried Linux on it.

It would typically lock up after completing 1-2 frames, although sometimes it would lock up almost instantly and other times it would complete 6-7 frames.

csimon
4 Feb 2004, 12:22am
Is anyone still getting these freezes or has it fixed itself?

tycho
4 Feb 2004, 12:43am
I hadn't seen this before... I have had a few lock-ups as of late, usually when I come home or wake up my computer fails to respond. My 2500+ isn't OCed and isn't overheating... I hope they find a solution for this

primesuspect
4 Feb 2004, 1:12am
It's been about a week since I've seen a hard freeze. As I said, I can always tell because in the morning I'll come in and my computer will be frozen solid. This week it's been fine.

mmonnin
4 Feb 2004, 1:26am
Hey guys, I hear there is a fix for this. Sign up for the beta team and you can get the beta fix for this. Not sure of all the information yet as I haven't read it myself. AMD is working on it guys.

a2jfreak
4 Feb 2004, 3:35am
I looked, and perhaps I'm just missing it, but I don't find the beta sign-up area.
I have two 2.3GHz Athlons folding that freeze and it's extremely irritating.

mmonnin
4 Feb 2004, 3:46am
http://forum.folding-community.org/viewtopic.php?t=3045

csimon
4 Feb 2004, 3:47am
I looked, and perhaps I'm just missing it, but I don't find the beta sign-up area.
I have two 2.3GHz Athlons folding that freeze and it's extremely irritating.
All you need to do is PM an admin and let him know you'd like to become a beta tester ... once they sign you in you'll see an extra category called "beta forums". I PMed pythagoras and I was signed up the next day when he got the message. Just go thru the rules when you get in and you'll be good to go.

edit: sorry marc didn't see your post. :thumbsup:

a2jfreak
4 Feb 2004, 4:13am
Thank you both.

csimon
4 Feb 2004, 4:32am
Good luck with it a2j ...I haven't tried it because I don't have the issue but hopefully this is just the workaround for you aside from 3dnow.

a2jfreak
4 Feb 2004, 4:21pm
I hope it works too, Simon. 3DNow! is much slower than SSE (the times I have turned off SSE WUs took over 2x as long to complete). The systems normally only seem to freeze every couple of weeks, but it has happened as quickly as a few hours. It's possible that not every freeze is related to the SSE bug, but I'm thinking it is the most likely culprit.

At first I thought it was a VCore issue so I kept upping the VCore. No difference. I purchased new 60+ CFM fans to help keep the chips running cooler w/ more VCore. No difference. Oh well, enough chit chat. :fold: :fold: :fold:

csimon
4 Feb 2004, 4:54pm
I spoke too soon. I put together an xp2000 yesterday and tried letting it run all night ...got here this morning to find that it had only gone one frame and locked dead as a door knob. The thing is that it had folded about 17 frames just fine before I left. I will try the workaround myself.

a2jfreak
4 Feb 2004, 5:42pm
Maybe the fix is an Athlon64 4000+, along with a new mobo and 2GB of high-speed low-latency DDRII RAM . . . all paid for by AMD for sending back my AthlonXP 1700+. ;D

mmonnin
4 Feb 2004, 5:44pm
I'll send 2!

a2jfreak
4 Feb 2004, 6:11pm
I got 3 or 4 (can't recall now) 1800+ CPUs + RAM + Mobo back when Fry's was running a pretty decent deal a few months back. Never have gotten around to setting them up. Stuff happens, eh? I would love to have the time (and the space) to set them up and get them folding. If I could get those chips to OC, even if only a couple hundred MHz each, that would be some serious points. Maybe AMD will let me send back the 2 1700+s I have that run @ 2.3 and the 3 or 4 1800+s I have. I have a couple more 1700+s (DUT3C though) somewhere but I don't know where they are and I don't think I have enough spare boards for them. Oh man how I long for free time and free space. Forgive me. I'm WAY off topic and hijacking the thread.

/me kicks himself.

mmonnin
4 Feb 2004, 6:15pm
Stop posting and start formatting!!:)

edcentric
5 Feb 2004, 5:17pm
Well I am trying the fix,
Funny though, last night I was folding with SSE 4:45/% but it kept locking. I tried 3Dnow and my times went to 5:30, and it still locked. I finished the wu without either at 11:50.
I sure hope that this works.
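
To put those three times side by side, here is a rough sketch that converts the mm:ss figures quoted above into seconds and compares each to the SSE time. The helper name to_seconds and the dictionary labels are just illustrative; the only data used are the three times from this post.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '4:45' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Time per 1% of the work unit, as quoted in the post.
times = {"SSE": "4:45", "3DNow!": "5:30", "neither": "11:50"}

baseline = to_seconds(times["SSE"])
slowdown = {name: to_seconds(t) / baseline for name, t in times.items()}

for name, ratio in slowdown.items():
    print(f"{name}: {ratio:.2f}x the SSE time")
# SSE: 1.00x the SSE time
# 3DNow!: 1.16x the SSE time
# neither: 2.49x the SSE time
```

So on this box 3DNow! cost about 16% over SSE, while running with neither took roughly two and a half times as long.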

csimon
5 Feb 2004, 5:40pm
Well I am trying the fix,
Funny though, last night I was folding with SSE 4:45/% but it kept locking. I tried 3Dnow and my times went to 5:30, and it still locked. I finished the wu without either at 11:50.
I sure hope that this works.
ed if that doesn't work state your basic spec like processor - overclock - vcore settings - temps. If you have good temps you may up the vcore a little if possible.
I have a KR7A-R w/ xp2000+@1750 and I can't get it to stop locking up unless I put the turbo fans and supercool it running 1.85 vcore. If I run stock clock or slight oc everything is fine and temps are cool.

edcentric
6 Feb 2004, 3:44pm
OK, locked again last night. I took voltage up and FSB down this morning.
1.735V (by MBM), 138x15 (on an XP2400)
temp (MBM) 39C (at stock speed and voltage it was 36C)
This is in my MSI mobo with SDRAM (2 sticks of PC150)

I'll know within couple of hours how this works. I have a little more voltage available.
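
For anyone following the numbers, core clock is just FSB times multiplier. A tiny sketch using the settings quoted in this thread (the function name core_mhz is made up for illustration):

```python
def core_mhz(fsb_mhz, multiplier):
    """Core clock in MHz from front-side bus speed and CPU multiplier."""
    return fsb_mhz * multiplier


print(core_mhz(138, 15))  # 2070, the 138x15 setting above
print(core_mhz(200, 11))  # 2200, the 11 X 200 setup mentioned earlier
```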

csimon
7 Feb 2004, 5:46pm
I finally ended up doing a complete reinstall of winxp + via4n1's and that did the trick for me. All volts are stock again and fsb is 140 from 133 (where I've always run it stable) on my xp2000+ & kr7a. Temps are running high but that's because I turned the delta 60x38 fan way down so it would stop scaring the students ...also cheapass suxor 250w psu that came with enlight case!!! :fold:

edcentric
13 Feb 2004, 2:38pm
It ran three wu's just fine, then locked again last night.
I guess that I will stay with the lower FSB.
Some wu's are much harder on it than others.

muddocktor
13 Feb 2004, 2:47pm
There is a new solution for the locking issue with SSE for Gro work, which involves a new core version that is being beta tested right now. I found this out over at the community forums and I'm presently running it on all my AMD machines, but I just started putting the new core on them so I can't say personally if it solves the issue.

Anyways, here is the link for the new beta core to solve the lockup problem. http://www.stanford.edu/group/pandegroup/folding/beta-core/

Try this new core out if you are having lockup problems. It is also supposed to be just a smidgeon faster than the 1.55 core too, so you might even want to try it out on your Intel rigs too.:fold:

a2jfreak
13 Feb 2004, 2:55pm
I had a hard freeze a few days ago. Possibly unrelated to the new core. Yesterday the computer rebooted while I was using it. Since F@H freezes the system and doesn't reboot it I'm going to say the random reboot was completely unrelated to F@H. I upped my vCore by .075 volts so we'll see what will happen.

edcentric
13 Feb 2004, 4:18pm
I am running 1.56 on this box.
One thought is that the new core actually works the machine harder. An overclock that was completely stable before (3.24 and 1.54) doesn't cut it now.

t1rhino
13 Feb 2004, 4:36pm
IC1 experienced its first lock the other day, which I suspect was due to the heavy overclocks and SSE. I lowered the overclock for now, until the problem is solved.

qparadox
13 Feb 2004, 6:47pm
I have 4 machines currently locking up using SSE
1) XP 2000+ (this may be the board since it's a kt133a board that supposedly doesn't support XP's) - locks on first frame
2) Duron 1.6 GHz @ 2.3 GHz - locks after 5-10 frames while below 40*C. 3 Days folding uptime with no SSE.
3) XP 1700+ @ 2 GHz - locks periodically after 20-40 frames
4) Centrino 1.4 GHz (laptop, probably a heat issue, and it's not amd duh)

I didn't notice the locking since all but my laptop are my brothers' machines and they reset them before I get home. I'm trying that new core and will report results.

KingFish
13 Feb 2004, 9:50pm
I haven't had any problems with any machines locking up using -advmethods, -forcesse, and -forceasm. I don't have any overclocked machines and only run them at 95% instead of 100%. Just thought I'd throw that in for extra info.

KingFish

edcentric
13 Feb 2004, 10:49pm
qp, one of my boxes is a KT7A running a pally, it never locks. My KT7A running TBird never locks either. Both of these are oc'ed. The 266A mobo running a 2400TBred is the one that I am fighting with. It is oced. It had been at 2.1, now it is 2.07. We will see how it runs.
Temp isn't an issue on any of the boxes.

qparadox
16 Feb 2004, 8:30am
ok well everything except the centrino (which is definitely a heat issue) and the XP2000+ is folding merrily along. The 2000+ locks instantly when I try to enable SSE. There are some bulging caps on the board so I'm tempted to believe it's a power / hardware problem not a software one.

mmonnin
16 Feb 2004, 12:19pm
I would think so as well. Those caps need to go before it takes more than just your motherboard.

muddocktor
16 Feb 2004, 1:58pm
Take that board out of service until you replace those caps, qp. That board might decide to take out some extra components. Those caps are definitely bad and need to be replaced pronto. If that KT133a board is a KT7-A, then you can send it back to Abit and they'll change the caps out for $25, which is a bargain as the caps themselves will cost you more than that to change all the big ones on the board.

mmonnin
19 Feb 2004, 5:43am
Unstickied since it's not really a problem anymore. There are several other threads floating about as well.

qparadox
19 Feb 2004, 10:49am
I like living on the edge ;). The caps are actually for the AGP supply and do not regulate the CPU power at all (according to MSI's techs, they won't do crap about it and I'm not willing to start a small claims battle over it atm. IMHO they sold a defective product and are thus liable for repairing it even after the warranty is up). I'm using a PCI vid-card so it works out acceptably but I'm sure the caps screw with more than just the AGP supply. The board is slated for replacement ... sometime. Any recommendations for a solid board that's as cheap as possible and maybe with workable onboard ide-raid (only need mode 1).

profdlp
19 Feb 2004, 11:56am
..Any recommendations for a solid board that's as cheap as possible and maybe with workable onboard ide-raid (only need mode 1).
Try the MSI KT6V-LSR (http://www.newegg.com/app/ViewProductDesc.asp?description=13-130-445&catalog=22&manufactory=BROWSE&depa=0).

$66 at Newegg. :thumbsup:

MSI KT600 Chipset Motherboard for AMD Socket A CPU, Model "KT6V-LSR"(MS-7021) -RETAIL

Specifications:
Supported CPU: AMD Athlon/Athlon XP/Duron Processors
Chipset: VIA KT600 + VT8237
FSB: 400/333/266/200MHz
RAM: 2x DIMM support DDR400/333/266/200 Max 2GB
IDE: 2x UltraDMA 133/100/66 up to 4 Devices
Slots: 1x AGP 8X, 5x PCI
Ports: 2xPS2,1xLPT,1xCOM,1xLAN,8xUSB2.0(Rear 4),RCA SPDIF Out,Audio Ports
Onboard Audio: Realtek ALC655 5.1-Channel Codec
Onboard LAN: 10/100Mbps Fast Ethernet
Onboard SATA/RAID: 2x Serial ATA, RAID 0/1
Form Factor: ATX