Problem WUs, multiple CPUs, multiple OSes

Straight_Man (Geeky, in my own way), Naples, FL, Icrontian
edited May 2004 in Folding@Home
Prescott does not like:

p638 (failure typically in the 55-65 percent range, varies, 8-9 out of ten fail)
p909 (failure in the high 30% to low 40% range, varies, completes less than 10% of WU attempts)
p910 (my previous Barton also had proportionately many partial completes of this one; failure points vary from 30% to 80% complete, with a few full completes)

Prescott is NOT OC'd.

Northwood does not like:

p638 (70% failures, 30% completes)
p910 (90% failures, completion % at failure spread so wide as to not be statistically valid for inclusion)
p911 (mostly completes on this CPU; not enough of this project done yet to get a valid failure or completion percentage)

Northwood IS OC'd.

Two different OSes are in play here, both running console clients. The symptom is a spontaneous general-purpose core failure, 74 (112). I am not usually getting LINCS failures with these WUs (less than 3% LINCS among all three combined), just general failures, or the core throwing up its hands with a generic message that Gromacs cannot continue. There are multiple failures on each, across multiple runs, gens, and individual WU parms. Other projects fold well.

I think Folding has something going with these, but unless it is a fold-prevention scenario, it is either a dead end or too complex a WU for P4-generation CPUs above 2.66 GHz and a Barton 2500+. I have 20-30 attempts on each of the projects mentioned, except on Northwood where noted, and have had Barton failures on these projects as well. All of them show a divergence pattern when I take random looks at them in a graphical console.

When I get time, I will troll through my logs and give Folding the data, but so far these three WUs are yielding spontaneous core abends way too often, while others complete fine. I will note, however, that I am getting partial points for the partial turn-ins of these WUs.
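For anyone who wants to compare numbers, here is a minimal sketch of how per-CPU, per-project failure percentages like the ones quoted above could be tallied. It assumes the outcomes have already been pulled out of the logs by hand as (cpu, project, completed) records; it does not parse the actual client log format, and the function name and the sample records are made-up placeholders, not my real results.

```python
from collections import Counter, defaultdict


def failure_rates(records):
    """Tally attempts and failures per (cpu, project).

    `records` is an iterable of (cpu, project, completed) tuples, where
    `completed` is True for a finished WU and False for a failed one.
    This record layout is an assumption for illustration only.
    Returns {(cpu, project): (attempts, failures, failure_fraction)}.
    """
    counts = defaultdict(Counter)
    for cpu, project, completed in records:
        counts[(cpu, project)]["attempts"] += 1
        if not completed:
            counts[(cpu, project)]["failures"] += 1
    return {
        key: (c["attempts"], c["failures"], c["failures"] / c["attempts"])
        for key, c in counts.items()
    }


if __name__ == "__main__":
    # Placeholder records only, NOT actual results from any machine.
    sample = [
        ("Prescott", "p638", False),
        ("Prescott", "p638", True),
        ("Northwood", "p910", False),
    ]
    for (cpu, project), (attempts, failures, frac) in sorted(
        failure_rates(sample).items()
    ):
        print(f"{cpu} {project}: {failures}/{attempts} failed ({frac:.0%})")
```

With only a handful of attempts per project the percentages will not mean much, so the attempt count is kept alongside each rate.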

No other system symptoms or changes seem to affect this abending.
Failures occur both when almost nothing else is running and when other things are running. The commonality is use of the stock core, not betas.

The only slight hardware-related pattern so far is that, for the three WUs mentioned, failures happen more often when the CPU in question is COOLER. Other than those three, I have 90+ completions with the settings I use across the console clients on BOTH OSes, with the same core versions used by both console variants.

This is one reason I wanted Beta Access, no response yet to request for same in Folding's forums but will wait a week to ten days or so before talking to admins directly as Folding admins are apparently VERY busy.

So, is anyone else getting failures with any or all of these WUs that are hugely out of proportion to failures on other WUs?

John D.

Comments

  • Leonardo (Wake up and smell the glaciers), Eagle River, Alaska, Icrontian
    edited May 2004
    RAM can affect work unit completion as well. The Geil in my AMD system has caused no problems whatsoever; but in the Intel system with the posted clock, it caused a number of failures. Sorry, I didn't catalogue anything for trends. Yes, I know it was the RAM, because as soon as I replaced it with the HyperX, I had no more failed units. There were several times when I reduced the clock to see if it was a CPU problem. Even at default clock (14 X 200), there were proteins that would crash.