Leo!

shwaip bluffin' with my muffin Icrontian
edited September 2003 in Folding@Home
It looks like one of your computers may be losing WUs part of the way through... 12 WUs for 6 pts :(
leo.jpg 20.2K

Comments

  • MrBill Missouri Member
    edited September 2003
    shwaip said
    It looks like one of your computers may be losing WUs part of the way through... 12 WUs for 6 pts :(
    :scratch: It's probably the INTEL! ;D
  • profdlp The Holy City Of Westlake, Ohio
    edited September 2003
    I'm having the same problem. I've checked all of my local computers and it's not them. Looks like it's time for a road trip...

    :rolleyes2

    EDIT: It seems to me that I was seeing this in my stats before yesterday. Could there be a glitch at Stanford under-reporting points or over-reporting WU's?
    fah.jpg 14.3K
  • Linc Owner Detroit Icrontian
    edited September 2003
    It looks like too many WUs. Even 14 WUs on the one before shwaip referenced is way too low, and that number of WUs is way out of synch with the rest of the times.
  • edited September 2003
    Yeah Leo, check back through your fahlog and see if you are recording any special exits on the WU's. If so, then you might be borderline unstable on your overclock on one of your rigs. Those points totals are way too low for the amount of WU's shown turned in. I checked both points summary pages and nothing is lower than 5 or 6 points/WU.
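The points-per-WU sanity check described in this post can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, and the 5 points/WU floor is an assumption taken from the observation above that nothing on the points summary pages is lower than 5 or 6 points/WU.

```python
def points_per_wu(points, work_units):
    """Average credit per returned work unit."""
    if work_units <= 0:
        raise ValueError("work_units must be positive")
    return points / work_units


def looks_like_early_exits(points, work_units, min_points_per_wu=5.0):
    """True when the average credit is below the smallest full-WU value.

    min_points_per_wu is an assumption based on the thread (no project
    seen worth less than 5 or 6 points), not an official figure.
    """
    return points_per_wu(points, work_units) < min_points_per_wu


# The update from the first post: 12 WUs credited with 6 points.
print(points_per_wu(6, 12))           # 0.5
print(looks_like_early_exits(6, 12))  # True
```

An average far below the floor suggests that many of the returned units ended early and earned only partial credit.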
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited September 2003
    I've had a whole series of work assignments the last few days that would just start, stop, burp, and sputter. Both the CPU and virtual CPU (each running an instance of Folding) would fluctuate between 30 and 100% usage. When I had units get stuck - no progress indicated in the graphical monitor and in the log, and when the CPUs' usage indicator showed low usage, I would just dump the assignments and start over.

    While I was at it, I let the computer download new Folding cores. I noticed it on...Yes, the Intel machine. I don't know if this has been happening on other machines as well.

    I dropped the voltage on the CPU to see how the machine would behave and drop core temperatures. Don't remember if I set it back to where it was before. I'll have to check.

    Thanks for alerting me.
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited September 2003
    Logfile_01.txt:

    "
    *
    *
    Folding@home Gromacs Core
    Version 1.49 (June 23, 2003)

    Preparing to commence simulation
    - Looking at optimizations...
    - Files status OK
    - Go method
    - Expanded 403822 -> 2483231 (decompressed 614.9 percent)

    Project: 542 (Run 5, Clone 91, Gen 14)

    Assembly optimizations on if available.
    Entering M.D.
    (Starting from checkpoint)
    Protein: p542_BBA5_ext

    Writing local files
    Completed 15000 out of 500000 steps (3%)
    Extra SSE boost OK.
    Quit 101 - Fatal error:
    Step 15866, time 31.732 (ps) LINCS WARNING
    relative constraint deviation after LINCS:
    max 0.000000 (between atoms 1 and 2) rms 1.#QNAN0

    Simulation instability has been encountered. The run has entered a
    state from which no further progress can be made.
    If you often see other project units terminating early like this
    too, you may wish to check the stability of your computer (issues
    such as high temperature, overclocking, etc.).
    Going to send back what have done.
    logfile size: 17996
    - Writing 18672 bytes of core data to disk...
    ... Done."

    Son of a gun - the problems I described above are happening. I looked at the protein; it's p542_BBA5_ext - the same stinking protein that fouled everything up before! None of the others do this. Time to wipe out the work and start over again. Sheesh!
    :banghead:
  • MrBill Missouri Member
    edited September 2003
    Leo: I was having that problem on my new NF7-S/xp2500 system with Gromacs. I took off the -advmethods flag and it folds tinkers just fine. :confused:

    I haven't had time to mess with it to see what the real problem is. I'm only home on weekends and I was tired of that machine not doing anything useful during the week.

    \\edit: I just checked that machine and it has completed a few Gromacs as well (without -advmethods). P362, P379 and P381. I'm not sure which Gromac(s) it was having trouble with....
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited September 2003
    It seems to only be the Gromacs 542 that causes this crap. I've checked the logs, and other Gromacs series such as 572 fold just fine. Borderline unstable on the overclock? Possible? Yes. Probable? Don't think so. There is nothing unstable on this machine except Gromacs 542.

    I just uninstalled F@H on CPU 0 (the real, hardware CPU); reinstalled F@H, v3.24. Wouldn't you know it - it downloaded another 542! #@$#%^& !

    When I restarted the computer after uninstalling F@H in safe mode, I opened the BIOS and boosted CPU core voltage one notch. Perhaps that will help. We'll see.
    MrBill said
    I took off the -advmethods flag and it folds tinkers just fine.

    Bill, that's not tagged on to either instance of F@H on this Intel system.
  • MrBill Missouri Member
    edited September 2003
    You can get Gromacs without -advmethods, but it's almost a guarantee with it.

    I was having problems whether the system was overclocked or NOT. 3.24, 3.25beta -forceasm, no -forceasm, it didn't make any difference. It may well be a problem with the 542. I'll try to keep a closer watch on mine to see if it throws up if I get a 542.

    It's been running fine @ 2075 MHz for over a week now. *knock on wood*

    I didn't see any 542's in the log file.
  • edited September 2003
    I just checked EMIII and the rig I'm presently on (the Barton rig) has done 2 of these with no problems, using SSE. My P3S rig has done 3 of them with no problems. My Epox rig has done 4 of them with no problems evident in EMIII. My MSI KT133A board has also done 4 of these with no problems.

    Leo, with you just getting another 542, we'll see if the bump up in vcore helps stabilize the folding of these WU's on your Intel box. If it folds it OK, you might want to start running the -advmethods flag, as the P4 will process Gromacs work much better than a Tinker WU. The weak native FPU on the P4 makes Tinker folding fairly slow on P4 machines.
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited September 2003
    Well, as I said, after uninstalling/reinstalling, it downloaded another 542. MBM is showing 100% CPU (real CPU) utilization, but after 15 minutes, it has yet to finish a frame.
  • mmonnin Centreville, VA
    edited September 2003
    The error you got usually means instability. There are other errors that could mean several things but this one even says the computer is unstable.

    I haven't heard of problems with SSE on P4s like there have been on the AMDs, so it's even more likely to be a stability issue.
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited September 2003
    Well, I can't figure out what else it could be, so perhaps there is an instability. But why would it only experience "fatal error"s on Gromacs 542, and not the other Gromacs? Weird.

    I checked the log this morning. The computer, both real and virtual CPUs, experienced several Gromacs 542 crashes. I'll lower the FSB a tad and see what happens next time 542 is assigned.

    Also, I'll be installing an SLK-900U on the CPU tonight, which should lower peak temperatures significantly.

    I'll check the logs when I get home this evening to see if any 542's have been downloaded. I'll report back if any news.
  • BDR
    edited September 2003
    Leo, I had a problem recently with some massive WU crashes too, with the same error.

    I backed down the OC on that machine, deleted the FaH core, re-dl'ed a new core and haven't had any problems since then.
    I think it was a combination of heat and the OC. I'm not sure what protein was running at the time though.

    Btw, the machine was not an Intel.
    wus.jpg 18.7K
  • mmonnin Centreville, VA
    edited September 2003
    Wow 52 WUs in one update.

    Could be a problem with that particular WU that is causing it to crash. There have been such WUs.
  • TBonZ Ottawa, ON Icrontian
    edited September 2003
    mmonnin said
    Wow 52 WUs in one update.

    I would think you mean one daily update and not one 3 hr update.:eek2:

    I have had some problems in the last 2 weeks also. I noticed a WU on my main machine was failing at frame 37 consistently, so I deleted the core and relevant files and it ran well for a few days. Then another WU kept failing at frame 94. I deleted the files and restarted; hopefully this process will not continue. :banghead:
  • TheLostSwede Trondheim, Norway Icrontian
    edited September 2003
    Leo,

    I had the same problem with the same wu a good while ago. I lowered the oc a bit, gave it higher vcore...nogo. I then disabled the cpu-enhance setting in the bios and it worked again.

    Can you also try this and see if it was a fluke or the real truth?
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited September 2003
    mmonnin said
    Could be a problem with that particular WU that is causing it to crash. There have been such WUs.

    Marc, my friend, have you been reading my posts? I've only said at least two times now that the only protein causing problems is Gromacs 542. Yesterday night before bed, I lowered the CPU/FSB clock a little. No change. The log revealed this morning that one of the CPUs had downloaded 542 and it crashed at Frame 37. They either crash right away or at 30, or at 37.

    Mack, I'll disable enhance CPU and see if that does anything.
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited September 2003
    I've been running the computer at stock speed (2800MHz) to attempt to rule out "system instability". No change with Gromacs, they still burp, fall apart, what have you.

    Additional steps:

    - lowered CPU core temp - now 52°C (on-chip measurement) at full load.

    - uninstalled completely both the console and graphical clients; downloaded fresh clients (3.25 beta), installed

    - ran Norton Disk Doctor (ScanDisk)

    No improvement whatsoever. Take a look at log comments -

    "...[13:49:16] Project: 542 (Run 0, Clone 291, Gen 5)
    [13:49:16]
    [13:49:16] Assembly optimizations on if available.
    [13:49:16] Entering M.D.
    [13:49:24] Protein: p542_BBA5_ext
    [13:49:24]
    [13:49:24] Writing local files
    [13:49:25] Extra SSE boost OK.
    [13:49:29] Writing local files
    [13:49:29] Completed 0 out of 500000 steps (0)
    [13:49:49] Quit 101 - Fatal error:
    [13:49:49] Step 255, time 0.51 (ps) LINCS WARNING
    [13:49:49] relative constraint deviation after LINCS:
    [13:49:49] max 0.000000 (between atoms 1 and 2) rms 1.#QNAN0
    [13:49:49]
    [13:49:49] Simulation instability has been encountered. The run has entered a
    [13:49:49] state from which no further progress can be made....
    ...[13:50:17] Protein: p542_BBA5_ext
    [13:50:17]
    [13:50:17] Writing local files
    [13:50:17] Extra SSE boost OK.
    [13:50:21] Writing local files
    [13:50:21] Completed 0 out of 500000 steps (0)

    Folding@home Client Shutdown.
    ...[14:07:30] Project: 542 (Run 14, Clone 284, Gen 9)
    [14:07:30]
    [14:07:30] Assembly optimizations on if available.
    [14:07:30] Entering M.D.
    [14:07:39] Protein: p542_BBA5_ext
    [14:07:39]
    [14:07:39] Writing local files
    [14:07:39] Extra SSE boost OK.
    [14:07:43] Writing local files
    [14:07:43] Completed 0 out of 500000 steps (0)
    [14:11:25] Quit 101 - Fatal error:
    [14:11:25] Step 2602, time 5.204 (ps) LINCS WARNING
    [14:11:25] relative constraint deviation after LINCS:
    [14:11:25] max 0.000000 (between atoms 1 and 2) rms 1.#QNAN0
    [14:11:25]
    [14:11:25] Simulation instability has been encountered. The run has entered a
    [14:11:25] state from which no further progress can be made.
    [14:11:25] If you often see other project units terminating early...
    ...[14:11:41] Folding@home Gromacs Core
    [14:11:41] Version 1.51 (September 25, 2003)
    [14:11:41]
    [14:11:41] Preparing to commence simulation
    [14:11:41] - Looking at optimizations...
    [14:11:41] - Created dyn
    [14:11:41] - Files status OK
    [14:11:41] - Go method
    [14:11:41] - Expanded 440170 -> 2220225 (decompressed 504.4 percent)
    [14:11:41] - Starting from initial work packet
    [14:11:41]
    [14:11:41] Project: 572 (Run 48, Clone 60, Gen 5)
    [14:11:41]
    [14:11:41] Assembly optimizations on if available.
    [14:11:41] Entering M.D.
    [14:11:48] Protein: p572_L939_K12M
    [14:11:48]
    [14:11:48] Writing local files
    [14:11:48] Extra SSE boost OK.
    [14:11:50] Writing local files
    [14:11:50] Completed 0 out of 500000 steps (0)
    [14:13:30] Quit 101 - Fatal error:
    [14:13:30] Step 661, time 1.322 (ps) LINCS WARNING
    [14:13:30] relative constraint deviation after LINCS:
    [14:13:30] max 0.000000 (between atoms 1 and 2) rms 1.#QNAN0
    [14:13:30]
    [14:13:30] Simulation instability has been encountered. The run has entered a
    [14:13:30] state from which no further progress can be made...
    ...[14:23:08] Protein: p571_L939_K12M_nat
    [14:23:08]
    [14:23:08] Writing local files
    [14:23:11] Writing local files
    [14:23:11] Completed 0 out of 500000 steps (0)
    [14:26:06] Quit 101 - Fatal error:
    [14:26:06] Step 363, time 0.726 (ps) LINCS WARNING
    [14:26:06] relative constraint deviation after LINCS:
    [14:26:06] max 0.000000 (between atoms 1 and 2) rms 1.#QNAN0
    [14:26:06]
    [14:26:06] Simulation instability has been encountered. The run has entered a
    [14:26:06] state from which no further progress can be made...
    ...[14:26:49] Protein: p572_L939_K12M
    [14:26:49]
    [14:26:49] Writing local files
    [14:26:49] Extra SSE boost OK.
    [14:26:53] Writing local files
    [14:26:53] Completed 0 out of 500000 steps (0)
    [14:29:29] Quit 101 - Fatal error:
    [14:29:29] Step 984, time 1.968 (ps) LINCS WARNING
    [14:29:29] relative constraint deviation after LINCS:
    [14:29:29] max 0.000000 (between atoms 1 and 2) rms 1.#QNAN0
    [14:29:29]
    [14:29:29] Simulation instability has been encountered. The run has entered a
    [14:29:29] state from which no further progress can be made...."

    WHAT IS GOING ON? If there is indeed "system instability" it shows up in nothing other than Gromacs in Folding. These log extracts are culled from operations with the computer at default operating specifications. Temps are good. Registry is good - everything is good as far as I can tell.

    Man, there is a gold star if anyone can figure this out. If not, is there a flag I can tack on to the Folding start file that prohibits downloading of Gromacs?

    This is exasperating. The next step, if I can't get it straight with this, is to wipe out Folding again, to install an older client like 3.24 or 3.14.

    ARRRGGGHHHH!
    :banghead:
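One way to check which proteins are actually failing is to tally the fatal exits in the log by protein name. The following is a minimal Python sketch, assuming the log lines look like the excerpts quoted in this thread ("Protein: ..." followed later by "Quit 101 - Fatal error:"); the function name and the sample text are made up for illustration and are not part of the Folding@home client.

```python
import re
from collections import Counter

# Patterns taken from the log excerpts quoted in the thread.
PROTEIN_RE = re.compile(r"Protein:\s*(\S+)")
FATAL_MARKER = "Quit 101"


def tally_fatal_exits(log_text):
    """Count 'Quit 101' exits per protein name in a client log."""
    failures = Counter()
    current = None  # protein of the work unit currently running
    for line in log_text.splitlines():
        match = PROTEIN_RE.search(line)
        if match:
            current = match.group(1)
        elif FATAL_MARKER in line and current is not None:
            failures[current] += 1
            current = None  # one exit per started work unit
    return failures


# Hypothetical sample, shortened from the kind of lines shown above.
sample = """\
[13:49:24] Protein: p542_BBA5_ext
[13:49:49] Quit 101 - Fatal error:
[14:11:48] Protein: p572_L939_K12M
[14:13:30] Quit 101 - Fatal error:
[14:23:08] Protein: p571_L939_K12M_nat
[14:26:06] Quit 101 - Fatal error:
"""

for protein, count in tally_fatal_exits(sample).most_common():
    print(protein, count)
```

Run against the excerpts quoted in this post, a tally like this would list p572_L939_K12M and p571_L939_K12M_nat alongside p542_BBA5_ext, which is worth knowing before blaming a single project.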
  • TheLostSwede Trondheim, Norway Icrontian
    edited September 2003
    Try removing the -advmethods flag, Leo, and you'll have a better chance of getting Tinkers.
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited September 2003
    I don't have ANY flags on either console or graphical instances running. Never have had them running on the Intel's Folding clients.
  • TheLostSwede Trondheim, Norway Icrontian
    edited September 2003
    Weird. Even on Intel rigs, I use the flags.

    That's not the point here though; we need to get your rig to fold.
    The core must be a boogie, no doubt.
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited September 2003
    Most of the time when I get these Gromacs that won't complete - 542 most of the time - I simply wipe out the Folding@Home folder except for the .exe, the .cfg file, and the .html file. When I restart, the cores are freshly downloaded. I've been reading at Folding Community - apparently there is a series of Gromacs that have been written as much for testing the Gromacs file as for the protein it represents. Maybe I'm just "lucky" to keep downloading the buggy files. I have no reason to believe it's this computer.

    Any other suggestions would be appreciated, though. Thanks, guys.
  • edited September 2003
    wiggle the ram!! :fold:

    Test an older client right away before beating yourself up with this anymore. Also, if you can, save the 542 and transfer it to your other machines to see if it messes up on them. Or attach a zip of it so we can try.
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited September 2003
    I've got something for you to wiggle! :crazy:

    OK, seriously now, Severspehere, your latter suggestions weren't bad. If I get another crash and burn 542, I'll save it and graft it into the Folding client in my AMD rig.

    Yeah, I could zip it also and send it to someone.
  • mmonnin Centreville, VA
    edited September 2003
    Hmm, since you keep getting this same WU, try deleting your user ID. Delete the config file and the registry entry with the user ID (not sure where this is). The servers do send out WUs to certain machines that they know are faster or slower. That's why a certain computer might have a tendency to get certain WUs over others. So maybe if the server doesn't recognize you, you can get something different. Just a thought. Never heard of anyone trying this, but it can't hurt.

    Zip that WU and I will give it a try on my machine. You can attach it for us to try.
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited September 2003
    All suggestions are on hold for the moment. The PSU went down today. Makes me wonder if erratic power rails could have been the problem. It just seems like unstable voltages would have caused problems in other areas. (I'm writing this post on my AMD rig.) Oh well, new PSU goes in tomorrow. We'll see if there's any difference.
  • keto Occupied. Or is it preoccupied? Icrontian
    edited September 2003
    I (who as some of you may know am experiencing similar problems) have rock solid rails, Enermax 465VE. Possibly the fix for yours but not a problem for mine. :buck:

    New problem cropped up, details in my own sad saga thread lol