Leo!
shwaip
bluffin' with my muffin Icrontian
It looks like one of your computers may be losing WUs part of the way through... 12 WUs for 6 pts
:rolleyes2
EDIT: It seems to me that I was seeing this in my stats before yesterday. Could there be a glitch at Stanford under-reporting points or over-reporting WU's?
While I was at it, I let the computer download new Folding cores. I noticed it on...Yes, the Intel machine. I don't know if this has been happening on other machines as well.
I dropped the voltage on the CPU to see how the machine would behave and drop core temperatures. Don't remember if I set it back to where it was before. I'll have to check.
Thanks for alerting me.
"
*
*
Folding@home Gromacs Core
Version 1.49 (June 23, 2003)
Preparing to commence simulation
- Looking at optimizations...
- Files status OK
- Go method
- Expanded 403822 -> 2483231 (decompressed 614.9 percent)
Project: 542 (Run 5, Clone 91, Gen 14)
Assembly optimizations on if available.
Entering M.D.
(Starting from checkpoint)
Protein: p542_BBA5_ext
Writing local files
Completed 15000 out of 500000 steps (3%)
Extra SSE boost OK.
Quit 101 - Fatal error:
Step 15866, time 31.732 (ps) LINCS WARNING
relative constraint deviation after LINCS:
max 0.000000 (between atoms 1 and 2) rms 1.#QNAN0
Simulation instability has been encountered. The run has entered a
state from which no further progress can be made.
If you often see other project units terminating early like this
too, you may wish to check the stability of your computer (issues
such as high temperature, overclocking, etc.).
Going to send back what have done.
logfile size: 17996
- Writing 18672 bytes of core data to disk...
... Done."
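For anyone reading a log like the one above, here is a minimal sketch of pulling the crash point out of the LINCS warning line. The function name `parse_lincs` and the default of 500000 total steps (taken from the "Completed ... out of 500000 steps" line) are my own assumptions for illustration, not part of the F@H client.

```python
import re

# Matches the core's LINCS warning line as quoted in the log above.
LINCS_RE = re.compile(r"Step (\d+), time ([\d.]+) \(ps\)\s+LINCS WARNING")

def parse_lincs(line, total_steps=500000):
    """Return (step, time_ps, percent_done) for a LINCS warning line, else None.

    total_steps is an assumption taken from the 'out of 500000 steps' log line.
    """
    m = LINCS_RE.search(line)
    if m is None:
        return None
    step = int(m.group(1))
    time_ps = float(m.group(2))
    return step, time_ps, 100.0 * step / total_steps

line = "Step 15866, time 31.732 (ps) LINCS WARNING"
step, time_ps, pct = parse_lincs(line)
print(step, time_ps, round(pct, 2))  # 15866 31.732 3.17
```

So this unit died about 3% of the way in, shortly after the "Completed 15000 out of 500000 steps (3%)" checkpoint.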
Son of a gun - the problems I described above are happening. I looked at the protein; it's P542_BBA5_ext - same stinking protein that fouled everything up before! None of the others do this. Time to wipe out the work and start over again. Sheesh!
:banghead:
I haven't had time to mess with it to see what the real problem is. I'm only home on weekends and I was tired of that machine not doing anything useful during the week.
Edit: I just checked that machine and it has completed a few Gromacs WUs as well (without -advmethods): P362, P379 and P381. I'm not sure which Gromacs WU(s) it was having trouble with....
I just uninstalled F@H on CPU 0 (the real, hardware CPU); reinstalled F@H, v3.24. Wouldn't you know it - it downloaded another 542! #@$#%^& !
When I restarted the computer after uninstalling F@H in safe mode, I opened the BIOS and boosted CPU core voltage one notch. Perhaps that will help. We'll see.
Bill, that's not tagged on to either instance of F@H on this Intel system.
I was having problems whether the system was overclocked or NOT. 3.24, 3.25beta -forceasm, no -forceasm, it didn't make any difference. It may well be a problem with the 542. I'll try to keep a closer watch on mine to see if it throws up if I get a 542.
It's been running fine @ 2075 MHz for over a week now. *knock on wood*
I didn't see any 542's in the log file.
Leo, with you just getting another 542, we'll see if the bump in vcore helps stabilize the folding of these WUs on your Intel box. If it folds OK, you might want to start running the -advmethods flag, as the P4 will process Gromacs work much better than a Tinker WU. The weak native FPU on the P4 makes Tinker folding fairly slow on P4 machines.
I haven't heard of problems with SSE on P4s like there have been on the AMDs, so it's even more likely to be a stability issue.
I checked the log this morning. The computer, on both the real and virtual CPUs, experienced several Gromacs 542 crashes. I'll lower the FSB a tad and see what happens the next time a 542 is assigned.
Also, I'll be installing an SLK-900U on the CPU tonight, which should lower peak temperatures significantly.
I'll check the logs when I get home this evening to see if any 542s have been downloaded. I'll report back if there is any news.
I backed down the OC on that machine, deleted the FaH core, re-dl'ed a new core and haven't had any problems since then.
I think it was a combination of heat and the OC. I'm not sure what protein was running at the time though.
Btw, the machine was not an Intel.
Could be a problem with that particular WU that is causing it to crash. There have been such WUs.
I would think you mean one daily update and not one 3 hr update.
I have had some problems in the last 2 weeks also. I noticed a WU on my main machine was failing at frame 37 consistently, so I deleted the core and relevant files and it ran well for a few days. Then another WU kept failing at frame 94, so I deleted the files and restarted. Hopefully this process will not continue. :banghead:
I had the same problem with the same WU a good while ago. I lowered the OC a bit and gave it higher vcore... no go. I then disabled the cpu-enhance setting in the BIOS and it worked again.
Can you also try this and see if it was a fluke or the real thing?
Marc, my friend, have you been reading my posts? I've said at least two times now that the only protein causing problems is Gromacs 542. Last night before bed, I lowered the CPU/FSB clock a little. No change. The log revealed this morning that one of the CPUs had downloaded a 542 and it crashed at Frame 37. They either crash right away, or at 30, or at 37.
Mack, I'll disable enhance CPU and see if that does anything.
Additional steps:
- lowered CPU core temp - now 52°C (on-chip measurement) at full load.
- uninstalled completely both the console and graphical clients; downloaded fresh clients (3.25 beta), installed
- ran Norton Disk Doctor (ScanDisk)
No improvement whatsoever. Take a look at log comments -
"...[13:49:16] Project: 542 (Run 0, Clone 291, Gen 5)
[13:49:16]
[13:49:16] Assembly optimizations on if available.
[13:49:16] Entering M.D.
[13:49:24] Protein: p542_BBA5_ext
[13:49:24]
[13:49:24] Writing local files
[13:49:25] Extra SSE boost OK.
[13:49:29] Writing local files
[13:49:29] Completed 0 out of 500000 steps (0)
[13:49:49] Quit 101 - Fatal error:
[13:49:49] Step 255, time 0.51 (ps) LINCS WARNING
[13:49:49] relative constraint deviation after LINCS:
[13:49:49] max 0.000000 (between atoms 1 and 2) rms 1.#QNAN0
[13:49:49]
[13:49:49] Simulation instability has been encountered. The run has entered a
[13:49:49] state from which no further progress can be made....
...[13:50:17] Protein: p542_BBA5_ext
[13:50:17]
[13:50:17] Writing local files
[13:50:17] Extra SSE boost OK.
[13:50:21] Writing local files
[13:50:21] Completed 0 out of 500000 steps (0)
Folding@home Client Shutdown.
...[14:07:30] Project: 542 (Run 14, Clone 284, Gen 9)
[14:07:30]
[14:07:30] Assembly optimizations on if available.
[14:07:30] Entering M.D.
[14:07:39] Protein: p542_BBA5_ext
[14:07:39]
[14:07:39] Writing local files
[14:07:39] Extra SSE boost OK.
[14:07:43] Writing local files
[14:07:43] Completed 0 out of 500000 steps (0)
[14:11:25] Quit 101 - Fatal error:
[14:11:25] Step 2602, time 5.204 (ps) LINCS WARNING
[14:11:25] relative constraint deviation after LINCS:
[14:11:25] max 0.000000 (between atoms 1 and 2) rms 1.#QNAN0
[14:11:25]
[14:11:25] Simulation instability has been encountered. The run has entered a
[14:11:25] state from which no further progress can be made.
[14:11:25] If you often see other project units terminating early...
...[14:11:41] Folding@home Gromacs Core
[14:11:41] Version 1.51 (September 25, 2003)
[14:11:41]
[14:11:41] Preparing to commence simulation
[14:11:41] - Looking at optimizations...
[14:11:41] - Created dyn
[14:11:41] - Files status OK
[14:11:41] - Go method
[14:11:41] - Expanded 440170 -> 2220225 (decompressed 504.4 percent)
[14:11:41] - Starting from initial work packet
[14:11:41]
[14:11:41] Project: 572 (Run 48, Clone 60, Gen 5)
[14:11:41]
[14:11:41] Assembly optimizations on if available.
[14:11:41] Entering M.D.
[14:11:48] Protein: p572_L939_K12M
[14:11:48]
[14:11:48] Writing local files
[14:11:48] Extra SSE boost OK.
[14:11:50] Writing local files
[14:11:50] Completed 0 out of 500000 steps (0)
[14:13:30] Quit 101 - Fatal error:
[14:13:30] Step 661, time 1.322 (ps) LINCS WARNING
[14:13:30] relative constraint deviation after LINCS:
[14:13:30] max 0.000000 (between atoms 1 and 2) rms 1.#QNAN0
[14:13:30]
[14:13:30] Simulation instability has been encountered. The run has entered a
[14:13:30] state from which no further progress can be made...
...[14:23:08] Protein: p571_L939_K12M_nat
[14:23:08]
[14:23:08] Writing local files
[14:23:11] Writing local files
[14:23:11] Completed 0 out of 500000 steps (0)
[14:26:06] Quit 101 - Fatal error:
[14:26:06] Step 363, time 0.726 (ps) LINCS WARNING
[14:26:06] relative constraint deviation after LINCS:
[14:26:06] max 0.000000 (between atoms 1 and 2) rms 1.#QNAN0
[14:26:06]
[14:26:06] Simulation instability has been encountered. The run has entered a
[14:26:06] state from which no further progress can be made...
...[14:26:49] Protein: p572_L939_K12M
[14:26:49]
[14:26:49] Writing local files
[14:26:49] Extra SSE boost OK.
[14:26:53] Writing local files
[14:26:53] Completed 0 out of 500000 steps (0)
[14:29:29] Quit 101 - Fatal error:
[14:29:29] Step 984, time 1.968 (ps) LINCS WARNING
[14:29:29] relative constraint deviation after LINCS:
[14:29:29] max 0.000000 (between atoms 1 and 2) rms 1.#QNAN0
[14:29:29]
[14:29:29] Simulation instability has been encountered. The run has entered a
[14:29:29] state from which no further progress can be made...."
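For anyone wanting to check which proteins are actually failing in logs like the extracts above, here is a rough sketch that tallies "Quit 101" errors per protein. The function name `tally_crashes` is hypothetical, and the approach of attributing each crash to the most recent "Protein:" line is my own assumption about how the log is laid out. The sample lines are trimmed from the extracts above.

```python
import re
from collections import Counter

# 'Protein: <name>' as it appears in the client log extracts.
PROTEIN_RE = re.compile(r"Protein: (\S+)")

def tally_crashes(log_text):
    """Count 'Quit 101' fatal errors per protein name.

    Assumption: each crash belongs to the most recent 'Protein:' line
    seen before it. Crashes with no preceding protein line are skipped.
    """
    crashes = Counter()
    current = None
    for line in log_text.splitlines():
        m = PROTEIN_RE.search(line)
        if m:
            current = m.group(1)
        elif "Quit 101" in line and current is not None:
            crashes[current] += 1
    return crashes

sample = """\
[13:49:24] Protein: p542_BBA5_ext
[13:49:49] Quit 101 - Fatal error:
[14:11:48] Protein: p572_L939_K12M
[14:13:30] Quit 101 - Fatal error:
[14:23:08] Protein: p571_L939_K12M_nat
[14:26:06] Quit 101 - Fatal error:
"""
for protein, count in tally_crashes(sample).items():
    print(protein, count)
```

Run against a full FAHlog, a tally like this would show at a glance whether the crashes are limited to 542 or, as in the extracts above, also hit 571 and 572.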
WHAT IS GOING ON? If there is indeed "system instability", it shows up in nothing other than Gromacs in Folding. These log extracts are culled from operations with the computer at default operating specifications. Temps are good. Registry is good - everything is good as far as I can tell.
Man, there is a gold star if anyone can figure this out. If not, is there a flag I can tack on to the Folding start file that prohibits downloading of Gromacs?
This is exasperating. The next step, if I can't get it straight with this, is to wipe out Folding again, to install an older client like 3.24 or 3.14.
ARRRGGGHHHH!
:banghead:
That's not the point here though; we need to get your rig to fold.
The core must be a boogie, no doubt.
Any other suggestions would be appreciated, though. Thanks, guys.
Test an older client right away before beating yourself up with this anymore. Also, if you can, save the 542 and transfer it across all your other machines to see if it messes up on them. Or attach a zip of it so we can try.
OK, seriously now, Severspehere, your latter suggestions weren't bad. If I get another crash and burn 542, I'll save it and graft it into the Folding client in my AMD rig.
Yeah, I could zip it also and send it to someone.
Zip that WU and I will give it a try on my machine. You can attach it for us to try.
A new problem cropped up; details are in my own sad saga thread, lol