Unstable WUs?
Medlock
Miramar, Florida Member
I noticed earlier today that my CPU (an HT P4) was only running at ~50%, so I thought it had finished a WU. I looked a little closer and it had only completed 60 frames, so I checked the log, and...
[21:33:22] Completed 60000 out of 100000 steps (60)
[21:36:24] Timered checkpoint triggered.
.....
[21:54:37] Timered checkpoint triggered.
[21:56:44] Quit 101 - Fatal error:
[21:56:44] Step 60769, time 60.769 (ps) LINCS WARNING
[21:56:44] relative constraint deviation after LINCS:
[21:56:44] max 0.004592 (between atoms 45941 and 45943) rms 0.000040
[21:56:44]
[21:56:44] Simulation instability has been encountered. The run has entered a
[21:56:44] state from which no further progress can be made.
[21:56:44] If you often see other project units terminating early like this
[21:56:44] too, you may wish to check the stability of your computer (issues
[21:56:44] such as high temperature, overclocking, etc.).
[21:56:44] Going to send back what have done.
[21:56:44] logfile size: 68498
[21:56:44] - Writing 69182 bytes of core data to disk...
[21:56:44] ... Done.
[21:56:45]
[21:56:45] Folding@home Core Shutdown: EARLY_UNIT_END
[21:56:49] CoreStatus = 72 (114)
[21:56:49] Sending work to server
I'm not overclocking or anything. My RAM timings are only slightly tighter than SPD: 2-3-3-6, where stock is 2.5-3-3-7. It was a large WU, so I thought maybe it was my RAM. But the only time a WU has ever crashed on me was when I OC'd too far. I'm not OC'd now, so I think it's the WU. BTW, the WU is p130_1RYP_AAAA_UM, one of the large Gromacs ones.
EDIT: That same client has received another one. It goes for 139 points. If it crashes I'll post again...
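For anyone who wants to check their own logs for the same thing, here is a rough Python sketch that pulls the failing step and the fraction completed out of a log like the one above. The function name `summarize_log` and the regular expressions are my own guesses based only on the lines pasted in this post, not on any documented client log format, so treat it as a starting point.

```python
import re

# Sample lines copied from the log excerpt in this post.
SAMPLE_LOG = """\
[21:33:22] Completed 60000 out of 100000 steps (60)
[21:54:37] Timered checkpoint triggered.
[21:56:44] Quit 101 - Fatal error:
[21:56:44] Step 60769, time 60.769 (ps) LINCS WARNING
[21:56:45] Folding@home Core Shutdown: EARLY_UNIT_END
[21:56:49] CoreStatus = 72 (114)
"""

# Patterns are assumptions inferred from the lines above.
COMPLETED_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
FAILED_STEP_RE = re.compile(r"Step (\d+), time")


def summarize_log(log_text):
    """Return the last reported progress, the failing step, and whether
    the core shut down with EARLY_UNIT_END (hypothetical helper)."""
    completed = total = failed_step = None
    early_end = False
    for line in log_text.splitlines():
        match = COMPLETED_RE.search(line)
        if match:
            completed, total = int(match.group(1)), int(match.group(2))
        match = FAILED_STEP_RE.search(line)
        if match:
            failed_step = int(match.group(1))
        if "EARLY_UNIT_END" in line:
            early_end = True
    # Prefer the failing step when present; else the last progress line.
    reached = failed_step if failed_step is not None else completed
    fraction = reached / total if reached is not None and total else None
    return {
        "completed": completed,
        "total": total,
        "failed_step": failed_step,
        "early_end": early_end,
        "fraction": fraction,
    }


if __name__ == "__main__":
    print(summarize_log(SAMPLE_LOG))
```

On the excerpt above this reports step 60769 out of 100000, so the unit got a little past 60% before the LINCS warning ended it.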
Folding@Home News/weblog
8/20/2004 New projects: P130x
We have some new exciting projects just being released. They are unlike any project we've done before and "break" some of the normal FAH rules:
They're a lot bigger than the normal FAH WUs in terms of the RAM they take (hundreds of MB) and the net transfer (~5MB). To keep them only in the hands of those who have the resources for them, they require the "big WU" switch in the v5 client, enough RAM, and enough network bandwidth.
Since they take more resources, there are bonus points associated with them (right now, a 50% bonus over the standard benchmark value). This value may increase or change if needed.
One other way that these WUs are different is that they are more likely to EARLY_END. Don't be surprised if this happens and don't worry: they are still scientifically important and clients get partial credit for the fraction completed.
This is new ground for FAH, which is scientifically very exciting. I hope to have some important new results to report in January, once these WUs have run for a few months.
KF