FAH Early Unit End

nonstop301 51° 27' 24.87" N // 0° 11' 38.91" W Member
edited January 2007 in Folding@Home
I'm running into some trouble when I receive assignments that involve p1499 or p3039.

With p1499 I get a cycle of the following error code :

[06:57:09] Loaded queue successfully.
[06:57:09] + Benchmarking ...
[06:57:11]
[06:57:11] + Processing work unit
[06:57:11] Core required: FahCore_78.exe
[06:57:11] Core found.
[06:57:11] Working on Unit 02 [January 19 06:57:11]
[06:57:11] + Working ...
[06:57:11]
[06:57:11] *
*
[06:57:11] Folding@Home Gromacs Core
[06:57:11] Version 1.90 (March 8, 2006)
[06:57:11]
[06:57:11] Preparing to commence simulation
[06:57:11] - Looking at optimizations...
[06:57:11] - Files status OK
[06:57:13] - Expanded 857201 -> 12361133 (decompressed 1442.0 percent)
[06:57:13]
[06:57:13] Project: 1499 (Run 598, Clone 0, Gen 8)
[06:57:13]
[06:57:14] Assembly optimizations on if available.
[06:57:14] Entering M.D.
[06:57:21] Protein: p1499_tet_1499
[06:57:21]
[06:57:21] Writing local files
[06:57:21] Gromacs error.
[06:57:21]
[06:57:21] Folding@home Core Shutdown: UNKNOWN_ERROR
[06:57:25] CoreStatus = 79 (121)
[06:57:25] Client-core communications error: ERROR 0x79
[06:57:25] Deleting current work unit & continuing...
[06:57:45] - Preparing to get new work unit...
[06:57:45] + Attempting to get work packet
[06:57:45] - Connecting to assignment server
[06:57:46] - Successful: assigned to (171.64.122.134).
[06:57:46] + News From Folding@Home: Welcome to Folding@Home
[06:57:46] Loaded queue successfully.
[06:58:01] + Closed connections
[06:58:06]
[06:58:06] + Processing work unit
[06:58:06] Core required: FahCore_78.exe
[06:58:06] Core found.
[06:58:06] Working on Unit 03 [January 19 06:58:06]
[06:58:06] + Working ...
[06:58:06]
[06:58:06] *
*
[06:58:06] Folding@Home Gromacs Core
[06:58:06] Version 1.90 (March 8, 2006)
[06:58:06]
[06:58:06] Preparing to commence simulation
[06:58:06] - Looking at optimizations...
[06:58:06] - Created dyn
[06:58:06] - Files status OK
[06:58:09] - Expanded 857201 -> 12361133 (decompressed 1442.0 percent)
[06:58:09] - Starting from initial work packet
[06:58:09]
[06:58:09] Project: 1499 (Run 598, Clone 0, Gen 8)
[06:58:09]
[06:58:09] Assembly optimizations on if available.
[06:58:09] Entering M.D.
[06:58:16] Protein: p1499_tet_1499
[06:58:16]
[06:58:16] Writing local files
[06:58:16] Gromacs error.
[06:58:16]
[06:58:16] Folding@home Core Shutdown: UNKNOWN_ERROR
[06:58:20] CoreStatus = 79 (121)
[06:58:20] Client-core communications error: ERROR 0x79
[06:58:20] Deleting current work unit & continuing...



And with p3039 I received the following 20% of the way into the Work Unit:



[17:25:00] Completed 1000000 out of 5000000 steps (20)
[17:37:48] Quit 101 - Fatal error: NaN detected: (ener[13])
[17:37:48]
[17:37:48] Simulation instability has been encountered. The run has entered a
[17:37:48] state from which no further progress can be made.
[17:37:48] This may be the correct result of the simulation, however if you
[17:37:48] often see other project units terminating early like this
[17:37:48] too, you may wish to check the stability of your computer (issues
[17:37:48] such as high temperature, overclocking, etc.).
[17:37:48] Going to send back what have done.
[17:37:48] logfile size: 32292
[17:37:48] - Writing 32855 bytes of core data to disk...
[17:37:48] ... Done.
[17:37:48]
[17:37:48] Folding@home Core Shutdown: EARLY_UNIT_END
[17:37:53] CoreStatus = 72 (114)
[17:37:53] Sending work to server


I have no idea why these errors occur but I did complete a p2125 assignment in between and now I received another p2125.

I have overclocked the processor but my first impression was that this could only have a positive effect when it comes to Folding At Home :)

I have also tested the RAM with MemTest and there were no errors after allowing it to run for over 3 hours.

Of the 9 Folding assignments I have completed thus far, only the two that involved p1499 and p3039 produced these errors. I'm thinking it's a case of further configuring the FAH console so that such errors are avoided, but I'm not sure what modifications are necessary, if any.

If you have any suggestions I would be most grateful.

Many thanks in advance for your help.
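
A note on reading the logs above: the two numbers on each CoreStatus line look like the same value written in hexadecimal and then in decimal (0x79 = 121 for the UNKNOWN_ERROR case, 0x72 = 114 for EARLY_UNIT_END), which matches the "ERROR 0x79" on the following line. The sketch below is a minimal check of that reading in Python. The function names are made up for illustration, and the hex/decimal interpretation is an inference from these two log lines, not something taken from FAH documentation.

```python
import re

# Matches client log lines such as "[06:57:25] CoreStatus = 79 (121)".
CORE_STATUS = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def parse_core_status(line):
    """Return (hex_text, decimal_value), or None if the line has no CoreStatus."""
    match = CORE_STATUS.search(line)
    if match is None:
        return None
    hex_text, decimal_text = match.groups()
    return hex_text.upper(), int(decimal_text)


def is_consistent(line):
    """True if the first number, read as hex, equals the number in parentheses."""
    parsed = parse_core_status(line)
    if parsed is None:
        return False
    hex_text, decimal_value = parsed
    return int(hex_text, 16) == decimal_value
```

Both CoreStatus lines quoted in this post pass the consistency check, so the value after "ERROR 0x" can probably be used to search for either form of the code.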

Comments

  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited January 2007
    nonstop301 wrote:
    I have overclocked the processor but my first impression was that this could only have a positive effect when it comes to Folding At Home

    The 1499s have been available long enough now that they should be uniformly stable work units. They are not beta units. Your overclock may not be as solid as you thought. It is possible that you downloaded a defective work unit, but not probable. BTW, the 149X units are much more demanding of the computer than the 212x units.
  • airbornflght Houston, TX Icrontian
    edited January 2007
    Yeh, back off on the oc a little bit and see if it helps stability. You might also be able to bump the voltage a little bit and get stable. Run prime and see how long it lasts.
  • profdlp The Holy City Of Westlake, Ohio
    edited January 2007
    All good advice given above. :)

    To (try and) add to it, when I have a machine encounter multiple errors I generally delete the core. It is located in the folder you are running FAH from and is in a format similar to FahCore_##.exe. Shut the program down, delete the core(s), then when you restart it will automatically download a new one. :fold:
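
The core-deletion step described above could be scripted roughly as follows. This is only a sketch: the function name and the `dry_run` flag are made up for illustration, the folder path is whatever folder you run FAH from, and the client should be shut down first, as noted above.

```python
from pathlib import Path


def remove_cores(fah_dir, dry_run=True):
    """Find FahCore_*.exe files in fah_dir and delete them unless dry_run is set.

    Returns the names of the matching files so the caller can see what was
    (or would be) removed. Other files in the folder are left alone.
    """
    affected = []
    for core in sorted(Path(fah_dir).glob("FahCore_*.exe")):
        if not dry_run:
            core.unlink()
        affected.append(core.name)
    return affected
```

Calling `remove_cores(folder)` with the default `dry_run=True` only lists the cores it finds; passing `dry_run=False` deletes them, after which the client should fetch fresh copies on restart as described above.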
  • nonstop301 51° 27' 24.87" N // 0° 11' 38.91" W Member
    edited January 2007
    Thank you for your suggestions and I will lower the overclock settings slightly.

    Do you think it's a matter of relaxing the memory timings alone or also reducing the FSB frequency ?

    The current overclock was tested with Prime before I began Folding At Home and it didn't produce any errors over a 10 hour period but I leave the Folding console running 24/7 so that might be pushing it to the limits when I receive the p1499s or p3039s.

    I'll change the FAH Core as well, although I did try that when the p1499 error occurred and then I got a subsequent error when it started on the p3039 :)
  • profdlp The Holy City Of Westlake, Ohio
    edited January 2007
    Overclocking and FAH can be odd at times. I have a rig here which ran fine with a substantial OC for about a month, then all of a sudden started dumping WU's right and left. I never did track it down to a specific protein or even type of protein, and I know it wasn't a matter of heat creeping into the picture as a problem. A slight reduction in the OC and all has been fine for several weeks since.
  • nonstop301 51° 27' 24.87" N // 0° 11' 38.91" W Member
    edited January 2007
    Thanks again for your comments Prof :)

    Heating isn't an issue with me either and the CPU and motherboard temperatures are well within the normal range.

    I will reduce the FSB frequency and loosen the memory timings slightly and see what effect it will have on any future more demanding Work Units I receive. I did notice that both the p1499 and p3039 required more RAM to carry out their respective tasks. Both the CPU and the RAM are overclocked on this computer at the moment.
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited January 2007
    About two years ago, before I was monitoring FAH work units as well as I do now, I noticed that my points accumulation wasn't what it seemed it should be. I made a couple of superficial client log checks and saw a couple of early work unit ends, spread over several clients. I just chalked it up to faulty work unit downloads. Later, having learned that corrupted WU downloads are rather uncommon, I took a detailed look at the logs. I was shocked to find out how many botched work units my machines had experienced. Yes, my machines were overclocked then (but not the same setups as now). I lowered the clocks on each machine just a little, like 100MHz, and the EUEs did not recur. If the overclock is not 100%, 24/7 stable, F@H will find it, maybe not in the first day, week, or even month, but the instability will show in the result of ruined work units.
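
A detailed log check like the one described above can be partly automated. The sketch below is one possible approach in Python, not an official FAH tool: it assumes the "Project: ..." and "Folding@home Core Shutdown: ..." line formats shown in the logs earlier in this thread, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Line formats taken from the log excerpts earlier in this thread.
PROJECT = re.compile(r"Project: (\d+) \(Run \d+, Clone \d+, Gen \d+\)")
SHUTDOWN = re.compile(r"Folding@home Core Shutdown: (\w+)")


def tally_shutdowns(lines):
    """Count (project, shutdown_status) pairs in an iterable of log lines.

    The project is None when no "Project:" line was seen since the
    previous shutdown (for example, in a truncated excerpt).
    """
    counts = Counter()
    project = None
    for line in lines:
        found = PROJECT.search(line)
        if found:
            project = found.group(1)
            continue
        found = SHUTDOWN.search(line)
        if found:
            counts[(project, found.group(1))] += 1
            project = None  # the next unit must announce its own project
    return counts
```

To run it over a whole client log, something like `tally_shutdowns(open("FAHlog.txt", errors="replace"))` should work, where the file name is an assumption about where your client writes its log. A project that keeps showing up with UNKNOWN_ERROR or EARLY_UNIT_END is the pattern to look for.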
  • nonstop301 51° 27' 24.87" N // 0° 11' 38.91" W Member
    edited January 2007
    Leonardo wrote:
    If the overclock is not 100%, 24/7 stable, F@H will find it, maybe not in the first day, week, or even month, but the instability will show in the result of ruined work units.

    This seems to be the case with me at the moment Leonardo :)

    I will leave the current p2124 task to complete and then I'll lower the FSB frequency by a few MHz and hopefully the FAH Core will not run into any problems with respect to the overclock.

    Thanks again for all the valuable information you provide :thumbup :)
  • QCH Ancient Guru Chicago Area - USA Icrontian
    edited January 2007
    Another example of Team Short-Media helping each other out!!! :headbange