Recovering Lost F@H Data

comfortable Sugarland, TX
edited October 2004 in Folding@Home
Hi guys,

Recently, I decided to push my CPU's overclock further. I finally got it running stable at 220x11.5, 41C under load (it's a 2600-M), but the journey was a sad one :bawling:

I used F@H as my stability checker, running it 24/7, and I didn't use the computer as much as I normally would. During one of my FSB increases, I noticed that the F@H service crashed many times. To compound the problem, my ISP was having some sort of outage, and I lost internet connectivity for an extended period of time.

Here are my F@H settings. The client is started with:
FAH502-Console.exe -svcstart -advmethods -forcesse
and the config file contains:
[settings]
username=comfortable
team=93
asknet=no
bigpackets=yes
machineid=1
local=6

[http]
active=no
host=localhost
port=8080
usereg=no

[core]
checkpoint=30
cpuusage=100
ignoredeadlines=yes

Overnight, it seems my computer dropped the large WU it was 67% through (which, incidentally, was worth 320 points) after hitting the error shown in the log below. Once I had a stable o/c and a dependable internet link, my computer was assigned a different WU. My F@H folder contains the FahCore 65, 78, and 82 executables.

Here's my F@H log:
Quit 101 - Fatal error:
[8:47:10] Step 658, time 1.316 (ps) LINCS WARNING
[8:47:10] relative constraint deviation after LINCS:
[8:47:10] max 0.000000 (between atoms 1 and 2) rms 1.#QNAN0
[8:47:10]
[8:47:10] Simulation instability has been encountered. The run has entered a
[8:47:10] state from which no further progress can be made.
[8:47:10] If you often see other project units terminating early like this
[8:47:10] too, you may wish to check the stability of your computer (issues
[8:47:10] such as high temperature, overclocking, etc.).
[8:47:10] Going to send back what have done.
[8:47:10] logfile size: 8131
[8:47:10] - Writing 8804 bytes of core data to disk...
[8:47:10] ... Done.
[8:47:10]
[8:47:10] Folding@home Core Shutdown: EARLY_UNIT_END
[8:47:13] CoreStatus = 72 (114)
[8:47:13] Sending work to server

[8:47:13] + Attempting to send results
[8:47:14] + Results successfully sent
[8:47:14] Thank you for your contribution to Folding@Home.

[10:33:02] + Attempting to get work packet
[10:33:02] - Connecting to assignment server
[10:33:02] + Could not connect to Assignment Server
[10:33:02] + Could not connect to Assignment Server 2
[10:33:02] + Couldn't get work instructions.
[10:33:02] - Error: Attempt #1 to get work failed, and no other work to do.
Waiting before retry.
(This repeats ad infinitum.)

Ultimately, my questions are: how do I prevent this from happening in the future? And is there any way to restore work units that were lost to internet or stability issues?

Comments

  • Medlock Miramar, Florida Member
    edited October 2004
    The only real answer to that is: don't get too greedy when overclocking. I know it sucks when that happens; I've had it happen a few too many times. But at least you get partial credit for what you completed.

    No, you can't resume work that has been corrupted from instability, because...
    The run has entered a state from which no further progress can be made.

    When the internet goes down, F@H keeps completed WU data in its queue so it can send the results back as normal once the connection is restored. If the connection stays down, it keeps trying to get a new WU until it finally gets one, repeating this error:
    [10:33:02] + Attempting to get work packet
    [10:33:02] - Connecting to assignment server
    [10:33:02] + Could not connect to Assignment Server
    [10:33:02] + Could not connect to Assignment Server 2
    [10:33:02] + Couldn't get work instructions.
    [10:33:02] - Error: Attempt #1 to get work failed, and no other work to do.
    Waiting before retry.
  • mmonnin Centreville, VA
    edited October 2004
    You do get partial credit, so you didn't lose everything you had done. The large WUs have a higher tendency to error out like the one you got. It may be because of your OC, or it may just be a bad WU, which would not be your fault.

    If it's 41C, that's not a bad temp at all. What are your vcore and vdimm? Raising those might add some stability, and your temps give you headroom to do it.
  • comfortable Sugarland, TX
    edited October 2004
    vcore: 1.725 V
    vdimm: 2.8 V

    My computer is stable now. I've run Prime95 and the usual stress tests. I had a whole bunch of problems during o/c testing, but it seems to be doing fine right now.

    In retrospect, I should've disabled the F@H service during my o/c tests. There were plenty of other tests available, so it was kind of foolish of me to experiment with o/c values while folding. I'm burning in the CPU right now with 24/7 F@H.