SMP Error
Garg
Purveyor of Lincoln Nightmares Icrontian
My SMP client has been returning "UNKNOWN ERROR" messages since the last WU finished. I'm leaving for work now, I guess I'll reinstall the client when I get home. Has anyone seen this before or know its cause?
[08:09:41] Completed 500000 out of 500000 steps (100 percent)
[08:09:42] Writing final coordinates.
[08:09:46] Past main M.D. loop
[08:09:58] Will end MPI now
[08:10:58]
[08:10:58] Finished Work Unit:
[08:10:59] - Reading up to 3714144 from "work/wudata_07.arc": Read 3714144
[08:10:59] - Reading up to 1768952 from "work/wudata_07.xtc": Read 1768952
[08:10:59] goefile size: 0
[08:10:59] logfile size: 17316
[08:10:59] Leaving Run
[08:10:59] - Writing 5504812 bytes of core data to disk...
[08:11:01] ... Done.
[08:11:01] - Failed to delete work/wudata_07.sas
[08:11:01] - Failed to delete work/wudata_07.goe
[08:11:01] Warning: check for stray files
[08:11:01] - Shutting down core
[08:13:01]
[08:13:01] Folding@home Core Shutdown: FINISHED_UNIT
[08:13:01]
[08:13:01] Folding@home Core Shutdown: FINISHED_UNIT
[08:13:10] CoreStatus = 64 (100)
[08:13:10] Sending work to server
[08:13:10] + Attempting to send results
[08:13:39] + Results successfully sent
[08:13:39] Thank you for your contribution to Folding@Home.
[08:13:39] + Number of Units Completed: 56
[08:15:43] - Preparing to get new work unit...
[08:15:43] + Attempting to get work packet
[08:15:43] - Connecting to assignment server
[08:15:44] - Successful: assigned to (171.64.65.64).
[08:15:44] + News From Folding@Home: Welcome to Folding@Home
[08:15:44] Loaded queue successfully.
[08:15:50] + Closed connections
[08:15:50]
[08:15:50] + Processing work unit
[08:15:50] Core required: FahCore_a1.exe
[08:15:50] Core found.
[08:15:50] Working on Unit 08 [August 15 08:15:50]
[08:15:50] + Working ...
[08:15:50]
[08:15:50] *------------------------------*
[08:15:50] Folding@Home Gromacs SMP Core
[08:15:50] Version 1.74 (March 10, 2007)
[08:15:50]
[08:15:50] Preparing to commence simulation
[08:15:50] - Ensuring status. Please wait.
[08:16:07] - Assembly optimizations manually forced on.
[08:16:07] - Not checking prior termination.
[08:16:14] - Expanded 929114 -> 11968368 (decompressed 1288.1 percent)
[08:16:15] - Failed to delete work/wudata_08.ar
[08:16:15] Project: 2610 (Run 1, Clone 84, Gen 0)
[08:16:15]
[08:16:15] ing from initial work packet
[08:16:15]
[08:16:15] Project: 2610 (Run 1, Clone 84, Gen 0)
[08:16:15]
[08:16:16] Assembly optimizations on if available.
[08:16:16] Entering M.D.
[08:16:22] Rejecting checkpoint
[08:16:23] Gromacs error.
[08:16:23]
[08:16:23] Folding@home Core Shutdown: UNKNOWN_ERROR
[08:16:23]
[08:16:23] Folding@home Core Shutdown: UNKNOWN_ERROR
Comments
2610s have been among the tougher SMP WUs to fold.
If you didn't know: after Ctrl+C, check Task Manager. If any FahCore_a1.exe processes are still running, you must wait. The shutdown process must synchronize all four FahCore processes, or it can destroy the work unit.
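For anyone who would rather script the Task Manager check than watch it by hand, here is a rough sketch. It assumes Windows, Python 3.7 or later, and that the cores show up under the image name FahCore_a1.exe as in the log above. The function names and the 5-second poll interval are made up for illustration, not anything the client provides.

```python
import csv
import io
import subprocess
import time


def count_cores(tasklist_csv, image_name="FahCore_a1.exe"):
    """Count rows of `tasklist /FO CSV /NH` output whose image name matches."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    return sum(1 for row in rows if row and row[0].lower() == image_name.lower())


def wait_for_cores_to_exit(poll_seconds=5.0):
    """Poll the Windows process list until no core processes remain."""
    while True:
        out = subprocess.run(
            ["tasklist", "/FO", "CSV", "/NH"],  # CSV output, no header row
            capture_output=True, text=True, check=True,
        ).stdout
        remaining = count_cores(out)
        if remaining == 0:
            print("No core processes left; safe to shut down.")
            return
        print(f"{remaining} core process(es) still running, waiting...")
        time.sleep(poll_seconds)
```

Press Ctrl+C in the client window first, then call wait_for_cores_to_exit() from a separate prompt. It only watches the processes; it does not stop them.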
After the error that I posted, I Ctrl+C closed it, restarted the computer, and tried running it again. Same UNKNOWN ERROR.
Will do. That's what I did last time I needed to get rid of a WU, but I wasn't sure if that was the safest way. Thanks
If the console and/or log is not showing frame progression with time stamps, the work unit is stalled. There's almost no chance it can be recovered, at least in my experience.
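A rough way to automate that check is to look at the last time stamp in the log and see how long ago it was. This is only a sketch: the one-hour limit is an arbitrary assumption (frame times vary a lot between projects), the function names are hypothetical, and the caller has to supply "now" as seconds since midnight on the same clock the log uses. It also treats any time-stamped line as progress, which is a simplification.

```python
import re

# Client log lines start with a [HH:MM:SS] stamp.
STAMP = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]")


def last_stamp_seconds(log_lines):
    """Return the last [HH:MM:SS] stamp as seconds since midnight, or None."""
    last = None
    for line in log_lines:
        m = STAMP.match(line)
        if m:
            hours, minutes, seconds = (int(g) for g in m.groups())
            last = hours * 3600 + minutes * 60 + seconds
    return last


def seconds_idle(log_lines, now_seconds):
    """Seconds between the last stamp and now, allowing one midnight rollover."""
    last = last_stamp_seconds(log_lines)
    if last is None:
        return None
    return (now_seconds - last) % 86400


def looks_stalled(log_lines, now_seconds, limit_seconds=3600):
    """True if the log has been silent for longer than limit_seconds."""
    idle = seconds_idle(log_lines, now_seconds)
    return idle is not None and idle > limit_seconds
```

Because the stamps carry no date, a log that has been silent for more than a day cannot be told apart from a recent one.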
Have you had any events lately that might have corrupted Windows system files and communications apps, such as the .NET Framework? How long has it been since you've done a Check Disk operation? That's probably not the problem, but it wouldn't hurt to run it. If that doesn't fix it, just wipe out the contents of the Folding client folder and re-install. I had the exact same problem a week ago.
The first diagnostics I performed were Check Disk, overclock stability testing, temperature monitoring, and hard drive diagnostics - Hitachi's Drive Fitness Test. Everything passed 100%. The system was rock stable, with the hard drive passing with flying colors. I thought everything was OK, that stalled WUs were just anomalies. Subsequently downloaded WUs continued to hang and be ruined. After reading the known bugs thread at Folding Community I started suspecting the network connection (home network). One time I was observing the Task Manager and all four FAHCore_A1s disappeared at the same time that a "lost internet connection" window popped up on the desktop. After this incident there was no question. I uninstalled the D-Link wireless USB adapter and installed a Netgear wireless G card (just say no to both!). The Netgear card was even worse than the D-Link adapter! OK, enough is enough. I reconnected the computer via Ethernet cable nearly a week ago and the subject computer has successfully completed every SMP work unit downloaded since then.
BTW, I've had zero problems like this with the computers that are networked with Linksys PCI cards. Note: none of the Linksys cards have the dubious "speedboost" technology, which has not been getting good reviews. These are the conventional Linksys B-G cards. I returned that sorry Netgear for a refund.
BTW, I've ordered two of these MSI wireless G/B cards from Newegg. They get excellent user reviews and cost less than half of what Linksys and other competitors cost. Cross fingers - we'll see.
Netgear WG311 - 'just say no!' I purchased this at CompUSA during a lunch break from work. I should have checked out reviews first.
Pic of the MSI wireless B/G:
Ah, I should have checked the core temps. Normally I have SpeedFan running, but I didn't last time. I'll wipe out my install and try again tonight.
I'm connected with an ethernet cable, so hopefully there aren't any connection issues (rather not get a new nic or router). Icrontic_11 is on a Gigabyte wireless PCI card, but it's folding regular WUs.
Also... they need to work out password issues. I CANNOT enter my domain password into a program to use my credentials. Not allowed at work. The three systems that I have tried all failed to complete.
No kidding - there's got to be a better way to do it than it needing access to credentials. It's been in beta a long time. I know their resources are limited, but I hope they get everything worked out soon and get it ready for general release.
That should get you back up..
Tips:
Be very careful about shutdowns. If you don't use Ctrl+C to shut the client down, you risk destroying the work unit. After using Ctrl+C to shut down the client, open Task Manager and watch for the four FahCore_a1 processes running in the background. These four cores must synchronize before ending. After they have disappeared from Task Manager, it is safe to shut down your computer.
Networking. Some of the WinSMP units are very, very sensitive to network connections. If you are on a wireless network connection, ensure that all power saving settings for your wireless card/adapter are turned OFF, that the device is at full power all the time. Just a one or two-second network disconnect when on wireless can destroy a work unit. (I don't know why, but it's a fact.)
It is work unit 2610 (1,84,0), a Gromacs error (get_symtab_handle 54650952 not found) ...src\gmxlib\symtab.c, line 108. It is a hard stop. No recovery.
I downloaded and re-installed folding-smp (after deleting the 'folding' folder) several times, and got the same work unit resulting in the same error. I do not know how to delete a work unit. Deleting the 'queue file' and the 'work folder' does not help.
By the way, I have two computers with a Q6600 (4 cores) - stock, not overclocked - each running a single instance of Windows SMP. On the same cable connection, one is running fine (both had been running fine for about 2-3 months).
Anyway I have been stuck for a couple of hours and I would like to get going again. Any help is greatly appreciated.
thank you
Otto1939
Edit: and welcome to Icrontic. Maybe a post of the first 30 lines of the FAHlog with the verbosity 9 flag in place would help.
I went thru separate installs. I don't know how to do anything else.
I also don't know how to run 'verbosity 9'.
Here is the log, showing the end of the download until the error:
[02:03:32] + 696320 bytes downloaded
[02:03:32] + 706560 bytes downloaded
[02:03:32] + 716800 bytes downloaded
[02:03:32] + 727040 bytes downloaded
[02:03:32] + 737280 bytes downloaded
[02:03:32] + 747520 bytes downloaded
[02:03:32] + 757760 bytes downloaded
[02:03:32] + 768000 bytes downloaded
[02:03:32] + 778240 bytes downloaded
[02:03:32] + 788480 bytes downloaded
[02:03:32] + 789667 bytes downloaded
[02:03:32] Verifying core Core_a1.fah...
[02:03:32] Signature is VALID
[02:03:32]
[02:03:32] Trying to unzip core FahCore_a1.exe
[02:03:32] Decompressed FahCore_a1.exe (2035712 bytes) successfully
[02:03:32] + Core successfully engaged
[02:03:37]
[02:03:37] + Processing work unit
[02:03:37] Core required: FahCore_a1.exe
[02:03:37] Core found.
[02:03:37] Working on Unit 01 [August 27 02:03:37]
[02:03:37] + Working ...
[02:03:37]
[02:03:37] *------------------------------*
[02:03:37] Folding@Home Gromacs SMP Core
[02:03:37] Version 1.74 (March 10, 2007)
[02:03:37]
[02:03:37] Preparing to commence simulation
[02:03:37] - Ensuring status. Please wait.
[02:03:39] - Starting from initial work packet
[02:03:39]
[02:03:39] Project: 2610 (Run 1, Clone 84, Gen 0)
[02:03:39]
[02:03:39] Assembly optimizations on if available.
[02:03:39] Entering M.D.
[02:03:58] tial work pa- Starting from initial work packet
[02:03:58]
[02:03:58] Project: 2610 (Run 1, Clone 84, Gen 0)
[02:03:58]
[02:03:58] Entering M.D.
[02:04:04] Rejecting checkpoint
[02:04:05] Gromacs error.
[02:04:05]
[02:04:05] Folding@home Core Shutdown: UNKNOWN_ERROR
[02:04:05]
[02:04:05] Folding@home Core Shutdown: UNKNOWN_ERROR
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
--- Thank You ---
Program Core_A1.exe, VERSION 3.3
Source code file: ..\..\..\src\gmxlib\symtab.c, line: 108
Fatal error:
symtab get_symtab_handle 54650952 not found
Thanx for Using GROMACS - Have a Nice Day
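When reporting a bad unit it helps to quote the project/run/clone/gen and the shutdown status together. The sketch below pulls both out of a pasted log. The regular expressions are based only on the line formats visible in the logs in this thread, and the function name is hypothetical.

```python
import re

# Patterns modelled on the log lines quoted in this thread.
PROJECT = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SHUTDOWN = re.compile(r"Core Shutdown: (\w+)")


def summarize(log_text):
    """Return ((project, run, clone, gen) or None, last shutdown status or None)."""
    project = None
    status = None
    for line in log_text.splitlines():
        m = PROJECT.search(line)
        if m:
            project = tuple(int(g) for g in m.groups())
        m = SHUTDOWN.search(line)
        if m:
            status = m.group(1)
    return project, status
```

Fed the log above, this should give (2610, 1, 84, 0) and UNKNOWN_ERROR, the same unit reported at the top of the thread.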
wiki entries relating
http://fahwiki.net/index.php/How_do_I_reconfigure_the_console_client_options%3F
http://fahwiki.net/index.php/How_do_I_add_flags_using_a_shortcut_to_the_console_client%3F
I tried the verbosity 9 option and got this... The program is downloaded repeatedly. This shows one iteration. I had to modify the links because I am not allowed to post links here yet.
[17:55:12] Initial: 316E; + 727040 bytes downloaded
[17:55:12] Initial: D89D; + 737280 bytes downloaded
[17:55:12] Initial: E6A3; + 747520 bytes downloaded
[17:55:12] Initial: B488; + 757760 bytes downloaded
[17:55:12] Initial: BAFD; + 768000 bytes downloaded
[17:55:12] Initial: 34A0; + 778240 bytes downloaded
[17:55:12] Initial: DD6C; + 788480 bytes downloaded
[17:55:12] Initial: D2E9; + 789667 bytes downloaded
[17:55:12] Verifying core Core_a1.fah...
[17:55:12] Signature is VALID
[17:55:12]
[17:55:12] Trying to unzip core FahCore_a1.exe
[17:55:13] Decompressed FahCore_a1.exe (2035712 bytes) successfully
[17:55:13] + Core successfully engaged
[17:55:18]
[17:55:18] + Processing work unit
[17:55:18] Core required: FahCore_a1.exe
[17:55:18] Core found.
[17:55:18] Working on Unit 01 [August 27 17:55:18]
[17:55:18] + Working ...
[17:55:18] - Calling 'mpiexec -channel auto -np 4 FahCore_a1.exe -dir work -suffix 01 -checkpoint 15 -verbose -lifeline 2212 -version 591'
[17:55:26] CoreStatus = 63 (99)
[17:55:26] + Error starting Folding Home core.
[17:55:31]
[17:55:31] + Processing work unit
[17:55:31] Core required: FahCore_a1.exe
[17:55:31] Core found.
[17:55:31] Working on Unit 01 [August 27 17:55:31]
[17:55:31] + Working ...
[17:55:31] - Calling 'mpiexec -channel auto -np 4 FahCore_a1.exe -dir work -suffix 01 -checkpoint 15 -verbose -lifeline 2212 -version 591'
[17:55:39] CoreStatus = 63 (99)
[17:55:39] + Error starting Folding Home core.
[17:55:44]
[17:55:44] + Processing work unit
[17:55:44] Core required: FahCore_a1.exe
[17:55:44] Core found.
[17:55:44] Working on Unit 01 [August 27 17:55:44]
[17:55:44] + Working ...
[17:55:44] - Calling 'mpiexec -channel auto -np 4 FahCore_a1.exe -dir work -suffix 01 -checkpoint 15 -verbose -lifeline 2212 -version 591'
[17:55:52] CoreStatus = 63 (99)
[17:55:52] + Error starting Folding Home core.
[17:55:52] - Attempting to download new core...
[17:55:52] + Downloading new core: FahCore_a1.exe
[17:55:52] Downloading core (~pande Win32 x86 Core_a1 fah from stanford)
[17:55:53] Initial: AFDE; + 10240 bytes downloaded
[17:55:53] Initial: AD21; + 20480 bytes downloaded
[17:55:53] Initial: CC38; + 30720 bytes downloaded
[17:55:53] Initial: 8501; + 40960 bytes downloaded
[17:55:53] Initial: F56A; + 51200 bytes downloaded
[17:55:53] Initial: ABAE; + 61440 bytes downloaded
[17:55:53] Initial: B6B0; + 71680 bytes downloaded
[17:55:53] Initial: 783A; + 81920 bytes downloaded
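The loop in that log (three failed starts, then a fresh core download) is easier to spot if the CoreStatus lines are tallied. This is a small sketch with a hypothetical function name. It only counts the codes; what code 99 means beyond the "Error starting Folding Home core" message next to it is not something this thread establishes.

```python
import re
from collections import Counter

# Matches lines such as "CoreStatus = 63 (99)"; the parenthesised value is decimal.
CORE_STATUS = re.compile(r"CoreStatus = [0-9A-Fa-f]+ \((\d+)\)")


def core_status_counts(log_lines):
    """Tally how often each decimal CoreStatus code appears in the log."""
    counts = Counter()
    for line in log_lines:
        m = CORE_STATUS.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return dict(counts)
```

A high count of the same failing code with nothing else in between suggests the client is retrying the same unit rather than making progress.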
Changing the machine id from 1 to 2 did the trick.
It downloaded a different 2610 wu and started folding.
Many Thanks
Otto