Vista x64 and SMP
Snarkasm
Madison, WI Icrontian
You guys know my setup - GA-P35-DS4, Q6600, 4 gigs OCZ, etc.
For the life of me, no matter what I try lately, I cannot get the SMP client to retain my position when I reboot. It comes back every time with a "Work_unit_missing" or various things of that nature. When I ctrl-c it, I make sure all the processes have exited, I make a backup folder of the files, and when I reboot, nothin'. The files are on a separate hard drive from the system OS, everything's as ideal as I think I can make it.
Any ideas what the nuts is screwing with it? Maybe I'm not getting every process? Maybe I'm not supposed to force quit the extra processes?
Thanks, guys.
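For anyone repeating the backup step described above, here is a minimal Python sketch that snapshots the client folder into a timestamped copy. It assumes the client and every core process have already exited before it runs; the function name `backup_folder` and the backup destination are hypothetical, and only the `F:\Folding` launch directory comes from this thread.

```python
import shutil
import time
from pathlib import Path


def backup_folder(src, dest_root):
    """Copy the folder `src` into a new timestamped folder under `dest_root`.

    Assumption: nothing is still writing to `src` (client and cores stopped).
    Returns the path of the new copy.
    """
    src = Path(src)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_root) / f"{src.name}-{stamp}"
    # copytree refuses to overwrite an existing destination, so an older
    # snapshot is never silently replaced.
    shutil.copytree(src, dest)
    return dest
```

Something like `backup_folder(r"F:\Folding", r"F:\FoldingBackups")` would then leave one dated copy per run, which makes it easier to compare the work files from before and after a reboot.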
Comments
Also, a couple of FAQs:
http://fahwiki.net/index.php/How_do_I_install_the_SMP_Console_client_in_Windows_Vista%3F
http://fahwiki.net/index.php/How_do_I_install_the_CPU_Console_client_in_Windows_Vista%3F
Interestingly, in my recent testing, I found that just a cold reboot works just fine. If I don't even touch the console window and just reboot, it shuts down and comes back exactly where it left off. Apparently handling it with kid gloves is worse than just letting it do its thing.
Yessir! This is my fahlog-prev, it clearly shows the problems. Like I said, currently, if I just cold shut it down, no problem, picks it up perfectly. Towards the end, you'll see exactly that happen. The only difference is the missing_wu errors happen after I ctrl-c and force mpiexec to quit, and then reboot. Until a cold reboot fails me, I guess I'll just keep doing that.
(Hopefully this isn't incredibly miserably long, I put the code tags in, but I've never used them before)
log-prev:
--- Opening Log file [October 31 18:29:27] # SMP Client ################################################################## ############################################################################### Folding@Home Client Version 5.91beta5 http://folding.stanford.edu ############################################################################### ############################################################################### Launch directory: F:\Folding Executable: F:\Folding\fah.exe [18:29:27] - Ask before connecting: No [18:29:27] - User name: MJancaitis (Team 93) [18:29:27] - User ID: 350DB72D6EA6D6A7 [18:29:27] - Machine ID: 1 [18:29:27] [18:29:27] Loaded queue successfully. [18:29:27] [18:29:27] + Processing work unit [18:29:27] Core required: FahCore_a1.exe [18:29:27] Core found. [18:29:27] Working on Unit 08 [October 31 18:29:27] [18:29:27] + Working ... [18:29:27] [18:29:27] *------------------------------* [18:29:27] Folding@Home Gromacs SMP Core [18:29:27] Version 1.74 (March 10, 2007) [18:29:27] [18:29:27] Preparing to commence simulation [18:29:27] - Looking at optimizations... [18:29:27] - Created dyn [18:29:27] - Files status OK [18:29:27] Error: Work unit read from disk is invalid [18:29:27] [18:29:27] Folding@h Files status OK [18:29:27] Error: Work ORMAT [18:29:27] Finalizing output [18:29:27] nvalid [18:31:27] [18:31:27] Folding@home Core Shutdown: BAD_FILE_FORMAT [18:31:27] Finalizing output [18:31:31] CoreStatus = 1 (1) [18:31:31] Client-core communications error: ERROR 0x1 [18:31:31] Deleting current work unit & continuing... [18:33:51] - Preparing to get new work unit... [18:33:51] + Attempting to get work packet [18:33:51] - Connecting to assignment server [18:33:52] - Successful: assigned to (171.64.65.64). [18:33:52] + News From Folding@Home: Welcome to Folding@Home [18:33:52] Loaded queue successfully. 
[18:33:57] + Closed connections [18:34:02] [18:34:02] + Processing work unit [18:34:02] Core required: FahCore_a1.exe [18:34:02] Core found. [18:34:02] Working on Unit 09 [October 31 18:34:02] [18:34:02] + Working ... [18:34:02] [18:34:02] *------------------------------* [18:34:02] Folding@Home Gromacs SMP Core [18:34:02] Version 1.74 (March 10, 2007) [18:34:02] [18:34:02] Preparing to commence simulation [18:34:02] - Ensuring status. Please wait. [18:34:19] - Looking at optimizations... [18:34:19] - Working with standard loops on this execution. [18:34:19] - Created dyn [18:34:19] - Files status OK [18:34:24] - Expanded 2966049 -> 15212109 (decompressed 512.8 percent) [18:34:24] - Starting from initial work packet [18:34:24] [18:34:24] Project: 2653 (Run 11, Clone 114, Gen 2) [18:34:24] [18:34:25] Entering M.D. [18:34:31] Rejecting checkpoint [18:34:32] Protein: Protein in POPC [18:34:32] Writing local files [18:34:33] Extra SSE boost OK. [Completed Unit in one pass] [10:28:14] Past main M.D. loop [10:28:14] Will end MPI now [10:29:14] [10:29:14] Finished Work Unit: [10:29:14] - Reading up to 3724128 from "work/wudata_09.arc": Read 3724128 [10:29:14] - Reading up to 1779148 from "work/wudata_09.xtc": Read 1779148 [10:29:14] goefile size: 0 [10:29:14] logfile size: 19105 [10:29:14] Leaving Run [10:29:14] - Writing 5526781 bytes of core data to disk... [10:29:14] ... Done. [10:29:15] - Failed to delete work/wudata_09.sas [10:29:15] - Failed to delete work/wudata_09.goe [10:29:15] Warning: check for stray files [10:29:15] - Shutting down core [10:29:15] [10:29:15] Folding@home Core Shutdown: FINISHED_UNIT [10:29:15] [10:29:15] Folding@home Core Shutdown: FINISHED_UNIT [10:31:23] CoreStatus = 64 (100) [10:31:23] Sending work to server [10:31:23] + Attempting to send results [10:31:43] + Results successfully sent [10:31:43] Thank you for your contribution to Folding@Home. [10:31:43] + Number of Units Completed: 16 [10:33:47] - Preparing to get new work unit... 
[10:33:47] + Attempting to get work packet [10:33:47] - Connecting to assignment server [10:33:47] - Successful: assigned to (171.64.65.64). [10:33:47] + News From Folding@Home: Welcome to Folding@Home [10:33:48] Loaded queue successfully. [10:33:53] + Closed connections [10:33:53] [10:33:53] + Processing work unit [10:33:53] Core required: FahCore_a1.exe [10:33:53] Core found. [10:33:53] Working on Unit 00 [November 1 10:33:53] [10:33:53] + Working ... [10:33:53] [10:33:53] *------------------------------* [10:33:53] Folding@Home Gromacs SMP Core [10:33:53] Version 1.74 (March 10, 2007) [10:33:53] [10:33:53] Preparing to commence simulation [10:33:53] - Ensuring status. Please wait. [10:33:55] - Starting from initial work packet [10:33:55] [10:33:55] Project: 2653 (Run 9, Clone 111, Gen 4) [10:33:55] [10:33:55] Assembly optimizations on if available. [10:33:55] Entering M.D. [10:34:13] ial work pa- Starting from initial work packet [10:34:13] [10:34:13] Project: 2653 (Run 9, Clone 111, Gen 4) [10:34:13] [10:34:14] Entering M.D. [10:34:20] Rejecting checkpoint [10:34:21] Protein: Protein in POPC [10:34:21] Writing local files [10:34:22] Extra SSE boost OK. [10:34:22] Writing local files [10:34:22] Completed 0 out of 500000 steps (0 percent) [10:43:30] Writing local files [10:43:30] Completed 5000 out of 500000 steps (1 percent) [10:52:25] Writing local files [10:52:25] Completed 10000 out of 500000 steps (2 percent) [11:01:06] Writing local files [11:01:06] Completed 15000 out of 500000 steps (3 percent) [11:09:47] Writing local files [11:09:47] Completed 20000 out of 500000 steps (4 percent) [11:18:29] Writing local files [11:18:29] Completed 25000 out of 500000 steps (5 percent) ........... 
[20:13:45] Writing local files [20:13:45] Completed 325000 out of 500000 steps (65 percent) [20:22:30] Writing local files [20:22:30] Completed 330000 out of 500000 steps (66 percent) [20:31:38] Writing local files [20:31:38] Completed 335000 out of 500000 steps (67 percent) [20:41:36] Writing local files [20:41:36] Completed 340000 out of 500000 steps (68 percent) [20:51:35] Writing local files [20:51:35] Completed 345000 out of 500000 steps (69 percent) [21:01:34] Writing local files [21:01:34] Completed 350000 out of 500000 steps (70 percent) [21:11:32] Writing local files [21:11:33] Completed 355000 out of 500000 steps (71 percent) [21:21:32] Writing local files [21:21:32] Completed 360000 out of 500000 steps (72 percent) Folding@Home Client Shutdown at user request. Folding@Home Client Shutdown. --- Opening Log file [November 2 05:53:32] # SMP Client ################################################################## ############################################################################### Folding@Home Client Version 5.91beta5 http://folding.stanford.edu ############################################################################### ############################################################################### Launch directory: F:\Folding Executable: F:\Folding\fah.exe [05:53:32] - Ask before connecting: No [05:53:32] - User name: MJancaitis (Team 93) [05:53:32] - User ID: 350DB72D6EA6D6A7 [05:53:32] - Machine ID: 1 [05:53:32] [05:53:32] Loaded queue successfully. [05:53:32] [05:53:32] + Processing work unit [05:53:32] Core required: FahCore_a1.exe [05:53:32] Core found. [05:53:32] Working on Unit 00 [November 2 05:53:32] [05:53:32] + Working ... [05:53:33] [05:53:33] *------------------------------* [05:53:33] Folding@Home Gromacs SMP Core [05:53:33] Version 1.74 (March 10, 2007) [05:53:33] [05:53:33] Preparing to commence simulation [05:53:33] - Ensuring status. Please wait. 
[05:53:33] Created dyn [05:53:33] - Files status OK [05:53:33] [05:53:33] Folding@home Core Shutdown: MISSING_WORK_FILES [05:53:33] Finalizing output [05:53:50] ation of core was improper. [05:53:50] - Going to use standard loops. [05:53:50] - Files status OK [05:55:50] [05:55:50] Folding@home Core Shutdown: MISSING_WORK_FILES [05:55:50] Finalizing output [05:55:52] CoreStatus = 1 (1) [05:55:52] Client-core communications error: ERROR 0x1 [05:55:52] Deleting current work unit & continuing... Folding@Home Client Shutdown at user request. Folding@Home Client Shutdown. --- Opening Log file [November 2 08:35:10] # SMP Client ################################################################## ############################################################################### Folding@Home Client Version 5.91beta5 http://folding.stanford.edu ############################################################################### ############################################################################### Launch directory: F:\Folding Executable: F:\Folding\fah.exe [08:35:10] - Ask before connecting: No [08:35:10] - User name: MJancaitis (Team 93) [08:35:10] - User ID: 350DB72D6EA6D6A7 [08:35:10] - Machine ID: 1 [08:35:10] [08:35:10] Loaded queue successfully. [08:35:10] [08:35:10] + Processing work unit [08:35:10] Core required: FahCore_a1.exe [08:35:10] Core found. [08:35:10] Working on Unit 00 [November 2 08:35:10] [08:35:10] + Working ... [08:35:10] [08:35:10] *------------------------------* [08:35:10] Folding@Home Gromacs SMP Core [08:35:10] Version 1.74 (March 10, 2007) [08:35:10] [08:35:10] Preparing to commence simulation [08:35:10] - Ensuring status. Please wait. [08:35:27] - Looking at optimizations... [08:35:27] - Working with standard loops on this execution. [08:35:27] - Previous termination of core was improper. [08:35:27] - Going to use standard loops. 
[08:35:27] - Files status OK [08:35:27] put [08:37:27] ding@home Core Shutdown: MISSING_WORK_FILES [08:37:27] Finalizing output [08:37:30] CoreStatus = 1 (1) [08:37:30] Client-core communications error: ERROR 0x1 [08:37:30] Deleting current work unit & continuing... [08:39:50] - Preparing to get new work unit... [08:39:50] + Attempting to get work packet [08:39:50] - Connecting to assignment server [08:39:51] - Successful: assigned to (171.64.65.64). [08:39:51] + News From Folding@Home: Welcome to Folding@Home [08:39:51] Loaded queue successfully. [08:39:56] + Closed connections [08:40:01] [08:40:01] + Processing work unit [08:40:01] Core required: FahCore_a1.exe [08:40:01] Core found. [08:40:01] Working on Unit 01 [November 2 08:40:01] [08:40:01] + Working ... [08:40:01] [08:40:01] *------------------------------* [08:40:01] Folding@Home Gromacs SMP Core [08:40:01] Version 1.74 (March 10, 2007) [08:40:01] [08:40:01] Preparing to commence simulation [08:40:01] - Ensuring status. Please wait. [08:40:18] - Looking at optimizations... [08:40:18] - Working with standard loops on this execution. [08:40:18] - Files stdyn [08:40:18] OK [08:40:18] les status OK [08:40:23] - Expanded 2968654 -> 15200001 (decompressed 512.0 percent) [08:40:23] - Starting from initial work packet [08:40:23] [08:40:23] Project: 2653 (Run 9, Clone 111, Gen 4) [08:40:23] [08:40:25] Entering M.D. [08:40:31] kpoint [08:40:33] Protein: Protein in POPCExtra SSE boost OK. [08:40:33] SSE boost OK. [08:40:34] st OK. [Completed unit in one pass] [00:57:34] Writing final coordinates. [00:57:35] Past main M.D. loop [00:57:35] Will end MPI now [00:58:35] [00:58:35] Finished Work Unit: [00:58:35] - Reading up to 3721776 from "work/wudata_01.arc": Read 3721776 [00:58:35] - Reading up to 1779204 from "work/wudata_01.xtc": Read 1779204 [00:58:35] goefile size: 0 [00:58:35] logfile size: 26437 [00:58:35] Leaving Run [00:58:36] - Writing 5531817 bytes of core data to disk... [00:58:36] ... Done. 
[00:58:36] - Failed to delete work/wudata_01.sas [00:58:36] - Failed to delete work/wudata_01.goe [00:58:36] Warning: check for stray files [00:58:36] - Shutting down core [00:58:36] [00:58:36] Folding@home Core Shutdown: FINISHED_UNIT [00:58:36] [00:58:36] Folding@home Core Shutdown: FINISHED_UNIT [01:00:41] CoreStatus = 64 (100) [01:00:41] Sending work to server [01:00:41] + Attempting to send results [01:01:02] + Results successfully sent [01:01:02] Thank you for your contribution to Folding@Home. [01:01:02] + Number of Units Completed: 17 [01:03:06] - Preparing to get new work unit... [01:03:06] + Attempting to get work packet [01:03:06] - Connecting to assignment server [01:03:07] - Successful: assigned to (171.64.65.64). [01:03:07] + News From Folding@Home: Welcome to Folding@Home [01:03:07] Loaded queue successfully. [01:03:12] + Closed connections [01:03:12] [01:03:12] + Processing work unit [01:03:12] Core required: FahCore_a1.exe [01:03:12] Core found. [01:03:12] Working on Unit 02 [November 3 01:03:12] [01:03:12] + Working ... [01:03:12] [01:03:12] *------------------------------* [01:03:12] Folding@Home Gromacs SMP Core [01:03:12] Version 1.74 (March 10, 2007) [01:03:12] [01:03:12] Preparing to commence simulation [01:03:12] - Ensuring status. Please wait. [01:03:14] - Starting from initial work packet [01:03:14] [01:03:14] Project: 2653 (Run 13, Clone 85, Gen 9) [01:03:14] [01:03:14] Assembly optimizations on if available. [01:03:14] Entering M.D. [01:03:32] ial work packet [01:03:33] rting from initial work packet [01:03:33] [01:03:33] Project: 2Entering M.D. [01:03:33] lone 85, Gen 9) [01:03:33] [01:03:33] Entering M.D. [01:03:39] Rejecting checkpoint [01:03:40] Protein: Protein in POPC [01:03:40] Writing local files [01:03:41] Extra SSE boost OK. 
[01:03:41] Writing local files [01:03:41] Completed 0 out of 500000 steps (0 percent) [01:12:30] Writing local files [01:12:30] Completed 5000 out of 500000 steps (1 percent) [01:21:17] Writing local files [01:21:17] Completed 10000 out of 500000 steps (2 percent) [01:30:05] Writing local files [01:30:05] Completed 15000 out of 500000 steps (3 percent) [01:38:53] Writing local files [01:38:53] Completed 20000 out of 500000 steps (4 percent) [01:47:54] Writing local files [01:47:54] Completed 25000 out of 500000 steps (5 percent) [01:56:41] Writing local files [01:56:42] Completed 30000 out of 500000 steps (6 percent) [02:05:29] Writing local files [02:05:29] Completed 35000 out of 500000 steps (7 percent) [02:14:18] Writing local files [02:14:18] Completed 40000 out of 500000 steps (8 percent) [02:23:08] Writing local files [02:23:08] Completed 45000 out of 500000 steps (9 percent) [02:31:57] Writing local files [02:31:57] Completed 50000 out of 500000 steps (10 percent) [02:40:46] Writing local files [02:40:46] Completed 55000 out of 500000 steps (11 percent) [02:49:51] Writing local files [02:49:51] Completed 60000 out of 500000 steps (12 percent) [02:58:40] Writing local files [02:58:40] Completed 65000 out of 500000 steps (13 percent) [03:07:31] Writing local files [03:07:31] Completed 70000 out of 500000 steps (14 percent) Folding@Home Client Shutdown. 
--- Opening Log file [November 3 03:10:08] # SMP Client ################################################################## ############################################################################### Folding@Home Client Version 5.91beta5 http://folding.stanford.edu ############################################################################### ############################################################################### Launch directory: F:\Folding Executable: F:\Folding\fah.exe [03:10:08] - Ask before connecting: No [03:10:08] - User name: MJancaitis (Team 93) [03:10:08] - User ID: 350DB72D6EA6D6A7 [03:10:08] - Machine ID: 1 [03:10:08] [03:10:08] Loaded queue successfully. [03:10:08] [03:10:08] + Processing work unit [03:10:08] Core required: FahCore_a1.exe [03:10:08] Core found. [03:10:08] Working on Unit 02 [November 3 03:10:08] [03:10:08] + Working ... [03:10:09] [03:10:09] *------------------------------* [03:10:09] Folding@Home Gromacs SMP Core [03:10:09] Version 1.74 (March 10, 2007) [03:10:09] [03:10:09] Preparing to commence simulation [03:10:09] - Ensuring status. Please wait. [03:10:26] - Looking at optimizations... [03:10:26] - Working with standard loops on this execution. [03:10:26] - Previous termination of core was improper. [03:10:26] - Going to use standard loops. [03:10:26] - Files status OK [03:10:30] - Expanded 2961919 -> 15199495 (decompressed 513.1 percent) [03:10:30] [03:10:30] Project: 2653 (Run 13, Clone 85, Gen 9) [03:10:30] [03:10:31] Entering M.D. [03:10:37] Calling FAH init [03:10:38] in POPC [03:10:38] Writing local files [03:10:38] checkpoint) [03:10:38] Read checkpoint [03:10:38] Protein: Protein in POPC [03:10:38] a SSE boost OK. [03:10:38] les [03:10:38] Completed 70000 out of 500000 steps (14 percent) [03:10:39] Extra SSE boost OK. Folding@Home Client Shutdown. 
--- Opening Log file [November 3 03:19:49] # SMP Client ################################################################## ############################################################################### Folding@Home Client Version 5.91beta5 http://folding.stanford.edu ############################################################################### ############################################################################### Launch directory: F:\Folding Executable: F:\Folding\fah.exe [03:19:49] - Ask before connecting: No [03:19:49] - User name: MJancaitis (Team 93) [03:19:49] - User ID: 350DB72D6EA6D6A7 [03:19:49] - Machine ID: 1 [03:19:49] [03:19:49] Loaded queue successfully. [03:19:49] [03:19:49] + Processing work unit [03:19:49] Core required: FahCore_a1.exe [03:19:49] Core found. [03:19:49] Working on Unit 02 [November 3 03:19:49] [03:19:49] + Working ... [03:19:49] [03:19:49] *------------------------------* [03:19:49] Folding@Home Gromacs SMP Core [03:19:49] Version 1.74 (March 10, 2007) [03:19:49] [03:19:49] Preparing to commence simulation [03:19:49] - Ensuring status. Please wait. [03:20:06] - Looking at optimizations... [03:20:06] - Working with standard loops on this execution. [03:20:06] Examination of work files indicates 8 consecutive improper terminations of core. [03:20:06] es status OK [03:20:11] - Expanded 2961919 -> 15199495 (decompressed 513.1 percent) [03:20:11] 3 (Run 13, Clone 85, Gen 9) [03:20:11] [03:20:11] 85, Gen 9) [03:20:11] [03:20:12] Entering M.D. [03:20:18] Calling FAH init [03:20:19] Read topology [03:20:20] [03:20:20] Completed 70000 out of 500000 steps (14 percent) [03:20:20] Extra SSE booExtra SSE boost OK. [03:20:20] les [03:20:20] Completed 70000 out of 500000 steps (14 percent) [03:20:21] Extra SSE boost OK. 
[03:34:06] Writing local files [03:34:06] Completed 75000 out of 500000 steps (15 percent) [03:47:29] Writing local files [03:47:29] Completed 80000 out of 500000 steps (16 percent) [04:05:55] Writing local files [04:05:55] Completed 85000 out of 500000 steps (17 percent) [04:17:23] Writing local files [04:17:23] Completed 90000 out of 500000 steps (18 percent) Folding@Home Client Shutdown.
And from there on it pretty much just picks up correctly because I haven't forced a quit since. That's as far back as the log appears to go, sadly. Hope it helps if you have an idea.
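A long log like this is easier to read if the core shutdown statuses are tallied first. Below is a minimal Python sketch, assuming the log has been saved as text; the function name `count_core_shutdowns` is hypothetical. It matches on `Core Shutdown:` rather than the full `Folding@home Core Shutdown:` prefix because some lines in this log are truncated (for example `ding@home Core Shutdown: MISSING_WORK_FILES`).

```python
import re
from collections import Counter

# Match the status word after "Core Shutdown:"; the prefix is left out on
# purpose so that truncated lines are still counted.
SHUTDOWN_RE = re.compile(r"Core Shutdown: ([A-Z_]+)")


def count_core_shutdowns(log_text):
    """Return a Counter mapping each core shutdown status to its line count."""
    return Counter(SHUTDOWN_RE.findall(log_text))
```

These are raw occurrence counts: this client prints `FINISHED_UNIT` twice per completed unit, so the totals are only a rough guide to how many restarts ended in `MISSING_WORK_FILES` or `BAD_FILE_FORMAT`.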