SMP teething problems and other shenanigans
Winga
MrSouth Africa Icrontian
After a few initial teething problems, I had my first SMP work unit crunching away on dual-core goodness. I still haven't got around to applying the hack to run it as a service, but I'm kinda getting used to it sitting on my taskbar, and anyway it's easy to access, so I'll leave that alone for now.
The main reason for posting this thread, though, is to highlight the inconsistencies between the frames. I tried the Ctrl + C shutdown method and, as has already been pointed out in other threads, that helps squat.
I haven't used the PC for anything strenuous while the first WU was crunching, as I wanted to make sure it finished properly, but each time I had to shut down the PC for whatever reason, it resumed with different times between frames.
This log illustrates what I'm trying to say:
Folding@Home Gromacs SMP Core
[06:23:54] Version 1.74 (March 10, 2007)
[06:23:54]
[06:23:54] Preparing to commence simulation
[06:23:54] - Ensuring status. Please wait.
[06:24:11] - Looking at optimizations...
[06:24:11] - Working with standard loops on this execution.
[06:24:11] - Previous termination of core was improper.
[06:24:11] - Going to use standard loops.
[06:24:11] - Files status OK
[06:24:19] - Expanded 2430869 -> 12854153 (decompressed 528.7 percent)
[06:24:19]
[06:24:19] Project: 2651 (Run 0, Clone 292, Gen 49)
[06:24:19]
[06:24:20] Entering M.D.
[06:24:27] Calling FAH init
[06:24:29] in POPC
[06:24:29] Writing local files
[06:24:29] checkpoint)
[06:24:29] Read checkpoint
[06:24:29] Protein: Protein in POPC
[06:24:29] Writing local files
[06:24:31] Extra SSE boost OK.
[06:24:31] Writing local files
[06:24:31] Completed 0 out of 500000 steps (0 percent)
[06:57:04] Writing local files
[06:57:04] Completed 5000 out of 500000 steps (1 percent)
[07:29:39] Writing local files
[07:29:39] Completed 10000 out of 500000 steps (2 percent)
[08:01:46] Writing local files
[08:01:46] Completed 15000 out of 500000 steps (3 percent)
[08:33:53] Writing local files
[08:33:53] Completed 20000 out of 500000 steps (4 percent)
[09:06:01] Writing local files
[09:06:01] Completed 25000 out of 500000 steps (5 percent)
[09:39:05] Writing local files
[09:39:06] Completed 30000 out of 500000 steps (6 percent)
I am getting around 33 minutes between these frames.
[21:52:03] Folding@Home Gromacs SMP Core
[21:52:03] Version 1.74 (March 10, 2007)
[21:52:03]
[21:52:03] Preparing to commence simulation
[21:52:03] - Ensuring status. Please wait.
[21:52:20] - Looking at optimizations...
[21:52:20] - Working with standard loops on this execution.
[21:52:20] Examination of work files indicates 8 consecutive improper terminations of core.
[21:52:28] - Expanded 2430869 -> 12854153 (decompressed 528.7 percent)
[21:52:29]
[21:52:29] Project: 2651 (Run 0, Clone 292, Gen 49)
[21:52:29]
[21:52:30] Entering M.D.
[21:52:36] Calling FAH init
[21:52:38] in POPC
[21:52:38] Writing local files
[21:52:38] checkpoint)
[21:52:38] Read checkpoint
[21:52:38] 0 steps (25 percent)
[21:52:38] PC
[21:52:38] Writing local files
[21:52:38] Completed 127280 out of 500000 steps (25 percent)
[21:52:40] Extra SSE boost OK.
[22:10:18] Writing local files
[22:10:18] Completed 130000 out of 500000 steps (26 percent)
[22:42:30] Writing local files
[22:42:31] Completed 135000 out of 500000 steps (27 percent)
[23:14:32] Writing local files
[23:14:32] Completed 140000 out of 500000 steps (28 percent)
[23:46:32] Writing local files
[23:46:32] Completed 145000 out of 500000 steps (29 percent)
[00:21:12] Writing local files
[00:21:12] Completed 150000 out of 500000 steps (30 percent)
[00:53:11] Writing local files
[00:53:11] Completed 155000 out of 500000 steps (31 percent)
32 minutes on this one which is still within the ballpark.
[10:50:11] Folding@Home Gromacs SMP Core
[10:50:11] Version 1.74 (March 10, 2007)
[10:50:11]
[10:50:11] Preparing to commence simulation
[10:50:11] - Ensuring status. Please wait.
[10:50:28] - Looking at optimizations...
[10:50:28] - Working with standard loops on this execution.
[10:50:28] Examination of work files indicates 8 consecutive improper terminations of core.
[10:50:32] - Expanded 2430869 -> 12854153 (decompressed 528.7 percent)
[10:50:33]
[10:50:33] Project: 2651 (Run 0, Clone 292, Gen 49)
[10:50:33]
[10:50:35] Entering M.D.
[10:50:44] Calling FAH init
[10:50:46] Read topology
[10:50:46] g local files
[10:50:46] checkpoint)
[10:50:46] Read checkpoint
[10:50:46] Protein: Protein in POPC
[10:50:46] Writing local files
[10:50:47] Completed 230727 out of 500000 steps (46 percent)
[10:50:49] Extra SSE boost OK.
[11:33:39] Writing local files
[11:33:39] Completed 235000 out of 500000 steps (47 percent)
[12:23:02] Writing local files
[12:23:02] Completed 240000 out of 500000 steps (48 percent)
[13:12:28] Writing local files
[13:12:28] Completed 245000 out of 500000 steps (49 percent)
[14:01:52] Writing local files
[14:01:52] Completed 250000 out of 500000 steps (50 percent)
[14:51:12] Writing local files
[14:51:12] Completed 255000 out of 500000 steps (51 percent)
Then it suddenly jumps to 50 minutes per frame for no apparent reason.
[05:02:55] Folding@Home Gromacs SMP Core
[05:02:55] Version 1.74 (March 10, 2007)
[05:02:55]
[05:02:55] Preparing to commence simulation
[05:02:55] - Ensuring status. Please wait.
[05:03:12] - Looking at optimizations...
[05:03:12] - Working with standard loops on this execution.
[05:03:12] Examination of work files indicates 8 consecutive improper terminations of core.
[05:03:20] - Expanded 2430869 -> 12854153 (decompressed 528.7 percent)
[05:03:21]
[05:03:21] Project: 2651 (Run 0, Clone 292, Gen 49)
[05:03:21]
[05:03:23] Entering M.D.
[05:03:29] Calling FAH init
[05:03:31] Read topology
[05:03:31] (Starting from checkpoint)
[05:03:31] 935 out of 500000 steps (96 percent)
[05:03:31] PC
[05:03:31] Writing local files
[05:03:31] Completed 480935 out of 500000 steps (96 percent)
[05:03:33] Extra SSE boost OK.
[05:30:44] Writing local files
[05:30:44] Completed 485000 out of 500000 steps (97 percent)
[06:04:06] Writing local files
[06:04:06] Completed 490000 out of 500000 steps (98 percent)
[06:36:10] Writing local files
[06:36:10] Completed 495000 out of 500000 steps (99 percent)
[07:08:07] Writing local files
[07:08:07] Completed 500000 out of 500000 steps (100 percent)
I only picked it up towards the end of the WU, but after another restart it went all the way down to under 27 minutes.
Is this another bug, or do other FAH versions do the same thing? I must say, this is the first time I've noticed it.
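For anyone who wants to check frame times without doing the subtraction by hand, here is a minimal Python sketch that reads the "Completed N out of M steps" lines from a log like the ones above. The function name `frame_times` is made up for illustration and is not part of the FAH client. It assumes one frame is 1 percent of the total steps, treats every "Gromacs SMP Core" banner as a restart, and scales the first interval after a restart up to a full frame, since that interval usually covers fewer than 5000 steps.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "[06:57:04] Completed 5000 out of 500000 steps (1 percent)"
FRAME_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] Completed (\d+) out of (\d+) steps")


def frame_times(log_lines):
    """Return (steps, minutes_per_frame) for each interval between 'Completed' lines.

    Hypothetical helper: a frame is assumed to be 1 percent of the total steps,
    and partial intervals are scaled up to a full frame.
    """
    results = []
    prev = None
    for line in log_lines:
        line = line.strip()
        if "Gromacs SMP Core" in line:
            prev = None  # core restarted, so do not measure across sessions
            continue
        m = FRAME_RE.match(line)
        if not m:
            continue
        stamp = datetime.strptime(m.group(1), "%H:%M:%S")
        steps, total = int(m.group(2)), int(m.group(3))
        if prev is not None:
            prev_stamp, prev_steps = prev
            delta = stamp - prev_stamp
            if delta < timedelta(0):
                delta += timedelta(days=1)  # timestamps rolled past midnight
            done = steps - prev_steps
            if done > 0:
                frame_steps = total / 100
                minutes = delta.total_seconds() / 60 * frame_steps / done
                results.append((steps, round(minutes, 1)))
        prev = (stamp, steps)
    return results


sample = [
    "[10:50:47] Completed 230727 out of 500000 steps (46 percent)",
    "[11:33:39] Completed 235000 out of 500000 steps (47 percent)",
    "[12:23:02] Completed 240000 out of 500000 steps (48 percent)",
]
for steps, minutes in frame_times(sample):
    print(steps, minutes)
```

Because of the scaling, a short first interval after a restart is reported as a full-frame equivalent rather than as the raw gap between timestamps, which makes sessions easier to compare.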
Comments
Should look like this:
"C:\Program Files\Folding@Home Windows SMP Client V1.01\fah.exe" -forceasm
Regarding the inconsistencies, I'm not sure what is happening there, but keep an eye on Task Manager. When the frame times take longer, look at what other processes are hogging CPU. I have experienced problems with the folding service, mpiexec.exe, eating upwards of 25% of CPU power. Let us know if this is the case.
Winga, if you've done your best to set up consistent, quality power and a reliable internet connection/network, don't get too bent out of shape over strange and apparently defective work units. Even though WinSMP has been out for months, it is still clearly a beta program, both in experience and officially.
Also, please check out this thread at Folding Community for known bugs.
The flag is not needed in the SMP client. It is hardcoded into the exe, so it will make no difference. The "Working with standard loops" message is a known bug. If you see "Extra SSE boost OK" in the log, then all is fine.
Using Ctrl+C is the proper way to shut down the app, but sometimes it does not kill all the background services it runs. After stopping the client (if you do not intend to restart the machine), check Task Manager to be sure it shut down the associated services. Better still, when restarting the client, check Task Manager to see that only one instance each of mpiexec.exe and smpd.exe is running. If you have more than one of either of those, that is probably your problem.
You should read this thread at the folding forum: http://forum.folding-community.org/ftopic18210.html. It covers a lot of this and more. Let us know how you make out.
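If you would rather script the Task Manager check, here is a rough Python sketch. It assumes the process list comes from the standard Windows `tasklist /FO CSV /NH` command; the helper names `count_instances`, `duplicated`, and `read_tasklist` are invented for this example, and the "more than one instance" rule is simply the rule of thumb described above.

```python
import csv
import io
import subprocess
from collections import Counter

# Image names mentioned in this thread; adjust if your install differs.
WATCHED = ("mpiexec.exe", "smpd.exe")


def count_instances(tasklist_csv, names=WATCHED):
    """Count watched image names in the text output of `tasklist /FO CSV /NH`."""
    counts = Counter()
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # The first CSV column is the image name, e.g. "mpiexec.exe"
        if row and row[0].lower() in names:
            counts[row[0].lower()] += 1
    return {name: counts[name] for name in names}


def duplicated(counts):
    """Return the watched names that are running more than once."""
    return sorted(name for name, n in counts.items() if n > 1)


def read_tasklist():
    """Run tasklist on Windows and return its CSV output as text."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# On the folding machine (Windows only):
#   extra = duplicated(count_instances(read_tasklist()))
#   print("Duplicates:", extra or "none")
```

Parsing is kept separate from running the command so the counting logic can be tried on saved output from any machine.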
Fold on
Scott
Edit: Leo beat me to it.
I never nailed down exactly what would cause this, but when it happened, the common factor was always my internet connection. At first, I had no idea that unplugging from the net would affect the client while it was running. I had been having some connection issues, and while I was troubleshooting and working to solve my problems, mpiexec.exe would start grabbing CPU cycles. I lost a few WUs in the process. It hasn't happened in a while. I think Stanford must have made an adjustment to the client since then, because when my net went down recently, the only thing that happened was that the cores stopped working. The client itself still displays that it's working, but of course it will never complete another step until it's shut down and restarted.
To fix the mpiexec.exe issue, I would uninstall the folding client, reinstall and run install.bat again. Problem solved.
Most SMP WUs benefit from L2 cache, so AMD's processors are usually slower than Intel's on them, and p2651 is one of those WUs. Still, you are within the deadline to return them.
I've found that sometimes (and dunno why) the TPF goes berserk, taking much longer than it used to. I just Ctrl+C it and start it over.
And if you, like me, get bothered by the little CMD window where Fah is running, I suggest you try this little proggy that sends it to your systray:
Tray It!