SMP teething problems and other shenanigans

Winga Mr South Africa Icrontian
edited September 2007 in Folding@Home
After a few initial teething problems I had my first SMP work unit crunching away on dual core goodness. I still haven't got around to applying the hack to run it as a service, but I'm kinda getting used to it sitting on my taskbar and anyway it's easy to access, so I'll leave that alone for now.

The main reason for posting this thread, though, is to highlight the inconsistencies in the time between frames. I tried the Ctrl + C shutdown method and, as has already been pointed out in other threads, that helps squat.

I haven't used the PC for anything strenuous while the first WU was crunching as I wanted to make sure it finished properly, but each time I had to shut down the PC for whatever reason, it resumed with different times between frames.

This log illustrates what I'm trying to say:

Folding@Home Gromacs SMP Core
[06:23:54] Version 1.74 (March 10, 2007)
[06:23:54]
[06:23:54] Preparing to commence simulation
[06:23:54] - Ensuring status. Please wait.
[06:24:11] - Looking at optimizations...
[06:24:11] - Working with standard loops on this execution.
[06:24:11] - Previous termination of core was improper.
[06:24:11] - Going to use standard loops.
[06:24:11] - Files status OK
[06:24:19] - Expanded 2430869 -> 12854153 (decompressed 528.7 percent)
[06:24:19]
[06:24:19] Project: 2651 (Run 0, Clone 292, Gen 49)
[06:24:19]
[06:24:20] Entering M.D.
[06:24:27] Calling FAH init
[06:24:29] in POPC
[06:24:29] Writing local files
[06:24:29] checkpoint)
[06:24:29] Read checkpoint
[06:24:29] Protein: Protein in POPC
[06:24:29] Writing local files
[06:24:31] Extra SSE boost OK.
[06:24:31] Writing local files
[06:24:31] Completed 0 out of 500000 steps (0 percent)
[06:57:04] Writing local files
[06:57:04] Completed 5000 out of 500000 steps (1 percent)
[07:29:39] Writing local files
[07:29:39] Completed 10000 out of 500000 steps (2 percent)
[08:01:46] Writing local files
[08:01:46] Completed 15000 out of 500000 steps (3 percent)
[08:33:53] Writing local files
[08:33:53] Completed 20000 out of 500000 steps (4 percent)
[09:06:01] Writing local files
[09:06:01] Completed 25000 out of 500000 steps (5 percent)
[09:39:05] Writing local files
[09:39:06] Completed 30000 out of 500000 steps (6 percent)

I am getting around 33 minutes between these frames.

[21:52:03] Folding@Home Gromacs SMP Core
[21:52:03] Version 1.74 (March 10, 2007)
[21:52:03]
[21:52:03] Preparing to commence simulation
[21:52:03] - Ensuring status. Please wait.
[21:52:20] - Looking at optimizations...
[21:52:20] - Working with standard loops on this execution.
[21:52:20] Examination of work files indicates 8 consecutive improper terminations of core.
[21:52:28] - Expanded 2430869 -> 12854153 (decompressed 528.7 percent)
[21:52:29]
[21:52:29] Project: 2651 (Run 0, Clone 292, Gen 49)
[21:52:29]
[21:52:30] Entering M.D.
[21:52:36] Calling FAH init
[21:52:38] in POPC
[21:52:38] Writing local files
[21:52:38] checkpoint)
[21:52:38] Read checkpoint
[21:52:38] 0 steps (25 percent)
[21:52:38] PC
[21:52:38] Writing local files
[21:52:38] Completed 127280 out of 500000 steps (25 percent)
[21:52:40] Extra SSE boost OK.
[22:10:18] Writing local files
[22:10:18] Completed 130000 out of 500000 steps (26 percent)
[22:42:30] Writing local files
[22:42:31] Completed 135000 out of 500000 steps (27 percent)
[23:14:32] Writing local files
[23:14:32] Completed 140000 out of 500000 steps (28 percent)
[23:46:32] Writing local files
[23:46:32] Completed 145000 out of 500000 steps (29 percent)
[00:21:12] Writing local files
[00:21:12] Completed 150000 out of 500000 steps (30 percent)
[00:53:11] Writing local files
[00:53:11] Completed 155000 out of 500000 steps (31 percent)

32 minutes on this one which is still within the ballpark.

[10:50:11] Folding@Home Gromacs SMP Core
[10:50:11] Version 1.74 (March 10, 2007)
[10:50:11]
[10:50:11] Preparing to commence simulation
[10:50:11] - Ensuring status. Please wait.
[10:50:28] - Looking at optimizations...
[10:50:28] - Working with standard loops on this execution.
[10:50:28] Examination of work files indicates 8 consecutive improper terminations of core.
[10:50:32] - Expanded 2430869 -> 12854153 (decompressed 528.7 percent)
[10:50:33]
[10:50:33] Project: 2651 (Run 0, Clone 292, Gen 49)
[10:50:33]
[10:50:35] Entering M.D.
[10:50:44] Calling FAH init
[10:50:46] Read topology
[10:50:46] g local files
[10:50:46] checkpoint)
[10:50:46] Read checkpoint
[10:50:46] Protein: Protein in POPC
[10:50:46] Writing local files
[10:50:47] Completed 230727 out of 500000 steps (46 percent)
[10:50:49] Extra SSE boost OK.
[11:33:39] Writing local files
[11:33:39] Completed 235000 out of 500000 steps (47 percent)
[12:23:02] Writing local files
[12:23:02] Completed 240000 out of 500000 steps (48 percent)
[13:12:28] Writing local files
[13:12:28] Completed 245000 out of 500000 steps (49 percent)
[14:01:52] Writing local files
[14:01:52] Completed 250000 out of 500000 steps (50 percent)
[14:51:12] Writing local files
[14:51:12] Completed 255000 out of 500000 steps (51 percent)

Then it suddenly jumps to 50 minutes per frame for no apparent reason.

[05:02:55] Folding@Home Gromacs SMP Core
[05:02:55] Version 1.74 (March 10, 2007)
[05:02:55]
[05:02:55] Preparing to commence simulation
[05:02:55] - Ensuring status. Please wait.
[05:03:12] - Looking at optimizations...
[05:03:12] - Working with standard loops on this execution.
[05:03:12] Examination of work files indicates 8 consecutive improper terminations of core.
[05:03:20] - Expanded 2430869 -> 12854153 (decompressed 528.7 percent)
[05:03:21]
[05:03:21] Project: 2651 (Run 0, Clone 292, Gen 49)
[05:03:21]
[05:03:23] Entering M.D.
[05:03:29] Calling FAH init
[05:03:31] Read topology
[05:03:31] (Starting from checkpoint)
[05:03:31] 935 out of 500000 steps (96 percent)
[05:03:31] PC
[05:03:31] Writing local files
[05:03:31] Completed 480935 out of 500000 steps (96 percent)
[05:03:33] Extra SSE boost OK.
[05:30:44] Writing local files
[05:30:44] Completed 485000 out of 500000 steps (97 percent)
[06:04:06] Writing local files
[06:04:06] Completed 490000 out of 500000 steps (98 percent)
[06:36:10] Writing local files
[06:36:10] Completed 495000 out of 500000 steps (99 percent)
[07:08:07] Writing local files
[07:08:07] Completed 500000 out of 500000 steps (100 percent)

Only picked it up towards the end of the WU but after another restart, it went all the way down to under 27 minutes.

Is this another bug or do other fah versions do the same thing? I must say, first time I've noticed it.
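For anyone who wants to compare frame times across restarts without eyeballing the timestamps, here is a rough Python sketch that reads log text like the excerpts above and reports minutes per frame. It assumes a frame is 5000 steps (1 percent of this 500000-step WU), that the "Completed N out of M steps" lines keep the format shown, and that a gap never spans more than one midnight. The function name `minutes_per_frame` is made up for illustration. Because the first interval after a restart begins at the checkpoint rather than at a frame boundary, the sketch scales every interval to a full frame.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "[06:57:04] Completed 5000 out of 500000 steps (1 percent)".
# Garbled fragments without the word "Completed" are deliberately ignored.
STEP_LINE = re.compile(r"\[(\d{2}:\d{2}:\d{2})\] Completed (\d+) out of \d+ steps")


def minutes_per_frame(log_text, frame_steps=5000):
    """Return minutes per full frame for each pair of consecutive progress lines.

    frame_steps=5000 is an assumption taken from the logs in this thread.
    """
    points = []
    for line in log_text.splitlines():
        match = STEP_LINE.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S")
            points.append((stamp, int(match.group(2))))

    results = []
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        elapsed = t1 - t0
        if elapsed < timedelta(0):
            # The log only records time of day, so a negative gap means midnight passed.
            elapsed += timedelta(days=1)
        steps = s1 - s0
        if steps <= 0:
            # Repeated progress line or a new run; nothing to measure.
            continue
        # Scale partial intervals (e.g. checkpoint -> next frame) to a full frame.
        minutes = elapsed.total_seconds() / 60 * frame_steps / steps
        results.append(round(minutes, 1))
    return results


if __name__ == "__main__":
    sample = """\
[05:03:31] Completed 480935 out of 500000 steps (96 percent)
[05:03:33] Extra SSE boost OK.
[05:30:44] Completed 485000 out of 500000 steps (97 percent)
[06:04:06] Completed 490000 out of 500000 steps (98 percent)
"""
    print(minutes_per_frame(sample))
```

In the last log above, the first interval after the restart covers 4065 steps in about 27 minutes of wall-clock time, which works out to roughly 33.5 minutes for a full 5000-step frame, so a short first interval does not by itself mean the client sped up.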

Comments

  • SPIKE09 Scatland
    edited September 2007
    Stick the -forceasm flag on it; that is one reason for the standard loops message.
  • Winga Mr South Africa Icrontian
    edited September 2007
    how does one go about doing that?
  • TBonZ Ottawa, ON Icrontian
    edited September 2007
    Create a shortcut to the executable and add "-forceasm" after the command path.

    Should look like this:

    "C:\Program Files\Folding@Home Windows SMP Client V1.01\fah.exe" -forceasm

    Regarding the inconsistencies, I'm not sure what is happening there, but keep an eye on Task Manager. When the frame times take longer, look at what other processes are hogging CPU. I have experienced problems with the folding service mpiexec.exe eating upwards of 25% of CPU power. Let us know if this is the case.
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited September 2007
    mpiexec.exe eating upwards of 25% of CPU power
    That's really strange. I've never seen that - four SMP clients/computers over several months.

    Winga, if you've done your best to ensure consistent, quality power and a stable internet connection/network setup, don't get too bent out of shape over strange and apparently defective work units. Even though WinSMP has been out for months, it is still clearly a beta program, both in experience and officially.

    Also, please check out this thread at Folding Community for known bugs.
  • scott Medina, Ohio Icrontian
    edited September 2007
    SPIKE09 wrote:
    Stick the -forceasm flag on it; that is one reason for the standard loops message.

    The flag is not needed in the SMP client. It is hardcoded into the exe, so it will make no difference. The "Working with standard loops" message is a known bug. If you see "Extra SSE boost OK" in the log, then all is fine.

    Using Ctrl+C is the proper way to shut down the app, but sometimes it does not kill all the background services it runs. After stopping the client (if you do not intend to restart the machine), check the task manager to be sure it shut down the associated services. Better still, when restarting the client, check the task manager to see that only one instance each of mpiexec.exe and smpd.exe is running. If you have more than one of either, that is probably your problem.

    You should read this thread at the folding forum: http://forum.folding-community.org/ftopic18210.html. It covers a lot of this and more. Let us know how you make out.

    Fold on

    Scott

    Edit: Leo beat me to it
  • SPIKE09 Scatland
    edited September 2007
    -forceasm just lets me sleep better. I know the official line on it, but for peace of mind from the beta days, the flag stays on my boxen.
  • TBonZ Ottawa, ON Icrontian
    edited September 2007
    Leonardo wrote:
    That's really strange. I've never seen that - four SMP clients/computers over several months.

    I never nailed down exactly what caused this, but whenever it happened, the common factor was always my internet connection. At first, I had no idea that unplugging from the net would affect the client while it was running. I had been having some connection issues, and while I was troubleshooting them, mpiexec.exe would start grabbing CPU cycles. I lost a few WUs in the process. It hasn't happened in a while. I think Stanford must have made an adjustment to the client since then, because when my net went down recently, the only thing that happened was that the cores stopped working. The client itself still displays that it's working, but of course it will never complete another step until it's shut down and restarted.

    To fix the mpiexec.exe issue, I would uninstall the folding client, reinstall and run install.bat again. Problem solved.
  • Ultra-Nexus Buenos Aires, ARG
    edited September 2007
    What hardware are you using for this SMP client? Going by your TPFs, I would say it's an X2... is it?

    Most SMP WUs benefit from L2 cache, so AMD's procs are usually slower than Intel's on them, and p2651 is one of those WUs. Still, you are within the deadline to return them.

    I've found that sometimes (and dunno why) the TPF goes berserk, taking much longer than it used to. I just Control-C it and start it over.

    And if you, like me, get bothered by the little CMD window where FAH is running, I suggest you try this little proggy that sends it to your systray:

    Tray It!
  • Winga Mr South Africa Icrontian
    edited September 2007
    What hardware are you using for this SMP client?
    I'm running an Opteron 165. It's finishing the WU well within the time despite the erratic frame times, so I'm quite happy. Thanks for the TrayIt link :thumbsup:
  • Qeldroma Arid ZoneAh Member
    edited September 2007
    Another thing you might check is whether you have any scheduled tasks going on, like an anti-virus or spyware scan.
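
As a follow-up to scott's suggestion above about making sure only one instance each of mpiexec.exe and smpd.exe is running, the check can be scripted instead of done by eye in Task Manager. This is only a sketch: it assumes Windows, where the built-in `tasklist /FO CSV /NH` command lists running processes as CSV with the image name in the first column. The helper names `count_processes` and `duplicates` are invented for this example and are not part of the FAH client.

```python
import csv
import io
import subprocess
import sys

# Process names mentioned in this thread; adjust if your install differs.
WATCHED = ("mpiexec.exe", "smpd.exe")


def count_processes(tasklist_csv, names=WATCHED):
    """Count rows of `tasklist /FO CSV /NH` output whose image name is in names."""
    counts = {name.lower(): 0 for name in names}
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # The first CSV column is the image name, e.g. "mpiexec.exe".
        if row and row[0].lower() in counts:
            counts[row[0].lower()] += 1
    return counts


def duplicates(counts):
    """Return the watched names that have more than one running instance."""
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    if sys.platform == "win32":
        listing = subprocess.run(
            ["tasklist", "/FO", "CSV", "/NH"],
            capture_output=True, text=True, check=True,
        ).stdout
        found = count_processes(listing)
        print(found)
        extra = duplicates(found)
        if extra:
            print("More than one instance running:", ", ".join(extra))
    else:
        print("tasklist is a Windows command; nothing to check on this platform.")
```

If it reports more than one mpiexec.exe or smpd.exe, stopping the client and clearing the leftover processes before restarting is the fix scott describes.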