Dual Windows SMP clients no dice

Ultra-Nexus Buenos Aires, ARG
edited February 2008 in Folding@Home
Hi!

The problem occurs when I try to shut down the clients.

I am running the affinity optimizer (the Russian software), but every time I stop any of the clients I get:

[17:27:50] Writing local files
[17:27:50] Completed 275000 out of 500000 steps (55 percent)
[17:42:53] Timered checkpoint triggered.
[17:44:27] Writing local files
[17:44:29] Completed 280000 out of 500000 steps (56 percent)
[17:59:30] Timered checkpoint triggered.
[18:11:03] Killing all core threads
[18:11:03] Killing SMP core threads
[18:11:03] Could not get process id information. Please kill core process manually

Folding@Home Client Shutdown at user request.
[18:11:03] ***** Got a SIGTERM signal (2)
[18:11:03] Killing all core threads
[18:11:03] Killing SMP core threads
[18:11:03] Could not get process id information. Please kill core process manually

Folding@Home Client Shutdown.

But I still see the FahCore_a1.exe processes taking up CPU resources, even after waiting half an hour, so I kill them manually as the log says.
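The manual kill described above could be scripted. This is only a minimal sketch, assuming the built-in Windows `taskkill` command and the core image name shown in the log (`FahCore_a1.exe`); the helper names are hypothetical and not part of the F@H client.

```python
import subprocess
import sys

CORE_IMAGE = "FahCore_a1.exe"  # core image name as it appears in the log


def build_kill_command(image_name=CORE_IMAGE):
    """Return the taskkill arguments that force-kill every process with this image name."""
    return ["taskkill", "/F", "/IM", image_name]


def kill_leftover_cores(image_name=CORE_IMAGE):
    """Run taskkill on Windows and return its exit code; return None on other systems."""
    if not sys.platform.startswith("win"):
        return None
    return subprocess.run(build_kill_command(image_name)).returncode
```

Note that this kills every process with that image name, so with two clients on one machine it would also take down the cores of the client that is still running.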

Now when I start them again, this is the result:

--- Opening Log file [February 18 21:03:11]


# SMP Client ##################################################################
###############################################################################

Folding@Home Client Version 5.91beta6

http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: U:\SMP1
Executable: U:\SMP1\fah.exe
Arguments: -local -forceasm -verbosity 9

Warning:
By using the -forceasm flag, you are overriding
safeguards in the program. If you did not intend to
do this, please restart the program without -forceasm.
If work units are not completing fully (and particularly
if your machine is overclocked), then please discontinue
use of the flag.

[21:03:11] - Ask before connecting: No
[21:03:11] - User name: _-_ThaNexus_-_ (Team 93)
[21:03:11] - User ID: 598F5D623175336
[21:03:11] - Machine ID: 1
[21:03:11]
[21:03:12] Loaded queue successfully.
[21:03:12]
[21:03:12] - Autosending finished units...
[21:03:12] + Processing work unit
[21:03:12] Trying to send all finished work units
[21:03:12] Core required: FahCore_a1.exe
[21:03:12] + No unsent completed units remaining.
[21:03:12] - Autosend completed
[21:03:12] Core found.
[21:03:12] Working on Unit 05 [February 18 21:03:12]
[21:03:12] + Working ...
[21:03:12] - Calling 'mpiexec -channel auto -np 4 FahCore_a1.exe -dir work/ -suffix 05 -checkpoint 15 -forceasm -verbose -lifeline 3780 -version 591'

[21:03:12]
[21:03:12] *
*
[21:03:12] Folding@Home Gromacs SMP Core
[21:03:12] Version 1.74 (March 10, 2007)
[21:03:12]
[21:03:12] Preparing to commence simulation
[21:03:12] - Assembly optimizations manually forced on.
[21:03:12] - Not checking prior termination.
[21:03:12]
[21:03:12] Folding@home Core Shutdown: MISSING_WORK_FILES
[21:03:12] Finalizing output

So, am I doing something wrong here? Dang, I already lost 4 SMP units because of this... :confused:
Comments

  • DanG I AM CANADIAN Icrontian
    edited February 2008
    You do have both installs of the SMP client in different directories, right?
  • Ultra-Nexus Buenos Aires, ARG
    edited February 2008
    Yes sir, and each with its own Machine ID.

    I just ran a new test: I reinstalled it with the affinity changer disabled, and I am still getting this "could not get process id information" message. I believe it has something to do with the problem.
  • Qeldroma Arid ZoneAh Member
    edited February 2008
    Try adding the -local flag?

    (err- sorry yes you did). Is your client up to date?
  • edcentric near Milwaukee, Wisconsin Icrontian
    edited February 2008
    So, can you run without the affinity changer?
    Can you run each SMP by itself?
  • kryyst Ontario, Canada
    edited February 2008
    Someone correct me if I'm wrong but I thought the point of the SMP client was that it ran 1 client that used all the cores, so running multiple SMP clients isn't beneficial.
  • SPIKE09 Scatland
    edited February 2008
    kryyst wrote:
    Someone correct me if I'm wrong but I thought the point of the SMP client was that it ran 1 client that used all the cores, so running multiple SMP clients isn't beneficial.
    That is the exact Stanford and FCO line, kryyst. Some folks run two on a Q6600, as one SMP client on a quad is not that much faster than one on an E6600. They treat the Q6600 as two E6600s.
  • Ultra-Nexus Buenos Aires, ARG
    edited February 2008
    Yes, I am already using the -local flag, and no, even one client is doing the same thing: not shutting down correctly. Perhaps it's not related to the affinity changer.
    --- Opening Log file [February 19 18:03:30]


    # SMP Client ##################################################################
    ###############################################################################

    Folding@Home Client Version 5.91beta6

    http://folding.stanford.edu

    ###############################################################################
    ###############################################################################

    Launch directory: U:\SMP1
    Executable: U:\SMP1\fah.exe
    Arguments: -local -forceasm -verbosity 9

    Warning:
    By using the -forceasm flag, you are overriding
    safeguards in the program. If you did not intend to
    do this, please restart the program without -forceasm.
    If work units are not completing fully (and particularly
    if your machine is overclocked), then please discontinue
    use of the flag.

    [18:03:30] - Ask before connecting: No
    [18:03:30] - User name: _-_ThaNexus_-_ (Team 93)
    [18:03:30] - User ID: 598F5D623175336
    [18:03:30] - Machine ID: 1
    [18:03:30]
    [18:03:30] Loaded queue successfully.
    [18:03:30] - Preparing to get new work unit...
    [18:03:30] - Autosending finished units...
    [18:03:30] + Attempting to get work packet
    [18:03:30] Trying to send all finished work units
    [18:03:30] - Will indicate memory of 2046 MB
    [18:03:30] + No unsent completed units remaining.
    [18:03:30] - Connecting to assignment server
    [18:03:30] - Autosend completed
    [18:03:30] Connecting to http://assign.stanford.edu:8080/
    [18:03:32] Posted data.
    [18:03:32] Initial: 40AB; - Successful: assigned to (171.64.65.64).
    [18:03:32] + News From Folding@Home: Welcome to Folding@Home
    [18:03:32] Loaded queue successfully.
    [18:03:32] Connecting to http://171.64.65.64:8080/
    [18:03:36] Posted data.
    [18:03:36] Initial: 0000; - Receiving payload (expected size: 2949384)
    [18:06:30] - Downloaded at ~16 kB/s
    [18:06:30] - Averaged speed for that direction ~16 kB/s
    [18:06:30] + Received work.
    [18:06:30] + Closed connections
    [18:06:30]
    [18:06:30] + Processing work unit
    [18:06:30] Core required: FahCore_a1.exe
    [18:06:30] Core found.
    [18:06:30] Working on Unit 01 [February 19 18:06:30]
    [18:06:30] + Working ...
    [18:06:30] - Calling 'mpiexec -channel auto -np 4 FahCore_a1.exe -dir work/ -suffix 01 -checkpoint 15 -forceasm -verbose -lifeline 3776 -version 591'

    [18:06:30]
    [18:06:30] *
    *
    [18:06:30] Folding@Home Gromacs SMP Core
    [18:06:30] Version 1.74 (March 10, 2007)
    [18:06:30]
    [18:06:30] Preparing to commence simulation
    [18:06:30] - Ensuring status. Please wait.
    [18:06:32] - Starting from initial work packet
    [18:06:32]
    [18:06:32] Project: 2653 (Run 24, Clone 132, Gen 61)
    [18:06:32]
    [18:06:32] Assembly optimizations on if available.
    [18:06:32] Entering M.D.
    [18:06:52] on if available.
    [18:06:52] Entering M.D.
    [18:06:58] Rejecting checkpoint
    [18:06:59] E boost OK.
    [18:06:59] tein in POPCExtra SSE boost OK.
    [18:06:59]
    [18:07:00] Extra SSE boost OK.
    [18:07:00] Writing local files
    [18:07:00] Completed 0 out of 500000 steps (0 percent)
    [18:08:01] Killing all core threads
    [18:08:01] Killing SMP core threads
    [18:08:01] Could not get process id information. Please kill core process manually

    Folding@Home Client Shutdown at user request.
    [18:08:01] ***** Got a SIGTERM signal (2)
    [18:08:01] Killing all core threads
    [18:08:01] Killing SMP core threads
    [18:08:01] Could not get process id information. Please kill core process manually

    Folding@Home Client Shutdown.

    This "could not get process id information" thing is new to me. I have never seen it before.
  • edcentric near Milwaukee, Wisconsin Icrontian
    edited February 2008
    UN, So you are running quad core?
    If one won't run right it sounds like remove and reinstall time to me.
  • Qeldroma Arid ZoneAh Member
    edited February 2008
    edcentric wrote:
    UN, So you are running quad core?
    If one won't run right it sounds like remove and reinstall time to me.

    Yeah -> correct me if I'm wrong UN <- he's using a quad-core with two installations of the SMP client using two cores each. He's using a core affinity assignment tool to do so.

    I also agree that he should do a reinstall. I think that each install should have a separate download so that the UIDs are unique.
  • mmonnin Centreville, VA
    edited February 2008
    There is too much cross talk between the two dual-core chips in the quad-core package for the SMP client to be twice as fast as on a dual core. So we run 2 clients, each client on 2 cores, and the affinity changer sets 4 exes to each half of the quad.
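To illustrate the "each half of the quad" idea: a CPU affinity mask is a bit field where bit i set means logical CPU i is allowed. The sketch below only does the mask arithmetic. It assumes logical CPUs 0-1 sit on one die and 2-3 on the other, which is an assumption about core numbering rather than something stated in this thread, and the function names are made up for the example.

```python
def affinity_mask(cores):
    """Build an affinity bitmask with one bit set per allowed logical CPU."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


def split_quad(pairs=((0, 1), (2, 3))):
    """Return one mask per client, assuming each pair of cores shares a die."""
    return [affinity_mask(pair) for pair in pairs]


if __name__ == "__main__":
    for client, mask in enumerate(split_quad(), start=1):
        print(f"client {client}: mask {mask:#x}")
```

Under that assumption, the first client's four FahCore processes would get mask 0x3 and the second client's four would get mask 0xc.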
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited February 2008
    Agreed with above recommendations, with additional advice:

    -- Completely uninstall all F@H clients, folders, everything
    -- ensure computer (Windows) is set to log in with a password (can be set to log in automatically at boot up)
    -- download fresh Microsoft .NET Framework and install
    -- download fresh Affinity Changer
    -- download fresh, latest Win SMP client
    -- reinstall both clients

    Also, would you please tell us your config file settings? I'm wondering if you've got a bad setting in there.

    BTW, in my experience running several computers with Win SMP, I've found the current Win SMP F@H client to be more stable than the previous client. Still, I always manually back up the entire contents of the client folders before I shut down the clients. That has saved countless work units.
  • DanG I AM CANADIAN Icrontian
    edited February 2008
    kryyst wrote:
    Someone correct me if I'm wrong but I thought the point of the SMP client was that it ran 1 client that used all the cores, so running multiple SMP clients isn't beneficial.



    I'm getting ~1000ppd MORE on each of my Q6600's by using the affinity changer and running 2 clients. My 2 home boxes are putting out a combined 7500PPD.
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited February 2008
    Yeah, I'm getting up to 1500 PPD more with dual SMP versus single. Typically the work units finish with about 75% of the time remaining before the deadline.

    I understand that Stanford wants work units back as quickly as possible, but there's room for moderation.
  • Ultra-Nexus Buenos Aires, ARG
    edited February 2008
    I only installed it once, and copied the contents to a second folder, then configured each manually. The MCH service runs over the first folder. Should I have two services?

    EDIT: here is my config:

    [settings]
    username=_-_ThaNexus_-_
    team=93
    asknet=no
    bigpackets=yes
    machineid=1
    local=3

    [http]
    active=no
    host=localhost
    port=8080
    usereg=no

    [clienttype]
    type=3
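Since two clients on one machine each need their own Machine ID, a quick check of both config files can rule that out. A minimal sketch, assuming each client's config is an INI-style file like the one posted above and that its text has already been read into a string; the helper names are hypothetical.

```python
import configparser


def read_machine_id(cfg_text):
    """Return the machineid value from the [settings] section of a client config."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.getint("settings", "machineid")


def machine_ids_distinct(cfg_texts):
    """True if every config in the list carries a different machine ID."""
    ids = [read_machine_id(text) for text in cfg_texts]
    return len(set(ids)) == len(ids)
```

For example, passing the text of the config from each client folder to `machine_ids_distinct` returns False if both still say `machineid=1`.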
  • mertesn I am Bobby Miller Yukon, OK Icrontian
    edited February 2008
    What user are you running the clients under? Maybe try running them as Administrator.

    Also, when was the last time you downloaded a new client version? While this doesn't look like a two-month expiration I've had some weird results when it's time to update versions.
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited February 2008
    The MCH service
    OK, I'm dense, what is "MCH service?"
  • Ultra-Nexus Buenos Aires, ARG
    edited February 2008
    Leonardo wrote:
    OK, I'm dense, what is "MCH service?"

    lol, sorry, misspelled. Meant "MPICH2 Process Manager, Argonne National Lab" service. :)

    Should I have two?

    So, I followed your advice to uninstall everything and install again, reset my user account password (I was using another administrator user before) and it seems to be running along this time :D

    --- Opening Log file [February 20 23:00:51]


    # SMP Client ##################################################################
    ###############################################################################

    Folding@Home Client Version 5.91beta6

    http://folding.stanford.edu

    ###############################################################################
    ###############################################################################

    Launch directory: U:\SMP1
    Executable: U:\SMP1\fah.exe
    Arguments: -local -forceasm -verbosity 9

    Warning:
    By using the -forceasm flag, you are overriding
    safeguards in the program. If you did not intend to
    do this, please restart the program without -forceasm.
    If work units are not completing fully (and particularly
    if your machine is overclocked), then please discontinue
    use of the flag.

    [23:00:51] - Ask before connecting: No
    [23:00:51] - User name: _-_ThaNexus_-_ (Team 93)
    [23:00:51] - User ID: 598F5D623175336
    [23:00:51] - Machine ID: 1
    [23:00:51]
    [23:00:51] Loaded queue successfully.
    [23:00:51] - Autosending finished units...
    [23:00:51] - Preparing to get new work unit...
    [23:00:51] Trying to send all finished work units
    [23:00:51] + Attempting to get work packet
    [23:00:51] + No unsent completed units remaining.
    [23:00:51] - Autosend completed
    [23:00:51] - Will indicate memory of 1024 MB
    [23:00:51] - Connecting to assignment server
    [23:00:51] Connecting to http://assign.stanford.edu:8080/
    [23:00:52] Posted data.
    [23:00:52] Initial: 40AB; - Successful: assigned to (171.64.65.64).
    [23:00:52] + News From Folding@Home: Welcome to Folding@Home
    [23:00:53] Loaded queue successfully.
    [23:00:53] Connecting to http://171.64.65.64:8080/
    [23:00:57] Posted data.
    [23:00:57] Initial: 0000; - Receiving payload (expected size: 2958830)
    [23:02:19] - Downloaded at ~35 kB/s
    [23:02:19] - Averaged speed for that direction ~35 kB/s
    [23:02:19] + Received work.
    [23:02:19] + Closed connections
    [23:02:19]
    [23:02:19] + Processing work unit
    [23:02:19] Core required: FahCore_a1.exe
    [23:02:19] Core found.
    [23:02:19] Working on Unit 01 [February 20 23:02:19]
    [23:02:19] + Working ...
    [23:02:19] - Calling 'mpiexec -channel auto -np 4 FahCore_a1.exe -dir work/ -suffix 01 -checkpoint 10 -forceasm -verbose -lifeline 4064 -version 591'

    [23:02:20]
    [23:02:20] *
    *
    [23:02:20] Folding@Home Gromacs SMP Core
    [23:02:20] Version 1.74 (March 10, 2007)
    [23:02:20]
    [23:02:20] Preparing to commence simulation
    [23:02:20] - Assembly optimizations manually forced on.
    [23:02:20] - Not checking prior termination.
    [23:02:24] - Expanded 2958318 -> 15212615 (decompressed 514.2 percent)
    [23:02:24] - Starting from initial work packet
    [23:02:24]
    [23:02:24] Project: 2653 (Run 7, Clone 71, Gen 62)
    [23:02:24]
    [23:02:25] Assembly optimizations on if available.
    [23:02:25] Entering M.D.
    [23:02:32] Rejecting checkpoint
    [23:03:31] Protein: Protein in POPC
    [23:03:31] Writing local files
    [23:03:33] Extra SSE boost OK.
    [23:03:35] Writing local files
    [23:03:35] Completed 0 out of 500000 steps (0 percent)
    [23:09:56] Killing all core threads
    [23:09:56] Killing SMP core threads
    [23:09:56] Killing 3 cores
    [23:09:56] Killing core 0
    [23:09:56] Killing core 1
    [23:09:56] Killing core 2

    Folding@Home Client Shutdown at user request.
    [23:09:56] ***** Got a SIGTERM signal (2)
    [23:09:56] Killing all core threads
    [23:09:56] Killing SMP core threads
    [23:09:56] Killing 3 cores
    [23:09:56] Killing core 0
    [23:09:56] Killing core 1
    [23:09:56] Killing core 2

    Folding@Home Client Shutdown.

    I downloaded the client yesterday. I'll try copying it to the 2nd folder and see if both work together fine!

    Thanks to all!
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited February 2008
    No, in Task Manager there should just be one showing - MPIEXE (sp?)

    Do you have your clients installed under a 'user' with administrative rights? If not, it won't work. Also, the clients have to be installed under a user that logs into Windows with a password. I've set my machines to log in automatically on Windows boot.
  • Ultra-Nexus Buenos Aires, ARG
    edited February 2008
    I see 2 mpiexec.exe processes in the task manager... and yes, it's now set with the same administrative user and it logs in automatically. I'm running the 2 clients with the affinity changer again... I'll let them fold for a while and try closing them with Control-C and see what happens.
  • mmonnin Centreville, VA
    edited February 2008
    Hmm I have 2x mpiexec.exe processes on my Quad with 2 clients and 1 process on my C2D with 1 client. The quad does nothing else but fold...<shrug>
  • DanG I AM CANADIAN Icrontian
    edited February 2008
    I also have 2 mpiexec.exe's showing. One has a PID of 368 and the other one is 2900.
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited February 2008
    OK, I just checked and yes, there should be an mpiexec.exe for each client. Sorry about the bogus information. I was on a work computer when I posted that.
    I only installed it once, and copied the contents to a second folder, then configured each manually.
    Ahh, that may be your problem. I don't know if that screws anything up or not - but copying an installation might leave out registry entries (or something) that might be needed. If you still have problems, install each client, no copies.

    Has anyone else here just copied the contents of one client folder to another?
  • Qeldroma Arid ZoneAh Member
    edited February 2008
    Leonardo wrote:
    I don't know if that screws anything up or not - but copying an installation might leave out registry entries (or something) that might be needed.

    If I remember right, the installations should be done with separate downloads so that each instance of the client has a unique UID for the project to use.
  • mertesn I am Bobby Miller Yukon, OK Icrontian
    edited February 2008
    While I can't think of a good reason not to do so, it doesn't sound like a good idea. I usually just install the whole thing twice - first to a "Core 1" directory and the second to a "Core 2" directory.
  • mertesn I am Bobby Miller Yukon, OK Icrontian
    edited February 2008
    Qeldroma wrote:
    If I remember right, the installations should be done with separate downloads so that each instance of the client has a unique UID for the project to use.

    Using the same download shouldn't cause a problem. It's the same binary that's being executed whether you download it once or twice. I'd just install and configure one at a time.
  • Ultra-Nexus Buenos Aires, ARG
    edited February 2008
    I realized that when I do a Control-C it only kills 3 threads:
    [23:03:35] Completed 0 out of 500000 steps (0 percent)
    [23:09:56] Killing all core threads
    [23:09:56] Killing SMP core threads
    [23:09:56] Killing 3 cores
    [23:09:56] Killing core 0
    [23:09:56] Killing core 1
    [23:09:56] Killing core 2

    Folding@Home Client Shutdown at user request.
    [23:09:56] ***** Got a SIGTERM signal (2)
    [23:09:56] Killing all core threads
    [23:09:56] Killing SMP core threads
    [23:09:56] Killing 3 cores
    [23:09:56] Killing core 0
    [23:09:56] Killing core 1
    [23:09:56] Killing core 2

    Folding@Home Client Shutdown.

    This leaves 1 thread opened for each time I close any of the clients... dang!
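The mismatch can be read straight from the log: the client launches the core with `mpiexec ... -np 4` but reports "Killing 3 cores" at shutdown. Here is a rough sketch that pulls both numbers out of a log's text, using only the line formats that appear in the logs in this thread. The function names are made up, and whether the difference is a client bug or just how the client counts is not established here.

```python
import re


def launched_process_count(log_text):
    """Return the -np value from the mpiexec call line, or None if absent."""
    match = re.search(r"mpiexec\b.*?-np\s+(\d+)", log_text)
    return int(match.group(1)) if match else None


def killed_core_count(log_text):
    """Return the number from the last 'Killing N cores' line, or None if absent."""
    counts = re.findall(r"Killing (\d+) cores", log_text)
    return int(counts[-1]) if counts else None


def unkilled_processes(log_text):
    """Launched minus killed, or None when either number is missing from the log."""
    launched = launched_process_count(log_text)
    killed = killed_core_count(log_text)
    if launched is None or killed is None:
        return None
    return launched - killed
```

Run against the February 20 log above, this would report 4 launched and 3 killed, which matches the one leftover process.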
  • mmonnin Centreville, VA
    edited February 2008
    Yep 2 installations, 1 download is fine. The IDs don't come until the client contacts Stanford after installation.
  • Ultra-Nexus Buenos Aires, ARG
    edited February 2008
    OK, I have installed it twice... I'll let you know how it went. :)

    EDIT: still no luck. Yes, both clients start fine, but when I Control-C one, it not only shuts down just 3 processes (instead of 4), it also errors out the other running client with a "Client-core communications error: ERROR 0x7b".

    I don't know why this is happening... does anyone running 2 Win SMP clients have this same problem, or do your clients shut down all 4 processes correctly on each client?

    Thanks!
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited February 2008
    UN, I've often had problems shutting down just one client when two clients are running on the same computer. Perhaps there is an easier way, but here's what I do:

    1) Before shutting down clients I copy the entire contents of each client folder to a backup folder (one backup folder per operational folder)
    2) Open both operating clients to the desktop
    3) In rapid succession shut down each client via Ctrl+C. Don't just stop one client - stop both of them. It's none or all, in my experience.
    4) When restarting the clients later, sometimes it is necessary to delete the contents of the operational client folders and copy over the 'clean' files from the respective backup folders.

    Usually I can restart the clients without copying over backed up files, but I had so many problems before with corrupted units at manual shutdowns that I've just made it a habit to always back up the folders' contents.

    Yes, sometimes it takes a long, long time for all the Folding processes to stop after a manual client shutdown. It's ridiculous.
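Step 1 of the routine above (copy each client folder to its own backup folder before shutting down) is easy to script. A minimal sketch using Python's standard `shutil`; the function name is hypothetical, and the folder paths in the usage note are assumptions apart from `U:\SMP1`, which appears in the logs.

```python
import shutil
from pathlib import Path


def backup_client_folder(client_dir, backup_root):
    """Copy one client folder into backup_root, replacing any older backup of it."""
    client_dir = Path(client_dir)
    dest = Path(backup_root) / client_dir.name
    if dest.exists():
        shutil.rmtree(dest)  # drop the previous backup so the copy starts clean
    shutil.copytree(client_dir, dest)
    return dest
```

Usage would be something like `backup_client_folder(r"U:\SMP1", r"U:\backup")`, repeated for the second client folder, with the backup root kept outside the client folders so a copy never includes older backups.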
  • mmonnin Centreville, VA
    edited February 2008
    I've never shut down 1 client but word is it has always been a problem.