P935_LG2's in Water not sending

Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
edited April 2004 in Folding@Home
As I said in the thread starter - 935 LG2s 'in water' are not sending. They will complete in good order, core will 'shut down', and then nothing will happen for hours. To get things moving again, I must close the Folding client and restart for another work unit to download and execute. AND, I'm not even sure if the completed 935s are being sent to Stanford servers. All the 935s my systems have processed have been on my System 1; none of my other systems have been assigned them.

What gives? Here's an example of a 935 that completed and just sat in the client folder for several hours:

[00:23:44] Project: 935 (Run 40, Clone 2, Gen 12)
[00:23:44]
[00:23:44] Assembly optimizations on if available.
[00:23:44] Entering M.D.
[00:24:05] (Starting from checkpoint)
[00:24:05] Protein: p935_LG2 in water
[00:24:05]
[00:24:05] Writing local files
[00:24:11] Completed 55000 out of 500000 steps (11)
[00:24:11] Extra SSE2 boost OK.
............
[00:56:51] Writing local files
[00:56:53] Completed 80000 out of 500000 steps (16)
[01:03:08] Writing local files
[01:03:10] Completed 85000 out of 500000 steps (17)
.......
[01:09:22] Writing local files
[09:14:58] Completed 495000 out of 500000 steps (99)
[09:20:55] Writing local files
[09:20:57] Completed 500000 out of 500000 steps (100)
[09:20:59] Writing final coordinates.
[09:21:03] Past main M.D. loop
[09:22:03]
[09:22:03] Finished Work Unit:
[09:22:03] - Reading up to 74612 from "work/wudata_03.arc": Read 74612
[09:22:03] - Reading up to 51552 from "work/wudata_03.xtc": Read 51552
[09:22:03] goefile size: 0
[09:22:03] logfile size: 238388
[09:22:03] Leaving Run
[09:22:04] - Writing 1538426 bytes of core data to disk...
[09:22:04] ... Done.
[09:22:04] - Shutting down core


(My italics)

Comments

  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited April 2004
    After closing the client, which is shown above, I waited about 10 seconds and restarted the client. No evidence that the completed 935 work unit was sent to Stanford. Am I missing something here?

    Arguments: -advmethods

    [19:37:56] - Ask before connecting: No
    [19:37:56] - Use IE connection settings: Yes
    [19:37:56] - User name: Leonardo (Team 93)
    [19:37:56] - User ID = 7FCF37645D32A9EF
    [19:37:56] - Machine ID: 1
    [19:37:56]
    [19:37:56] Loaded queue successfully.
    [19:37:56] + Benchmarking ...
    [19:38:01]
    [19:38:01] + Processing work unit
    [19:38:01] Core required: FahCore_79.exe
    [19:38:01] Core found.
    [19:38:01] Working on Unit 03 [April 6 19:38:01]
    [19:38:01] + Working ...
    [19:38:01]
    [19:38:01] *------------------------------*
    [19:38:01] Folding@home Double Gromacs Core
    [19:38:01] Version 1.61 (March 22, 2004)
    [19:38:01]
    [19:38:01] Preparing to commence simulation
    [19:38:01] - Ensuring status. Please wait.
    [19:38:18] - Looking at optimizations...
    [19:38:18] - Working with standard loops on this execution.
    [19:38:18] - Created dyn
    [19:38:18] - Files status OK
    [19:38:18] - Expanded 91662 -> 501429 (decompressed 547.0 percent)
    [19:38:18] - Starting from initial work packet
    [19:38:18]
    [19:38:18] Project: 935 (Run 40, Clone 2, Gen 12)
    [19:38:18]
    [19:38:19] Entering M.D.
    [19:38:25] Protein: p935_LG2 in water
    [19:38:25]
    [19:38:25] Writing local files
    [19:38:31] Writing local files
    [19:38:33] Completed 0 out of 500000 steps (0)
  • edited April 2004
    That's strange, Leo. What I find interesting is when you started back up that there is no mention of sending the WU, plus the fact that the client restarted using standard optimizations with assembly loops turned off. It sounds like the client got confused when shutting down the core and writing the data to disk and hung. When you shut it down, the client evidently thought it was shut down improperly and when you restarted the clinet turned off SSE/SSE2 optimizations.

    Also, Leo, add the -verbosity 9 flag to your startup so you get a more detailed FAHlog, which helps in cases like this. You can also add the -forceasm flag, which makes the client start with SSE/SSE2 enabled no matter how it was shut down.

    You really ought to restart the client so the client will start using SSE2 because if you don't it will take 10 forevers to finish that WU. :eek:
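    For reference, the full startup line being suggested would look something like this — the binary name FAH4Console.exe is an assumption (use whatever your console client executable is actually called); the flags are the ones discussed above:

    ```shell
    # Assumed binary name; flags as discussed in this thread.
    # -advmethods : accept advanced/testing work units
    # -verbosity 9: maximum log detail in FAHlog.txt
    # -forceasm   : re-enable SSE/SSE2 even after an unclean shutdown
    FAH4Console.exe -advmethods -verbosity 9 -forceasm
    ```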
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited April 2004
    You are making my point - why are these units not sending after completion?

    SSE2 implementation. I never had a problem with proper startup of SSE/SSE2 until these underwater jobbers. Yes, it is time to add -forceasm and -verbosity
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited April 2004
    OK, I'm sold now on the "-verbosity 9" flag. I've also added "-forceasm". I used to run the asm flag on previous versions of Folding, but I thought v4 had rendered the flag unnecessary. Guess I was wrong. Upon reviewing the log after restarting with the added flag, SSE2 did reengage for both work units. Also, the log reports that unsent work units have been uploaded to Stanford.
    ...............
    [23:04:36] - Ask before connecting: No
    [23:04:36] - Use IE connection settings: Yes
    [23:04:36] - User name: Leonardo (Team 93)
    [23:04:36] - User ID = 7FCF37645D32A9EF
    [23:04:36] - Machine ID: 1
    [23:04:36]
    [23:04:36] Loaded queue successfully.
    [23:04:36] + Benchmarking ...
    [23:04:40] The benchmark result is 8956
    [23:04:40]
    [23:04:40] - Autosending finished units...
    [23:04:40] + Processing work unit
    [23:04:40] Trying to send all finished work units
    [23:04:40] Core required: FahCore_79.exe
    [23:04:40] + No unsent completed units remaining.
    [23:04:40] - Autosend completed
    [23:04:40] Core found.
    [23:04:40] Working on Unit 03 [April 6 23:04:40]
    [23:04:40] + Working ...
    [23:04:40] - Calling 'FahCore_79.exe -dir work/ -suffix 03 -priority 96 -checkpoint 15 -forceasm -verbose -lifeline 1408 -version 400'
    .............
  • edited April 2004
    Yeah, I think everyone should run the -verbosity 9 flag; it makes troubleshooting so much easier. I haven't had the problems you've had with them sending in after completion on my P4 machines, but I still don't care for them (the double Gro WUs) very much. The core doesn't keep the processor 100% loaded like a regular Gro, and the points/hr return on them is generally lower than a regular Gro too. Personally, I think Stanford needs to work on the process some more before they send them out to -advmethods.
  • mmonnin Centreville, VA
    edited April 2004
    G from Stanford is working on the CPU utilization. mudd, run more than 2 clients when you get some of these.
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited April 2004
    ...I still don't care for them (the double Gro WU's) very much. The core doesn't keep the processor 100% loaded like a regular Gro

    Thinking about setting up a third client. Can I add another client as a service in FireDaemon? There is already one client in there. I don't want the machine IDs to be the same.
  • mmonnin Centreville, VA
    edited April 2004
    Nope, the free version can only do one service.

    Console and TrayIt! if you want it out of the way. I see some people use the -oneunit flag when they get all these Gromacs on the third client.
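    A rough sketch of that console-client approach, assuming a separate directory per instance so each keeps its own client.cfg and queue.dat, with a unique Machine ID set per instance during configuration (folder names and the FAH4Console.exe binary name are illustrative, not from this thread):

    ```shell
    # One folder per client instance so queue/config files don't collide.
    # Folder names are illustrative.
    mkdir -p fah1 fah2 fah3

    # For each folder you would then (shown as comments because the
    # FAH4Console.exe binary name is an assumption):
    #   1. copy the console client into the folder
    #   2. run it once with -configonly and pick a unique Machine ID (1-8):
    #        cd fah1 && FAH4Console.exe -configonly -local
    #   3. start it from its own folder:
    #        cd fah1 && FAH4Console.exe -local -verbosity 9 -forceasm
    ```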
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited April 2004
    Hey hey hey - three clients up and running. :bigggrin: :hair:;D

    This deserves a thread of its own. I'll start one as soon as I get things sorted out.
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited April 2004
    New three client thread here.