Performance hit every 25 seconds.

csimon · January 2004

Ok ...I'm just noticing this but perhaps it's been happening all along.
I notice that while folding my performance meter drops from 100% to 0% about every 25 seconds or so. Can anyone else verify whether this is or isn't happening to them?
If it isn't, does anyone have any suggestions as to what may be causing this?

using FAH4Console.exe -forcesse and core_78 v1.55
wtf?

a2jfreak · January 2004

what does the actual F@H executable do? (in the processes tab)

Slick · January 2004

I believe the program is saving the progress to disk every time the usage drops down. I am not sure though.

EOC_Jason · January 2004

Do you have some sort of system monitoring script?

mmonnin · January 2004

I got something similar when a frame went by.

That's also why you run a genome in the background, which I realized I didn't set up after I got my NF7.

TBonZ · January 2004

Slick wrote:

I believe the program is saving the progress to disk every time the usage drops down. I am not sure though.

That is absolutely correct, the drop intervals are due to checkpoint writes.

csimon · January 2004

What can I do to eliminate that?

Here is my client.cfg:

[settings]
username=csimon
team=93
asknet=no
machineid=1
local=22

[http]
active=no
host=localhost
port=8080
usereg=no

[clienttype]
type=1

[core]
checkpoint=15

If I set my checkpoint to checkpoint=0 will that suppress the function?
Never mind ...no matter what I set it to it still does it.

TheBaron · January 2004

mmonnin wrote:

That's also why you run a genome in the background, which I realized I didn't set up after I got my NF7.

wait ... huh? care to elaborate on this?

csimon · January 2004

TBonZ wrote:

That is absolutely correct, the drop intervals are due to checkpoint writes.

Can I prolong or eliminate the intervals?

csimon · January 2004

EOC_Jason wrote:

Do you have some sort of system monitoring script?

just coolmon but it does it with coolmon off and I have no scheduled tasks.

csimon · January 2004

a2jfreak wrote:

what does the actual F@H executable do? (in the processes tab)

drops to 0% then goes right back up to 100% again ...only for an instant every 25 secs approx.
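
For anyone who wants to check whether the drops really are periodic, here is a minimal Python sketch that measures the gap between dips in a series of CPU readings. The sample data is made up for illustration (one reading per second with a one-second drop every 25 seconds), and `dip_intervals` and its `threshold` parameter are hypothetical names, not part of any F@H tool; real readings would have to come from Task Manager or a monitoring utility.

```python
from statistics import mean


def dip_intervals(samples, threshold=50.0):
    """Return the gaps in seconds between the starts of successive dips.

    samples: list of (timestamp_seconds, cpu_percent) pairs, oldest first.
    A dip starts when usage falls below `threshold` after being at or above it.
    """
    starts = []
    in_dip = False
    for timestamp, percent in samples:
        if percent < threshold and not in_dip:
            starts.append(timestamp)
            in_dip = True
        elif percent >= threshold:
            in_dip = False
    # Pair each dip start with the next one and take the difference.
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


# Made-up readings: 100% load with a single 0% reading every 25 seconds.
samples = [(t, 0.0 if t % 25 == 24 else 100.0) for t in range(100)]
gaps = dip_intervals(samples)
print("gaps between dips (s):", gaps)
print("average gap (s):", mean(gaps))
```

If the gaps come out nearly constant, the drops are tied to something regular in the work unit rather than to random background activity.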

t1rhino · January 2004

Install another instance.

TBonZ · January 2004

Sorry Chris, it was late and I completely missed the 25 sec thing, brain fart I guess.

I just looked at my client and it's behaving the exact same way. lsevald brought this up a long time ago when Gro's were fairly new, but I cannot remember if the intervals were this short or if the intervals were strictly between checkpoints, which would be a matter of minutes, not seconds.

This would be a good topic to post at the community forum as I am now very interested in why this is happening.

mmonnin · January 2004

I still have V3.25 and it does it only when finishing a frame. The checkpoints are not on V3 but the checkpoints are at a min of every 3 minutes so this 25 sec thing can't be that.

You are finishing a frame every 25 seconds, are you? :)

TBonZ · January 2004

I'm also using V3.25 and my checkpoints with this protein are 13-14 minutes.

a2jfreak · January 2004

I just checked and it's doing it on my system too.
I'm fairly certain that my system did not do this before I switched from core v.1.5.4 to 1.5.5. Anyone else that is experiencing this, which core version are you using? If the consensus seems to be v1.5.5, then I think I might go back to v.1.5.4 just to double check.

csimon · January 2004

marc or terry can you post a copy of your v3.25 client.cfg ...something may be missing from v4.0 ...I'm mainly looking for cpuusage=

mmonnin · January 2004

[settings]
username=mmonnin
team=93
asknet=no
machineid=1
local=371

[http]
active=no
host=localhost
port=8080
usereg=no
usepasswd=yes

[clienttype]
type=1

[core]
priority=96
cpuusage=100
disableassembly=no
ignoredeadlines=no

This is running at low priority since there is a genome in the background at idle.

csimon · January 2004

notice anything odd?

[settings]
username=csimon
team=93
asknet=no
machineid=1
local=28

[http]
active=no
host=localhost
port=8080
usereg=no

[clienttype]
type=1

[core]
checkpoint=30

I'm left to assume that with the new 4.0 client, [core] settings are only written to client.cfg if they are set to something other than the default. I manually added cpuusage=100 and it made no difference.

maybe it is the core? or core + v4client
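
The comparison between the two posted client.cfg files can also be done mechanically. Below is a minimal Python sketch using the standard `configparser` module; the two strings copy only the [core] sections posted in this thread, and `core_keys` is a hypothetical helper name. It only lists which keys differ and says nothing about how the client interprets them.

```python
from configparser import ConfigParser


def core_keys(cfg_text):
    """Parse INI-style text and return the [core] section as a dict."""
    parser = ConfigParser()
    parser.read_string(cfg_text)
    return dict(parser["core"]) if parser.has_section("core") else {}


# [core] section from the v3.25 client.cfg posted above.
V3_CFG = """
[core]
priority=96
cpuusage=100
disableassembly=no
ignoredeadlines=no
"""

# [core] section from the v4 client.cfg posted above.
V4_CFG = """
[core]
checkpoint=30
"""

v3, v4 = core_keys(V3_CFG), core_keys(V4_CFG)
only_in_v3 = sorted(set(v3) - set(v4))
only_in_v4 = sorted(set(v4) - set(v3))
print("only in v3.25 config:", only_in_v3)
print("only in v4 config:", only_in_v4)
```

To compare real files, the text could be read from disk and passed to `core_keys` in the same way.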

t1rhino · January 2004

delete your client.cfg and start the console with the -config.
re-enter team #93, userid=t1rhino,
cpuusage=100

mmonnin · January 2004

Yeah I think what t1rhino said will fix it.;)

I have v1.55 core.

t1rhino · January 2004

I have core 1.54. How do I get core 1.55?

Straight_Man · January 2004

Well, as far as genome goes, I am not sure. As to the Gromacs client, look in the file called client.cfg; the value after checkpoint= is the checkpoint interval in minutes. HOWEVER, what you see is probably not that checkpoint var. What you might be seeing, which was introduced in the newer core and client together, is that the workunit.cp file is revised after each frame with Tinkers, and after each percent or frame with Gromacs WUs. What you are seeing, imho, is a read/write cycle per frame: the completed frame is written to the .cp file (the work progress archive) and the next frame is read. The core is what pulls the load, and it does not draw as much load while writing to and reading from the HD as it does while calculating.

If you play with this, you will have to disassemble the client and the core and build in a workspace in RAM to store a percentage worth of work in. Essentially you will be losing up to one percent every time the client or core hangs this way. They did this simply so the client would not have to backtrack if, say, Windows crashed, as opposed to someone doing an orderly shutdown before rebooting and Windows not hanging. The new core has never backtracked more than 1 percent on anything I have let it run, and the same goes for Tinkers, except there it is now never more than one frame. So most of this is in the client, though the core does not exit. The client would have to be rewritten, and Guha at Folding did most if not all of the work on the current core and client, or coordinated it. Talk to Guha; that is the best way to find out, if you have a hyper-stable box.

One reason I think this is so is that I get a pattern like yours out of my F@Hs here: a very tiny drop in time every fraction of a percent. It does not relate to the checkpoint= numbers, since it is a tiny short time and the checkpoint in client.cfg on both clients (Linux and Windows, same release versions but appropriate for the OS) is set to 15 MIN cycles. Most folks would rather give up 4% of the time to writes than lose unsaved work and have to recalc if the box dies or locks up, because their boxes are other than perfectly stable.

In theory, if you wanted the client to grab much more RAM, you could put the accumulation into RAM for 1-5% of work instead of 1-500 FRAMES of work (depending on the size of the WU; the new biggest ones run 500 steps per percent, and I do not know if those are using one FRAME per step or not). But I think you might have fun rewriting what would be needed to get your client to go always-on all the time without giving it a priority that would override normal use of the computer. My graphical client is set to run the CPU at 100%, and except for writing a chunk to HD and then setting up the next portion of work in RAM, it does. But with the new client and core, I have never lost 15 min of work.... Even by trying....

John.

t1rhino · January 2004

Just started watching task manager, and it seems that cpu usage goes down to zero at the completion of each step.

csimon · January 2004

this is what I get from a fresh config

gtghm · January 2004

Mine does this too; however, I doubt that it's anything to worry about.

Straight_Man · January 2004

When data is handed off to the client to be written to either the HD or a client workspace in RAM for accumulating steps equal to a "write to HD chunk", the core suspends until the new step is in RAM, basically. The core is what pumps your CPU usage when it is actively calculating results. Nothing to worry about; this is normal for this software set.

John.
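
To put a rough number on "nothing to worry about", here is a small Python sketch of the time lost to these brief stalls. The 25 second period is the figure reported in this thread; the stall length is purely an assumption, since the thread only describes the drop as lasting "an instant", and `stall_overhead` is a hypothetical helper name.

```python
def stall_overhead(period_s, stall_s):
    """Fraction of wall-clock time spent stalled rather than computing."""
    if period_s <= 0 or not 0 <= stall_s <= period_s:
        raise ValueError("need period_s > 0 and 0 <= stall_s <= period_s")
    return stall_s / period_s


PERIOD_S = 25.0        # interval between drops, as reported in this thread
ASSUMED_STALL_S = 0.5  # assumption: the actual stall length was never measured

overhead = stall_overhead(PERIOD_S, ASSUMED_STALL_S)
print(f"{overhead:.1%} of time stalled")
```

Plugging in a measured stall length instead of the assumed one would give the real cost on a particular machine.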

csimon · January 2004

strange but this doesn't seem to be happening on any instances running on my P4's ...only the one instance on the AMD xp3000+/400

csimon · January 2004

magnificent ...it must have been the particular protein ...and now my processor is running hotter than hell on the new wave of gromacs.
