FAH-500/502 not so efficient?
Spinner
Birmingham, UK
Ever since freshly installing v5.00 and now v5.02, all my folding rigs have been extremely high maintenance with regard to downloading and initializing new work units etc.
It's driving me crazy: at least once a day one rig will just stop folding after it's finished a work unit. It seems incapable of retrieving a new WU without user intervention, e.g. a restart.
I'm running console on all of my rigs now, all as a service and all at default settings, i.e. no switches.
All Athlon XPs: 2 Palominos, 4 Bartons.
What's the deal, is it WU server based or a problem with the client?
Yet!.............
My boxes are on dial-up and they are supposed to ask for a connection. A couple of them often tried to send automatically. That has stopped.
Sounds like a personal problem to me.
One thing that comes to mind though is when you ran the config on the client, did you answer "yes" to the "use Internet Explorer settings" question? If you did, then change that to a "no" in your config. I've had problems in the past with using IE settings with both version 3.xx and version 4 of the console and it might still be giving problems. I think that there is still some kind of bug in the client code throughout the various versions that makes it occasionally give problems receiving work when having that set to "yes".
Hope this helps you out, Spinner.
Thanks.
I'm starting to think it's a firewall problem that is causing the massive delays in obtaining new work units. I come to this conclusion because the only rig that doesn't seem to be having any trouble isn't using NIS 2003 or 2004, it's just using the Windows XP one, and that particular rig never suffers any problems like the others do. It folds, sends the results, gets some more work, then folds some more.
On the other rigs, they will all just stall and keep failing to get work units after they've completed one. I have no doubt that eventually they'll get one, but I shouldn't have to wait 6 hours or more till they can carry on folding.
If I reboot the machines when they get into this difficulty, they almost immediately manage to get work once the computer has reloaded. But I shouldn't have to do that. What's the deal?
Looking at your log the first thing that jumped out at me was the -service flag.
You don't need it if you're using v5+.
I recommend uninstalling the service and removing everything except for the client v5+ and the work folder and perhaps the queue.dat file.
From the command prompt run fah5.02_console -configonly and configure it to run as a service and set everything else to your preference, and you shouldn't have any problems.
Also ...cutnpaste your client.cfg please sir if you don't mind!
Here's mine:
[settings]
username=csimon
team=93
asknet=no
bigpackets=yes
machineid=1
local=29
[http]
active=no
host=localhost
port=8080
usereg=no
[core]
checkpoint=30
ignoredeadlines=yes
[clienttype]
type=2
nonet=yes
I've been using http://www.bluetentacle.co.uk/dc/fahservice.htm service installer for ages... and it's always been working great, but I shall try what you suggested. I was aware v5+ had the ability to work as a service without third party assistance but I didn't see any reason to change my current method of installing it as a service. But I'll do exactly as you suggested thanks mate. I'll post back when the machines have had chance to do a work unit or two.
(Is there any particular reason, do you think, that the 'host' and 'port' fields below from my client.cfg file are blank? Could this be the problem?)
[settings]
username=Spinner
team=93
asknet=no
machineid=1
bigpackets=no
local=5
[http]
active=no
host=
port=
usereg=no
[clienttype]
type=1
[core]
priority=0
cpuusage=100
disableassembly=no
checkpoint=15
ignoredeadlines=no
[power]
battery=no
I'm sure we didn't file the same configuration so that may be why ...I doubt that is your problem though.
If you think it would help we can go thru the config setup step by step ...I'm game.
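For anyone who would rather not eyeball two client.cfg files line by line, here is a rough sketch of a comparison script. It is not part of the FAH client; it only assumes the file is plain INI-style text like the two pasted above, and the function names `load_cfg` and `diff_cfg` are made up for this example.

```python
import configparser


def load_cfg(text):
    """Parse INI-style client.cfg text into {section: {key: value}}."""
    # interpolation=None so a literal '%' in a value is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def diff_cfg(a, b):
    """List (section, key, value_in_a, value_in_b) for every setting that differs.

    A value of None means the key is missing from that file.
    """
    diffs = []
    for section in sorted(set(a) | set(b)):
        keys = set(a.get(section, {})) | set(b.get(section, {}))
        for key in sorted(keys):
            va = a.get(section, {}).get(key)
            vb = b.get(section, {}).get(key)
            if va != vb:
                diffs.append((section, key, va, vb))
    return diffs


if __name__ == "__main__":
    # The [http] sections from the two configs posted in this thread
    mine = load_cfg("[http]\nactive=no\nhost=localhost\nport=8080\nusereg=no\n")
    yours = load_cfg("[http]\nactive=no\nhost=\nport=\nusereg=no\n")
    for section, key, va, vb in diff_cfg(mine, yours):
        print(f"[{section}] {key}: {va!r} vs {vb!r}")
```

With both files showing active=no under [http], the blank host and port show up as differences, but whether the client even reads them in that case is not something this script can tell you.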
Cheers bud.
A P265 taking 104:55:00hrs (01:22:57hrs@frame) and a P263 taking 118:11:40hrs (01:10:55hrs@frame). This is a P4 3.2C with 2x512MB PC3200 RAM in slots 1 & 3, HT mode. Mobo is an Abit AI7. This is slower than watching grass grow in winter.
Today on another puter a P1301 decided to terminate early.....aarrggghh. Not o'clocked and running cool.
Guess in a way i understand your frustration with the problem.
Good luck
Jon
I'm still following up the firewall issue, but it just ain't that complicated for me to have missed something. Still... only time will tell.
EDIT: As far as a 1301 giving an early_unit_end, don't worry about it. There is a much higher failure rate on the p130x series than normal. Stanford is aware of it but says even the early terminations are giving them valuable data and you will get partial credit for the work done anyways. BTW, they are actually worth 182 points, not 139 like EMIII will show. That is because they've bounced the points values around for them a bit. When they were still in early beta, before release to -advmethods, they were worth 242 points, then some of the whiny babies at the community started saying that they were worth too much and people would dump wu's to get them. Then they knocked the points back to 139 and the whiny bitches started crying that they were worth too little and they weren't going to do them any more. So now they are worth 182 points.
Spinner, if v5 keeps giving trouble then I would suggest that you go back to the v4 client. Since you aren't running the bigwu option and you already have it installed as a service using a third party app, there really isn't any advantage to you running the v5 client at present. After all, you know v4 works for you. I really haven't noticed any speed difference in v5, either faster or slower so far.
Bit of the log for ya Jim.
If it wasn't for the fact we take them as they come i would have deleted them and started afresh, lol.
Once this is posted i'm going to bed. Too tired again. It is 03:00 am in Perth at the moment. Need sleep, hehe.
--- Opening Log file [August 25 00:36:46]
# Windows Graphical Edition ###################################################
###############################################################################
Folding@home Client Version 4.00
http://folding.stanford.edu
###############################################################################
###############################################################################
[00:36:46] - Ask before connecting: No
[00:36:46] - Use IE connection settings: Yes
[00:36:46] - User name: Jonshandbrake (Team 93)
[00:36:46] - User ID = 2303B12F6CBCDD32
[00:36:46] - Machine ID: 1
[00:36:46]
[00:36:46] Loaded queue successfully.
[00:36:46] Initialization complete
[00:36:46] + Benchmarking ...
[00:36:50]
[00:36:50] + Processing work unit
[00:36:50] Core required: FahCore_78.exe
[00:36:50] Core found.
[00:36:50] Working on Unit 02 [August 25 00:36:50]
[00:36:50] + Working ...
[00:36:50] + Working...
[00:36:50]
[00:36:50] *
*
[00:36:50] Folding@home Gromacs Core
[00:36:50] Version 1.65 (May 6, 2004)
[00:36:50]
[00:36:50] Preparing to commence simulation
[00:36:50] - Ensuring status. Please wait.
[00:37:08] - Looking at optimizations...
[00:37:08] - Working with standard loops on this execution.
[00:37:08] - Previous termination of core was improper.
[00:37:08] - Files status OK
[00:36:56] - Expanded 387235 -> 2652745 (decompressed 685.0 percent)
[00:36:57]
[00:36:57] Project: 263 (Run 3, Clone 108, Gen 8)
[00:36:57]
[00:36:57] Entering M.D.
[00:37:18] (Starting from checkpoint)
[00:37:18] Protein: p263_chcl3
[00:37:18]
[00:37:18] Writing local files
[00:37:20] Completed 210000 out of 1000000 steps (21)
--- Opening Log file [August 25 01:50:27]
# Windows Graphical Edition ###################################################
###############################################################################
Folding@home Client Version 4.00
http://folding.stanford.edu
###############################################################################
###############################################################################
[01:50:27] - Ask before connecting: No
[01:50:27] - Use IE connection settings: Yes
[01:50:27] - User name: Jonshandbrake (Team 93)
[01:50:27] - User ID = 2303B12F6CBCDD32
[01:50:27] - Machine ID: 1
[01:50:27]
[01:50:28] Loaded queue successfully.
[01:50:28] Initialization complete
[01:50:28] + Benchmarking ...
[01:50:30]
[01:50:30] + Processing work unit
[01:50:30] Core required: FahCore_78.exe
[01:50:30] Core found.
[01:50:30] Working on Unit 02 [August 25 01:50:30]
[01:50:30] + Working ...
[01:50:34]
[01:50:34] *
*
[01:50:34] Folding@home Gromacs Core
[01:50:34] Version 1.65 (May 6, 2004)
[01:50:34]
[01:50:34] Preparing to commence simulation
[01:50:34] - Ensuring status. Please wait.
[01:50:51] - Looking at optimizations...
[01:50:51] - Working with standard loops on this execution.
[01:50:51] - Previous termination of core was improper.
[01:50:51] - Going to use standard loops.
[01:50:51] - Files status OK
[01:50:52] - Expanded 387235 -> 2652745 (decompressed 685.0 percent)
[01:50:52]
[01:50:52] Project: 263 (Run 3, Clone 108, Gen 8)
[01:50:52]
[01:50:52] Entering M.D.
[01:51:13] (Starting from checkpoint)
[01:51:13] Protein: p263_chcl3
[01:51:13]
[01:51:13] Writing local files
[01:51:15] Completed 214467 out of 1000000 steps (21)
[02:30:36] Writing local files
[02:30:36] Completed 220000 out of 1000000 steps (22)
[03:41:25] Writing local files
[03:41:25] Completed 230000 out of 1000000 steps (23)
[04:52:17] Writing local files
[04:52:17] Completed 240000 out of 1000000 steps (24)
[06:03:12] Writing local files
[06:03:12] Completed 250000 out of 1000000 steps (25)
[07:14:04] Writing local files
[07:14:04] Completed 260000 out of 1000000 steps (26)
[07:50:30] + Working...
[08:25:28] Writing local files
[08:25:28] Completed 270000 out of 1000000 steps (27)
[09:36:31] Writing local files
[09:36:31] Completed 280000 out of 1000000 steps (28)
[10:47:32] Writing local files
[10:47:32] Completed 290000 out of 1000000 steps (29)
[11:58:27] Writing local files
[11:58:27] Completed 300000 out of 1000000 steps (30)
[13:09:25] Writing local files
[13:09:26] Completed 310000 out of 1000000 steps (31)
[13:50:30] + Working...
[14:20:27] Writing local files
[14:20:27] Completed 320000 out of 1000000 steps (32)
[15:31:26] Writing local files
[15:31:26] Completed 330000 out of 1000000 steps (33)
[16:42:21] Writing local files
[16:42:21] Completed 340000 out of 1000000 steps (34)
[17:53:49] Writing local files
[17:53:49] Completed 350000 out of 1000000 steps (35)
[19:05:21] Writing local files
[19:05:21] Completed 360000 out of 1000000 steps (36)
[19:50:30] + Working...
[20:16:56] Writing local files
[20:16:56] Completed 370000 out of 1000000 steps (37)
[21:28:30] Writing local files
[21:28:30] Completed 380000 out of 1000000 steps (38)
[22:40:11] Writing local files
[22:40:11] Completed 390000 out of 1000000 steps (39)
[23:51:41] Writing local files
[23:51:42] Completed 400000 out of 1000000 steps (40)
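If you want a frame time out of a log like the one above without doing the subtraction by hand, something along these lines should work. It is only a sketch: it assumes the "Completed N out of M steps" lines look exactly as they do in this paste, that one frame is 1% of the total steps, and that the log timestamps wrap at midnight with no date attached. The function names are invented for this example.

```python
import re

# Matches e.g. "[02:30:36] Completed 220000 out of 1000000 steps (22)"
LINE_RE = re.compile(r"\[(\d\d):(\d\d):(\d\d)\] Completed (\d+) out of (\d+) steps")


def progress_samples(log_text):
    """Return a list of (seconds, steps_done, total_steps) from the log text."""
    samples = []
    day_offset = 0
    prev = None
    for match in LINE_RE.finditer(log_text):
        h, m, s, done, total = map(int, match.groups())
        t = h * 3600 + m * 60 + s + day_offset
        # Timestamps carry no date, so a backwards jump is taken as midnight
        if prev is not None and t < prev:
            day_offset += 86400
            t += 86400
        prev = t
        samples.append((t, done, total))
    return samples


def seconds_per_frame(samples):
    """Average seconds per 1% frame between the first and last sample."""
    (t0, d0, total), (t1, d1, _) = samples[0], samples[-1]
    if d1 == d0:
        raise ValueError("need at least two samples with different step counts")
    return (t1 - t0) * (total / 100) / (d1 - d0)


def fmt(seconds):
    """Format seconds as H:MM:SS."""
    seconds = int(round(seconds))
    return f"{seconds // 3600}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


if __name__ == "__main__":
    excerpt = (
        "[02:30:36] Completed 220000 out of 1000000 steps (22)\n"
        "[03:41:25] Completed 230000 out of 1000000 steps (23)\n"
        "[04:52:17] Completed 240000 out of 1000000 steps (24)\n"
    )
    per_frame = seconds_per_frame(progress_samples(excerpt))
    print("per frame:", fmt(per_frame))
    print("projected for 100 frames:", fmt(per_frame * 100))
```

Feed it only the lines after a restart, since a sample that starts from a checkpoint mid-frame (like the 214467 line) will skew the average a little.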
*UPDATE
Quick one for you Mudd. Sally has not been well and i just mentioned in passing what has been happening. She dragged herself to the puters and found NO flags set up for the 2 slow WU's. They are still using V4.
Now, the one that was doing a step in 1hr 18min is now doing a step in a lot less time. Still trying to adjust it self. The other one is still a bit slow, but that may change further once it has done a step. This was after the flags were added and the puter restarted.
Jon
Neither of us can work out how they got missed or what happened for them to disappear. Sally spent a lot of hrs on the puter which had a new mobo fitted to suit the RedHott. This one caused us severe headaches setting up folding. (as per other thread).
At least for now, all appears well, fingers crossed, lol.
Jon
*Edit
Just noticed your post Jim. Sally set up -verbosity 9 , -advmethods , -forceSSE , -forceASM , -local .......on this particular computer, which is still running FAHV4.0.
As i type Sally is going through them all checking flags.
Jon
I'll do the same change on the remaining two problem ridden rigs and I'll post back.
Thanks for everyone's help.
Spinner
It's just a case of change the settings, watch it fold for a day or so, then try something else.
Edit
Nevermind. You've got a fah sig. I just didn't see it. :banghead: