Problem getting new WU.
Esso
Stockholm, Sweden
Hi,
Yesterday my folding stopped due to a problem with one of the servers.
It didn't respond to the request for a new WU.
Initially I had trouble just reporting the completed WU, but after a while it went through.
Nevertheless, I couldn't get a new WU, and that halted my folding.
I solved the problem by copying FAH504-Console.exe to a new directory
and reconfiguring the folding parameters.
After that it requested a new WU from a different folding server (171.64.122.120).
So please make sure that your current folding servers are up and running.
Here is the log from the failing server, 171.64.122.112 (just 15 minutes ago):
[15:55:23] Loaded queue successfully.
[15:55:23] + Benchmarking ...
[15:55:25] The benchmark result is 6024
[15:55:25] - Preparing to get new work unit...
[15:55:25] - Autosending finished units...
[15:55:25] + Attempting to get work packet
[15:55:25] Trying to send all finished work units
[15:55:25] - Will indicate memory of 1023 MB
[15:55:25] + No unsent completed units remaining.
[15:55:25] - Connecting to assignment server
[15:55:25] - Autosend completed
[15:55:25] Connecting to http://assign.stanford.edu:8080/
[15:55:27] Posted data.
[15:55:27] Initial: 40AB; - Successful: assigned to (171.64.122.112).
[15:55:27] + News From Folding@Home: Welcome to Folding@Home
[15:55:27] Loaded queue successfully.
[15:55:27] Connecting to http://171.64.122.112:8080/
[15:55:31] - Couldn't send HTTP request to server
[15:55:31] (Got status 503)
[15:55:31] + Could not connect to Work Server
[15:55:31] - Error: Attempt #1 to get work failed, and no other work to do.
Waiting before retry.
[15:55:39] ***** Got a SIGTERM signal (2)
[15:55:39] Killing all core threads
Comments
~FA
At least mine was only down for a couple of hours, until I had enough and created a new folding directory.
I'm using an Opty-165 (dual-core processor) with two folding directories, configured as units 1 and 2 respectively.
When unit 2 couldn't proceed, I halted it and created a new folding directory, configured as unit 3,
so that it would not get the same WU that couldn't be reported.
In this way my computer keeps on folding.
Later, people who had problems reporting their finished WUs can do so once Stanford fixes the server.
Please add -oneunit when doing this; then you will not be assigned a new WU again for this folding directory ...
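The workaround described here can be sketched as a short script. This is only an illustration: the directory names are examples, not from the post, and while -oneunit is quoted above, -configonly (the classic console's "configure and exit" flag) is an assumption about how the reconfiguration step was done.

```shell
# Sketch of the "fresh directory" workaround for the classic 5.04 console
# client. OLD_DIR is the directory stuck waiting on the failing server;
# NEW_DIR is the fresh one that will fetch from a different server.
OLD_DIR="$HOME/fah/unit2"
NEW_DIR="$HOME/fah/unit3"

mkdir -p "$NEW_DIR"
if [ -f "$OLD_DIR/FAH504-Console.exe" ]; then
    cp "$OLD_DIR/FAH504-Console.exe" "$NEW_DIR/"
fi
echo "prepared $NEW_DIR"

# In the new directory: reconfigure (new machine ID, e.g. 3), then fold:
#   ./FAH504-Console.exe -configonly
#   ./FAH504-Console.exe
# Keep the old directory around and run it with -oneunit, so it only
# reports its stuck WU once the server returns, without fetching another:
#   ./FAH504-Console.exe -oneunit
```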
It has been down for ~36 hours now.
I received this timeless WU (249 points) from (171.64.122.120).
So I'm currently running two of those ...
[12:21:36]
[12:21:36] - Couldn't get size info for dyn file: work/wudata_01.dyn
[12:21:36] Starting from initial work packet
[12:21:36]
[12:21:36] Protein: p1112_L939_K12M_nat_min1_355K
[12:21:36] - Run: 45 (Clone 54, Gen 20)
Edit,
Stanford should improve the way WUs are assigned: if a client can't report to, or get new WUs from, a failing server, it should switch to another server after three failed attempts.
I mean, if 10% of all folding machines are stopped because of this, they will lose a lot of computing power.
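The failover policy suggested here could look something like the sketch below. fetch_wu is a stub that always fails, standing in for the client's real work-request logic (which is not public in the post); the two addresses are the servers named above.

```shell
# Sketch of "switch servers after 3 failed attempts".
fetch_wu() {            # placeholder for "request a WU from server $1";
    false               # stubbed to always fail for this illustration
}

assign_wu() {
    for server in 171.64.122.112 171.64.122.120; do
        for attempt in 1 2 3; do
            if fetch_wu "$server"; then
                echo "got WU from $server"
                return 0
            fi
        done
        echo "3 attempts to $server failed; trying next server"
    done
    echo "all servers failed"
    return 1
}

assign_wu || true
```

With the stub, the loop walks through both servers and reports that all failed; with a real request in fetch_wu, a single dead work server would cost at most three attempts instead of halting the client.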
Also some folders might be upset, but I will not be frustrated because ...
I'm now up to EIGHT machines just twiddling their digital thumbs...
Esso is right, though: just one server being down like this (a timeless server) costs Stanford money, because it takes hundreds of GHz of computing power and leaves it idle.
~FA
Cribbed from the Folding Community Forums:
For those of you using the console version, there is a quick guide to making changes on our main FAH page. Look for the item named Reconfiguring The FAH Console.
LINKY
bikerboy
Though I have yet to thank Pette Broad for the advice, we changed about 4 comps using all our usual flags. At least we don't have 10 procs sitting idle now; it was becoming extremely frustrating.
I'm sure it is not just an overload of WUs affecting server 112. It had been running quite well for a long time, and the problem started about three or so days ago. As Mudd said, it's FUBAR.
Jon
EDIT: nope, it's 200 PPD now that no one is using the computer. Not too shabby for a GbGromacs WU on an old Athlon, is it!
I guess that the folding was talking to me. You need to get up and fix this ... :tongue2:
I changed to the standard client, as Leonardo did, and it's now folding p2505, worth 200 points.
Will report the ppd in this post later.
Now it's time for coffee, make that black, very black.
Server 171.64.122.112 has now been down for ~54 hours.
Edit,
p2505 is doing 15m47s to 15m57s per frame depending on load: ~183 ppd (~366 ppd using two cores).
Timeless projects execute at ~153 ppd (~306 ppd using two cores), so this is even better ...
The P3-500 MHz needs 97m per frame running p2502, so one Opty-165 core is ~6 times faster.
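As a quick sanity check of these figures, assuming the usual 100 frames per work unit (an assumption; the post doesn't state the frame count):

```shell
# Verify the ppd and speedup figures from the post.
awk 'BEGIN {
    points  = 200                  # p2505 credit
    frame_s = 15*60 + 47           # 15m47s per frame, best case
    wu_h    = 100 * frame_s / 3600 # hours per work unit at 100 frames
    printf "%.0f ppd per core\n", points * 24 / wu_h
    printf "%.1fx vs P3-500\n", (97*60) / frame_s  # 97m/frame on the P3
}'
```

That comes out to 182 ppd per core at the best-case frame time, consistent with the ~183 ppd quoted, and a ~6.1x speedup over the P3-500, matching the "~6 times faster" estimate.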