Conflict of some sort?
DogSoldier
The heart of radical Amish country..
Hi, you seem like nice people...
I might be having a problem. I have 3 machines folding.. This one (Home comp) is fine.. it's the 2 at work that are doing strange things. I've been folding with my office comp for a week, no problems. But yesterday I installed the GUI on the company server and now both machines seem unable to finish a wu. Take a look at http://folding.extremeoverclocking.com/member_overview.php?UserID=60927 and note the "Last 24hr Production" table.
You'll notice some small point gains at 10/27/03 09:00 PM and 10/28/03 12:00 AM. Those are the office machines. Now look at:
http://folding.stanford.edu/cgi-bin/userpage?name=DogSoldier
It lists 2 active processors when it should show 3.
Now, I'm not a network guy, but since they are behind a firewall, could the Folding server be getting confused and sending what it thinks is one machine bad instructions? My first impulse is to disable -forceasm on the server a la mmonnin's post, but I'm certain there's something deeper to this.
edit// Oh yeah, running all GUI
a2jfreak, great idea.. actually, I'll run all machines using a2jfreak!
A difference in Machine ID results in a different unique machine hash for each box, as well as a UID hash that is keyed to you as the F@H UID;
The WUs are keyed to the machine ID hash for tracking purposes, though the UID hash is normally sent so that credit goes to that UID;
So, ideally, you want only one machine per UID on the GUI. The installer always sets Machine ID 1, and if a second machine uses the GUI with the same Machine ID number and UID (your user ID), the server setting up the account thinks a machine is being REINSTALLED.
Note that the Machine ID hash might also be used to tune what kinds of WU are sent, so WUs that the first machine is good at might go to another box that can't grok or handle them very well. If that box is under load (like a server typically IS), the client and core might get swapped out and pended, and abend at random as load peaks.
The result is this-- the second box with the same Machine ID would be likely to abend WUs more often than the first box with that ID. The server box, with F@H at low priority, might be pending things in a way the client or core is not set up to handle. The F@H servers per se do not fold themselves; they are dedicated to handling results, giving out new WUs, and passing data to the stats servers, and the web server gets its data from them. Each single box might talk to about 8-9 servers, yes, but there are more servers doing just admin work for data, there are work servers to process and collate folding returns, AND each box is unique in what WUs it handles.
So, I would do this-- look in the work folder\directory of the Folding@Home directory on your server and see if multiple work units are present in work. If so, one of three things is true-- either the server is so busy that it is pending the transmit of WUs to the web and they are stacked up waiting to be sent, or the client and core are abending and then starting on a new WU when they are unpended (told to come back to life), or the wrong WUs for your box are being sent. Note also that with F@H partial credit is possible: if your box interrupts F@H too much, the WU will be abandoned by the client, and when it is tendered you might get partial credit for a partially worked WU.
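To make the work-folder check above a bit less manual, here is a minimal Python sketch that lists which work-unit slots are present. The `wudata_NN.*` file naming and the `queued_work_units` helper are assumptions for illustration, not something confirmed in this thread, so adjust the pattern to whatever your work folder actually contains.

```python
import re
from pathlib import Path

# ASSUMPTION: each queued work unit leaves at least one "wudata_NN.*" file
# in the work folder. Adjust the pattern if your client names files differently.
WU_FILE = re.compile(r"^wudata_(\d+)\.", re.IGNORECASE)

def queued_work_units(work_dir):
    """Return the sorted slot numbers of work units found in work_dir."""
    slots = set()
    for entry in Path(work_dir).iterdir():
        match = WU_FILE.match(entry.name)
        if entry.is_file() and match:
            # Several files can belong to one slot; the set keeps each slot once.
            slots.add(int(match.group(1)))
    return sorted(slots)
```

Calling `queued_work_units("work")` from inside the Folding@Home directory returns a list of slot numbers; more than one entry means several work units are sitting in the folder at once.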
IF this is so, changing the priority in the GUI on the server might be good, if your server has excess capacity to let F@H run as a higher priority process. There IS, in the GUI config, an optional priority setting meant for when other distributed computing processes are interfering with F@H, but this will take away from the server's main work to a degree. What it does is make the F@H process more important to the O\S, so if a resource crunch occurs, other things get pended more than they did with F@H at the lowest (default) priority.
One of my boxes here is a Windows workstation that does heavy floating-point work, and its GUI runs at the higher priority simply because it limped at the lower priority. The other box does mostly surfing-workstation things and virus protection for email, and it is a Linux box. The Linux box is also a P4 box. It runs at default priority.
In this case, not only could the dist center for F@H have confused servers due to machines with the same Machine ID that hash differently, but the second box on GUI might also be getting WUs that take too many resources for the server to cope with given its other work. If you HAVE to run many GUI'd boxes and have problems with one of them, try setting each problem box to its own UID and let it use Machine ID 1, or manually edit its config files to a Machine ID that no other GUI'd box uses under the UID it is hooked to. Machine ID does matter a lot.
So does randomly accessible comm that does not get pended a lot or preempted by other higher priority processes. Also, the non-GUI client lets you choose a Machine ID (1-4) at install time and makes a custom per-machine install easier when you have additional machines. If all boxes HAVE to have the GUI, the easiest thing is to hook them to multiple UIDs so they all get completely unique hash sets. It is also easiest to have one download for each box, done from the box you want it on: some of the machine ID data in the hash set is precalced based on communications rate and effectiveness and sent with the download, and a base key for the downloading machine is calced just before download time and stuck into the package sent, AND that key is supposed to be machine-unique to begin with. When I downloaded a Windows GUI set from the Linux box and stuck it onto the Windows box, I got some strange things happening. I finally reloaded the GUI from a dialup-connection download done by the client target box itself, and things smoothed out a lot-- I can SEE the difference in effectiveness, in minutes\frame on most of the WUs sent, between downloading directly from the box that will be doing the work and a TRANSFERRED BY FEET (burn to CD, walk over there, install) client download.
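As a rough illustration of the Machine ID advice above, this Python sketch flags boxes that share both a UID and a Machine ID. The box names and the `find_machine_id_conflicts` helper are hypothetical, and the inventory is typed in by hand; nothing here reads the client's config or talks to the F@H servers.

```python
from collections import defaultdict

def find_machine_id_conflicts(boxes):
    """Given (box_name, uid, machine_id) tuples, return the clashing groups.

    The result maps each (uid, machine_id) pair used by more than one box
    to the list of box names sharing it.
    """
    seen = defaultdict(list)
    for name, uid, machine_id in boxes:
        seen[(uid, machine_id)].append(name)
    return {key: names for key, names in seen.items() if len(names) > 1}

# Hypothetical inventory: three GUI installs, all left at Machine ID 1.
boxes = [
    ("home", "DogSoldier", 1),
    ("office", "DogSoldier", 1),
    ("server", "DogSoldier", 1),
]
conflicts = find_machine_id_conflicts(boxes)
```

With every box on the same UID and Machine ID 1, all three land in one conflict group. Giving a box its own UID or a different Machine ID takes it out of the group.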
If there is a proxy server inline for both the work boxes, that might also account for the CPU count being off, but this looks like it might be something more than just that.
John.
And what type of machines are these? Some specs please.
In other news:
These logs are interesting. If I'm reading them right.. http://www.planetfortress.com/tfa/stuff/Workstation_FAHlog.txt
My workstation stopped work on the unit at 8:53 pm (Is that EST or CST? And does this mean the machine rebooted or just the GUI?) then resumed the same Unit 8 minutes later. It then finished the unit at 7:00am but this doesn't jibe with my stats:
http://folding.extremeoverclocking.com/member_overview.php?UserID=60927
The server meanwhile started work on its first unit minus a User ID. At 1:39am it stopped and restarted over 40 minutes later on the same unit. It also gained a new User ID.
http://www.planetfortress.com/tfa/stuff/Server_FAHlog.txt
I don't know where the wu points came from at 9pm and 12am last night, but the points that the workstation submitted at 7:00 am are nowhere to be found. (Excuse me, I seem to have lost my WUs!?!) And the server, my new addition, is still working on its first unit.
edit// It was 9pm and 12am, sorry.. The 3am points came from my home machine.
Yeah I've had 5 CPUs at one point, and only have 4. It's highly unlikely someone's going to jump on your name with your team number Dog, and fold for your name, so yeah, it's probably wrong
Either the computer rebooted for some reason or the client was stopped. Either way FAH stopped for a brief period on both machines.
Other than the clients being stopped it all looks ok to me.
Mine's been folding 400's the last few times and they take over 24hrs to fold, and I have my priority turned up and my load at 100%.
The first ones I'd folded were really easy and quickly folded, and they were 500's, but the last 3 have all been real number crunchers in terms of difficulty.
winFAH.exe" -advmethods -forceasm
More info on these flags here:
http://forum.folding-community.org/viewtopic.php?t=6057
edit// Actually, the 2 office machines are running tinkers. A 2500 and a 400... so.. the plot thickens!
It's taken all day to get 80 out of 400 done.....