Good NEWS for Team Short-Media!!!

Straight_Man · January 2004

t1rhino wrote:

Your Points\Day should read Total Points and WU\Day should read Points\Day

Um, total points for team is in the millions of points for all these teams. They use total points to mean points since day one. Hang on, will show you....
<table border="1" bgcolor="CCFCFC"><tr><th colspan="4">Total accumulated Points and WUs, as of 5:00 PM, Jan 14, 2004</th></tr><tr><td>Team Rank</td><td>Team Name</td><td>Total WU</td><td>Total Points</td></tr>
<tr><td>8</td><td>Ars Technica Team Egg Roll</td><td>395,894</td><td>7,995,677.21</td></tr><tr><td>9</td><td>Team Short-Media</td><td>411,314</td><td>7,265,200.92</td></tr><tr><td>10</td><td>Amdmb.com Folding Team</td><td>370,038</td><td>6,910,915.69</td></tr></table>

I skipped absolute total figures totally in grabbing stats, made it simpler. I will check, but, the totals are climbing so slowly one against the other in re team competitive stats, that I used just the per week and per day stats for now. As things get real heavy, will pull in cumulative overall totals.

The figures SHOULD, from this point on, be "Total Points\Week" and "Total Points\DAY" in my normal daily table, unless you want accumulated totals also..... Can do this....

Statsman, the first day I posted, did NOT have the same team stats table up, he chose today to show a beta team stats table for the first time I saw it.... My bad, with second day's column labels.... FIXED.

John.

Straight_Man · January 2004

Here is a hyperlink to an Arachnid custom graph that might explain why I did this thread in the first place:

http://stats.zerothelement.com/cgi-bin/folding3/cd-graph-vs-team-custom.pl?teams=14&teams=93&teams=734&data=LastWeekData1&time=3&points=16&title=Arachnid+Stats+-+Custom+Graph&xsize=600&ysize=400&scale=AUTO&maxScale=.1&maxAbs=&minScale=.1&minAbs=&type=LINE&transparency=0&dimension=1&lineWidth=2&submit=Create+My+Graph

Straight_Man · January 2004

<table bgcolor="CCFCFC" border="1">
<tr><th colspan="9">Stats for January 15, 2004</th></tr>
<tr><td>Team Rank</td><td>Total Members</td><td>Active Members</td><td>%Active</td><td>Team Name</td><td>Total WU</td><td>Total Score</td><td>Total Points\Week</td><td>Total Points\Day</td></tr>
<tr><td>8</td><td>1,809</td><td>372</td><td>20.6</td><td>Ars Technica Team Egg Roll</td><td>396,482</td><td>8,018,352.03</td><td>215,005.73</td><td>30,750.83</td></tr>
<tr><td>9</td><td>977</td><td>223</td><td>22.8</td><td>Team Short-Media</td><td>411,891</td><td>7,286,396.49</td><td>184,958.07</td><td>28,008.15</td></tr>
<tr><td>10</td><td>928</td><td>258</td><td>27.8</td><td>Amdmb.com Folding Team</td><td>370,567</td><td>6,930,838.46</td><td>183,408.74</td><td>26,493.51</td></tr>
</table>

ON!!!

John.
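
A quick way to sanity-check the %Active column in the table above is to recompute it from the member counts. This is only a sketch: the figures are copied from the January 15 table, and the function name `percent_active` is made up for illustration.

```python
# Member counts copied from the January 15, 2004 table above:
# team name -> (total members, active members)
teams = {
    "Ars Technica Team Egg Roll": (1809, 372),
    "Team Short-Media": (977, 223),
    "Amdmb.com Folding Team": (928, 258),
}


def percent_active(total_members, active_members):
    """Share of members who are active, as a percentage rounded to one decimal."""
    return round(100 * active_members / total_members, 1)


for name, (total, active) in teams.items():
    print(f"{name}: {percent_active(total, active)}% active")
```

The recomputed values agree with the 20.6, 22.8 and 27.8 shown in the table.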

Al_Capown · January 2004

Woot woot! GJ GUYS!

pseudonym · January 2004

Must get other computer up and running......

greendragon · January 2004

Hey everyone

Just stopped over from amdmb to see what you guys were up to. I was checking team stats earlier, looks like you guys are gonna make it tough on us. Keep up the great work!

Fold On!!!

GD

mmonnin · January 2004

Thanks for visiting, greendragon. It's always a battle between you guys and team 93.

Straight_Man · January 2004

<table bgcolor="CCFCFC" border="1">
<tr><th colspan="9">Stats for January , 2004</th></tr>
<tr><td>Team Rank</td><td>Total Members</td><td>Active Members</td><td>%Active</td><td>Team Name</td><td>Total WU</td><td>Total Score</td><td>Total Points\Week</td><td>Total Points\Day</td></tr>
<tr><td>8</td><td>1,811</td><td>375</td><td>20.7</td><td>Ars Technica Team Egg Roll</td><td>397,312</td><td>8,051,026.23</td><td>219,521.51</td><td>32,674.20</td></tr>
<tr><td>9</td><td>977</td><td>224</td><td>22.9</td><td>Team Short-Media</td><td>412,685</td><td>7,315,984.45</td><td>190,260.88</td><td>29,587.96</td></tr>
<tr><td>10</td><td>928</td><td>254</td><td>27.4</td><td>Amdmb.com Folding Team</td><td>371,302</td><td>6,958,106.70</td><td>184,332.62</td><td>27,268.24</td></tr>
</table>

Indeed we have good competition going, greendragon!!!

ON, GANG!!

John.

Leonardo · January 2004

Good work! I'm trying to get my production higher as well. I've added my business/travel laptop to the fray, but unfortunately, it's a 1300MHz Intel Centrino. Overall, it's a really nice laptop, just a real loser for Folding. I started the first work unit, a p638 Tinker, on Monday. Today is Friday, and after approx. 14 hours/day folding, this anemic wonder still hasn't completed the 400 step work unit! I think I'll have to cut my losses on this laptop for Folding. Every 20 seconds the heatsink fan goes into overdrive - nerve-wracking!

GHoosdum · January 2004

I had to take my laptop off of folding for similar reasons. But I'm trying to up my desktop PC production to make up for it.

primesuspect · January 2004

Weird - my Centrino laptop, at 1300MHz, is a real winner. Of course on battery it sucks because it drops to 600MHz, but when plugged in, this thing chews 'em up.

csimon · January 2004

Leonardo wrote:

Good work! I'm trying to get my production higher as well. I've added my business/travel laptop to the fray, but unfortunately, it's a 1300MHz Intel Centrino. Overall, it's a really nice laptop, just a real loser for Folding. I started the first work unit, a p638 Tinker, on Monday. Today is Friday, and after approx. 14 hours/day folding, this anemic wonder still hasn't completed the 400 step work unit! I think I'll have to cut my losses on this laptop for Folding. Every 20 seconds the heatsink fan goes into overdrive - nerve-wracking!

Are you using the -advmethods flag and getting stinkers anyway?
I bet those Gromacs would go just fine on that Centrino.

Leonardo · January 2004

Go figure! I've no explanation. Perhaps yours is better cooled. This is a Gateway something-er-another -- 20 to 24 minutes per frame on the P638 Tinker.

I just checked the FAHLog. Is SSE supposed to be engaged for Tinkers? I don't remember.

Hmm, maybe this is a clock throttling issue.

Leonardo · January 2004

No, I've added no flags.

Straight_Man · January 2004

Gromacs do work fine; Tinkers typically run 16 hours on my P4 before completion. P4s hate Tinkers, with or without -advmethods, and Core_65 does not take advantage of SSE2, much less P4 SSE emulation. Part is core, part is that P4 boxes should not be getting Tinkers. I killed a Tinker that had been running 8 hours and was only 43% of the way through, and discovered BARTONS with the new client do not like Tinkers either. Box got another Tinker. At that point I deleted the Core_65, the queue.dat, the whole CONTENTS of the work folder, and have been getting Gromacs since then.

Leonardo, Folding runs CPU at 96-100% capacity, at full speed. You need better cooling or a basement that is cold for the laptop to live in while folding, sorry.... OR a riser shelf so it stays cooler if the laptop has bottom vents.... One of the wire mesh shelves that are used for big plates in a cupboard would do for home, no idea about work....

John.

csimon · January 2004

Leonardo wrote:

No, I've added no flags.

Which client are you using: 4.0, 3.25, 3.24 or other?
Set the -advmethods flag to request Gromacs. Honestly I think that Tinkers are optimized for AMDs and Gromacs for Intels ...you'll notice a world of difference.
If using the FAH4Console or GUI version of FAH4, that should be the only flag you need.

Straight_Man · January 2004

csimon wrote:

Which client are you using: 4.0, 3.25, 3.24 or other?
Set the -advmethods flag to request Gromacs. Honestly I think that Tinkers are optimized for AMDs and Gromacs for Intels ...you'll notice a world of difference.
If using the FAH4Console or GUI, that should be the only flag you need.

I also use -forcesse for P4, get about 1.2 TIMES the work effectiveness.... There was a HINT of this in threads on folding-Community, someone said that -forcesse actually triggered SSE2 on P4s and 3DNow on older AMD CPUs. Both my Barton and my Northwood like -forcesse. VERY well. Problem with Tinkers is core, in part, and the fact that Tinkers were not tuned for real modern gen processors. OLDER client did Tinkers faster than new client does, same CPUs. Tuning strategy tradeoff.

Oh, one more interesting thing, -forceasm only appears to WORK on AMD boxes. When -advmethods is triggered on my P4 box F@H client run, client tells me assembly opts have been forced, and nothing happens when I also use -forceasm. BUT, on the Barton box, -forceasm improves output and -advmethods gets Gromacs.... Interesting and passingly strange given names....

John.

csimon · January 2004

idunno ...I could have sworn that Vijay said that sse2 is not used at all but I could be mistaken ...it would sure be nice though.

Leonardo · January 2004

This Gateway is running version 4.0. I've added -forcesse and -advmethods. We'll see what cooks. Ageek, your point about cooling is appropriate. I do have the laptop propped up a little from the work surface. I'll lift it up a bit more and see if that changes anything.

Straight_Man · January 2004

The P4s emulate SSE like ****. They actually do about 2\3 rate with pure old SSE (which is what the core_65 forces them to do, it uses 3DNow and P4s have to drop to SSE to emulate) versus SSE2. I do not think the Client 4.01.00 or new release core_78 purely is calling for SSE2, but that is what is happening-- the P4s are handling processing mostly using code in SSE2. P4s can do that with work that does not use specific and direct SSE or 3Dnow (which gets handled as SSE) throughout, and can reroute internally to SSE2 functions where more optimum.

From looking at logs and intelligently looking at stats and comments, what happens is this:

-forceasm turns AMD (Barton and up I know, suspect possibly also some of the newer Athlon pre-Bartons but probably not the Durons except for the VERY latest ones) style assembly optimizations on, and -advmethods also does something-- but on a Barton all it does is cause a fetch predisposition to get newer Beta Core_78 and newer beta WUs. Bartons, would say to use all three switches.

On the P4s, the -forceasm does nothing at all and the -advmethods gets faster WUs and tells the P4s (not Willamettes, but Northwoods and up) to tune the SSE internally. P4s later than Willamettes, use the other two (-forcesse and -advmethods), as the -forceasm does nothing and assembly optimization turns on anyway without the -forceasm on these advanced P4s. P4 is weird beast, if it runs old SSE code, it runs slower code through its deeper pipes and changes cache strategy. If it gets 32 bit compiled code, it tunes it on the fly. That is core of assembly optimization in a Northwood and up. Prescott will want SSE2 to run right, feed it SSE direct call vector code, it will be real slow. Old games will stink on a P4 of Northwood and up, Barton will emulate better.

Now, the folding assignment servers know what to serve based in part on machine specific data tied to the machine specific UID hash, and it would appear that the machine type is in this hash now with new client onboard. I did this-- ReConfigured the Barton to the machine ID the P4 had, by machine number. Then, reconfigured the P4 to the Barton's ID. And hacked the machine type and switched those. NEXT WU after that, which was assigned, the Barton got the same WU that the P4 had last, AND vice versa. 48 hours later, the Barton and the P4 were getting WUs back to the older pattern I could see from mining logs for WU specific project info. AND the machine types were changed back. Replicated 3X. Lesson, if you switch around machine IDs, you may get WUs not subtuned for your box, and changing machine type is a definite no-no.

FROM what is happening, the bench timing is used for a rough guess as to how much vector in WUs to push, by project ID and clone. I THINK, in the betas, folding is implementing a 32 bit assembler\compiler. P4s will optimize this natively if given the chance-- with an on-the-fly optimizing based on cache contents. Bartons will if told to. OTOH, with older code, Bartons will try to optimize, and P4s will emulate. I KNOW pure 16 bit code gets emulated in P4s. If you like old classic games, get an AMD. If you like things written for SSE2 or full-fledged SSE, get a P4. I do not think that the Willamettes do as well with SSE-->SSE2 optimizing as Northwoods and up, partly because their anticipation guesses are one heck of a lot worse, and that in part is because they are using smaller chunks of code to use as an anticipation guess base. IF folding were to have implemented full SSE2, which they have NOT yet done, Northwoods and up would massively rock but many boxes that fold now would BOG, especially those who have older boxes and CPUs, partly because this is best done with 64 bit compilers which are not fully mature yet. The short machine bench now used does not pick out the full difference between the P4 and Barton and up tuning strategies, but the assignment servers appear to be getting feedback on turnin that lets them tune assignments unless all the "best" servers for a box are busy. The P4 loves project 1000 series, and hates project 900 series. The Barton likes the opposite-- and the -forceasm switch.

John.

Leonardo · January 2004

WOW! Added -forcesse and -advmethods; elevated laptop just a little more. No kidding - frame time dropped from 20 to 24 minutes to 11 minutes! Holy cow! Don't know which change to attribute for the performance increase. Maybe it's a combination of the forced flags and better cooling. The bottom of the computer is noticeably cooler to the touch now.
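
For a rough sense of what that change is worth, the speedup is just the old frame time divided by the new one. A minimal sketch, using only the frame times quoted in this thread (the helper name `speedup` is hypothetical):

```python
def speedup(old_minutes_per_frame, new_minutes_per_frame):
    """How many times faster a frame completes after the change."""
    return old_minutes_per_frame / new_minutes_per_frame


# Frame times reported above: 20 to 24 minutes before, 11 minutes after.
old_range = (20, 24)
new_time = 11

low, high = (speedup(old, new_time) for old in old_range)
print(f"Speedup: {low:.2f}x to {high:.2f}x")
```

That works out to roughly 1.8x to 2.2x, though the numbers alone cannot say how much came from the flags and how much from the cooling.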

Leonardo · January 2004

OK, let me apologize for hijacking this thread. Its purpose was to show that our momentum is rising, and that we definitely are not losing ground.

Yippee!

Leonardo · January 2004

Yessss! As I write this, my laptop is sending its first completed work unit to the Stanford server! :celebrate

mmonnin · January 2004

The Tinker core, core_65, uses ABSOLUTELY NO optimizations. I don't know where you got that from, Ageek, but Tinkers are FPU ONLY. That's why AMDs smoke P4s so bad on Tinkers: they have a lot stronger FPU.

Straight_Man · January 2004

mmonnin wrote:

The Tinker core, core_65, uses ABSOLUTELY NO optimizations. I don't know where you got that from, Ageek, but Tinkers are FPU ONLY. That's why AMDs smoke P4s so bad on Tinkers: they have a lot stronger FPU.

In P4s, SSE2 and FPU tuning strategies results interrelate-- even how FPU can be dynamically piped differently through whole CPU die relates to performance at gross look levels. That is part of why Intel's stink with tinkers, in P4 gen. And that is how what you said relates to what I said, the P4s use this interdependently and the AMDs less so. Older Pentia and Older AMDs will be more percentage effective performing in plain FPU WUs with 16 bit encoding base on non Intel compilers, but P4s internally try to tune for the pipe and FPU processing structures they have in them. That can include code conversion. SSE per se came out in very late 80's for first implementation attempts, SSE2 in late 90's for full implementation of second gen SSE. Yes, is FPU handling sans SSE tuning in code that makes part of difference, and SSE to SSE2 internal autoconversion in P4s that lets them do with older gromacs what they do. FPU is used for vector calcs, ALU for bit calcs that are usually integer based calcs. Thus, I say vector tuned as meaning used with FPU part of CPU, and with P4s SSE2 is used partly to determine how the floating point or vector calcs are actually done in CPU. SSE is deemphasized by design, and the CPUS like greater FPU WUs.

Thus, you feed old code to a newer P4, you get very bad anticipation, very bad autooptimization, and VERY POOR performance. In part because of how CPU was designed to work.

John.

Straight_Man · January 2004

<table bgcolor="CCFCFC" border="1">
<tr><th colspan="9">Stats for January 16, 2004</th></tr>
<tr><td>Team Rank</td><td>Total Members</td><td>Active Members</td><td>%Active</td><td>Team Name</td><td>Total WU</td><td>Total Score</td><td>Total Points\Week</td><td>Total Points\Day</td></tr>
<tr><td>8</td><td>1,811</td><td>375</td><td>20.7</td><td>Ars Technica Team Egg Roll</td><td>397,312</td><td>8,051,026.23</td><td>219,521.51</td><td>32,674.20</td></tr>
<tr><td>9</td><td>977</td><td>224</td><td>22.9</td><td>Team Short-Media</td><td>412,685</td><td>7,315,984.45</td><td>190,260.88</td><td>29,587.96</td></tr>
<tr><td>10</td><td>928</td><td>254</td><td>27.4</td><td>Amdmb.com Folding Team</td><td>371,302</td><td>6,958,106.70</td><td>184,332.62</td><td>27,268.24</td></tr>
</table>

primesuspect · January 2004

Seeing the stats put up like that makes me realize how close we are to once again being able to overtake Team Eggroll.

I've been craving chinese food.......

profdlp · January 2004

primesuspect wrote:

Seeing the stats put up like that makes me realize how close we are to once again being able to overtake Team Eggroll.

I've been craving chinese food.......

About a 10% increase over our current rate would get us gaining on them.

C'mon gang - we can all find an extra 10% somewhere. Recruit a friend, tweak your box, build that new rig.
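
The 10% figure can be checked against the January 16 table earlier in the thread. The sketch below is an illustration only: the function names are made up, and the overtake estimate ASSUMES both teams' daily rates stay constant, which they will not in practice.

```python
# Figures copied from the January 16, 2004 table earlier in the thread.
EGG_ROLL = {"total": 8_051_026.23, "per_day": 32_674.20}
SHORT_MEDIA = {"total": 7_315_984.45, "per_day": 29_587.96}


def break_even_increase_pct(own_per_day, rival_per_day):
    """Percentage increase in our daily rate needed to match the rival's rate."""
    return 100 * (rival_per_day / own_per_day - 1)


def days_to_overtake(own, rival, increase_pct):
    """Days until our total passes the rival's total.

    Assumes (hypothetically) that both daily rates stay constant.
    Returns None if the boosted rate never closes the gap.
    """
    boosted_rate = own["per_day"] * (1 + increase_pct / 100)
    closing_rate = boosted_rate - rival["per_day"]
    if closing_rate <= 0:
        return None
    return (rival["total"] - own["total"]) / closing_rate


needed = break_even_increase_pct(SHORT_MEDIA["per_day"], EGG_ROLL["per_day"])
print(f"Increase needed to match Egg Roll's daily rate: {needed:.1f}%")

for pct in (10, 15, 20):
    days = days_to_overtake(SHORT_MEDIA, EGG_ROLL, pct)
    label = "never at these rates" if days is None else f"about {days:.0f} days"
    print(f"+{pct}%: {label}")
```

On these particular figures the break-even comes out a little over 10%, so anything past that starts closing the gap in total score.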

mmonnin · January 2004

Working on getting another system going.

muddocktor · January 2004

Ageek wrote:

In P4s, SSE2 and FPU tuning strategies results interrelate-- even how FPU can be dynamically piped differently through whole CPU die relates to performance at gross look levels. That is part of why Intel's stink with tinkers, in P4 gen. And that is how what you said relates to what I said, the P4s use this interdependently and the AMDs less so. Older Pentia and Older AMDs will be more percentage effective performing in plain FPU WUs with 16 bit encoding base on non Intel compilers, but P4s internally try to tune for the pipe and FPU processing structures they have in them. That can include code conversion. SSE per se came out in very late 80's for first implementation attempts, SSE2 in late 90's for full implementation of second gen SSE. Yes, is FPU handling sans SSE tuning in code that makes part of difference, and SSE to SSE2 internal autoconversion in P4s that lets them do with older gromacs what they do. FPU is used for vector calcs, ALU for bit calcs that are usually integer based calcs. Thus, I say vector tuned as meaning used with FPU part of CPU, and with P4s SSE2 is used partly to determine how the floating point or vector calcs are actually done in CPU. SSE is deemphasized by design, and the CPUS like greater FPU WUs.

Thus, you feed old code to a newer P4, you get very bad anticipation, very bad autooptimization, and VERY POOR performance. In part because of how CPU was designed to work.

John.

John, that is just a bunch of babble, man, and I'm throwing the :bs: flag on you! Tinker WUs do not use any assembly loop optimizations!!! They do not use SSE, 3DNow! or SSE2, and that is the reason why Stanford has been shifting away from the Tinker core. The Gromacs core does use SSE, but not SSE2, according to Vijay over at the folding community (do a search over there on SSE2 and you will probably turn up this statement by Vijay). The reason that P3 and especially P4 procs stink so bad on Tinker WUs is that their native FPU isn't nearly as efficient as the Athlon's FPU. Once they get work that uses SSE optimizations, though, their production comes up to parity with an AMD XP proc, but they have to be able to utilize SSE to do so.

I've been involved with the Folding@Home project for over 2 years now and have been fairly active both here, the community forums and also at overclockers.com's forums, where I first started folding. Please don't go posting gobbledegook like this when you don't know WTF you are talking about.
