39 minutes a frame for P1140/1141


DanG
31 Aug 2005, 9:25pm
A bit slower than my A64 3200, but not too bad to be working on both an 1140 and 1141 at the same time!
So far, this dual-core P4 2.8GHz is amazing. The only thing I didn't know is that you can't turn HT on for it and run 4 instances, just 2. Oh well, I have HT turned off on the rest of my machines anyway. If only the driver people would get their junk in gear and get on with the 64-bit drivers so I can use EM64T as well.

profdlp
31 Aug 2005, 10:19pm
Jeepers, I'm getting 70 minutes per frame for those WUs on a couple of fairly fast 32-bit rigs here. It's no wonder you're kicking some cans in the point standings. :respect: :fold:

mmonnin
1 Sep 2005, 1:45am
I didn't think some of the dual-core Pentium Ds had HT?

Trogan
1 Sep 2005, 2:14am
39 mins is fast :thumbsup:

It takes 45 mins per frame on those WU's for me

trilinium
8 Jan 2006, 12:23pm
I don't get it... I'm new to folding so don't beat me to a pulp for not knowing this, but isn't 39 minutes like real slow? Mine goes at 1 minute and 1 second for each frame... and I'm using a 3GHz P4.

Does it depend on the protein you're folding? Please enlighten me.

bikerboy
8 Jan 2006, 1:17pm
It all depends on what kind of WU (work unit) you get.

bikerboy

Winga
8 Jan 2006, 1:45pm
If my math is right I'm averaging 33 minutes a frame on my 3200.

[09:09:22] Project: 1141 (Run 62, Clone 1, Gen 29)
[09:09:22]
[09:09:22] Assembly optimizations on if available.
[09:09:22] Entering M.D.
[09:09:30] Protein: p1141_RIBO_FSpeptide_HEL_nospring
[09:09:30]
[09:09:30] Writing local files
[09:09:35] Extra SSE boost OK.
[09:09:36] Writing local files
[09:09:36] Completed 0 out of 250000 steps (0)
[09:44:31] Writing local files
[09:44:32] Completed 2500 out of 250000 steps (1)
[10:19:26] Writing local files
[10:19:26] Completed 5000 out of 250000 steps (2)
[10:54:22] Writing local files
[10:54:22] Completed 7500 out of 250000 steps (3)
[11:29:17] Writing local files
[11:29:17] Completed 10000 out of 250000 steps (4)
[12:04:12] Writing local files
[12:04:12] Completed 12500 out of 250000 steps (5)
[12:39:07] Writing local files
[12:39:07] Completed 15000 out of 250000 steps (6)

Would love to know what the latest 4000+ are pushing out.
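
For anyone who wants to check the per-frame math against a log excerpt, here is a minimal Python sketch. It assumes the client log format shown above (a bracketed HH:MM:SS stamp on each "Completed ... steps" line) and that at most one midnight rollover separates two consecutive frames. The function name frame_minutes and the LOG constant are made up for illustration. Fed the seven "Completed" lines from the excerpt above, it works out to roughly 35 minutes per frame for that stretch.

```python
import re
from datetime import datetime, timedelta

# The "Completed" lines copied from the log excerpt above.
LOG = """\
[09:09:36] Completed 0 out of 250000 steps (0)
[09:44:32] Completed 2500 out of 250000 steps (1)
[10:19:26] Completed 5000 out of 250000 steps (2)
[10:54:22] Completed 7500 out of 250000 steps (3)
[11:29:17] Completed 10000 out of 250000 steps (4)
[12:04:12] Completed 12500 out of 250000 steps (5)
[12:39:07] Completed 15000 out of 250000 steps (6)
"""

# Matches the timestamp of every "Completed N out of M steps" line.
PATTERN = re.compile(r"\[(\d\d:\d\d:\d\d)\] Completed \d+ out of \d+ steps")


def frame_minutes(log_text):
    """Return the minutes elapsed between consecutive 'Completed' lines."""
    stamps = [
        datetime.strptime(match.group(1), "%H:%M:%S")
        for match in PATTERN.finditer(log_text)
    ]
    durations = []
    for previous, current in zip(stamps, stamps[1:]):
        delta = current - previous
        if delta < timedelta(0):
            # Assumption: the clock wrapped past midnight exactly once.
            delta += timedelta(days=1)
        durations.append(delta.total_seconds() / 60)
    return durations


times = frame_minutes(LOG)
average = sum(times) / len(times)
print(f"{len(times)} frames, average {average:.1f} min/frame")
```

The log carries no dates, so a gap longer than 24 hours between two lines cannot be detected; treat the result as an estimate for short excerpts only.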

muddocktor
8 Jan 2006, 2:01pm
Looking in EMIII on my DC Opti system here, it's averaging around 26 1/2 minutes per frame on the 1140/1141 series. That's with it running at 2400 MHz right now.

Straight_Man
8 Jan 2006, 2:51pm
Trogan wrote:
> 39 mins is fast :thumbsup:
>
> It takes 45 mins per frame on those WU's for me

I'm running 54-56 hours (about 33.3 min/frame) per WU on the 1140 series depending on what I do with the Linux box while folding. 3.2 GHz, single folding instance.
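
The hours-per-WU and minutes-per-frame figures in this thread convert into each other with simple arithmetic. The sketch below assumes 100 frames per work unit (one frame per percent, as the logs in this thread suggest); the function names are made up for illustration.

```python
FRAMES_PER_WU = 100  # assumption: one frame per percent, as in the logs in this thread


def minutes_per_frame(hours_for_wu, frames=FRAMES_PER_WU):
    """Convert total hours for a work unit into minutes per frame."""
    return hours_for_wu * 60 / frames


def hours_per_wu(frame_minutes, frames=FRAMES_PER_WU):
    """Convert minutes per frame into total hours for a work unit."""
    return frame_minutes * frames / 60


for hours in (54, 56):
    print(f"{hours} h/WU -> {minutes_per_frame(hours):.1f} min/frame")
print(f"39 min/frame -> {hours_per_wu(39):.0f} h/WU")
```

So 54-56 hours per WU corresponds to about 32.4-33.6 minutes per frame, and 39 minutes per frame is about 65 hours per WU.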

dragonV8
8 Jan 2006, 3:57pm
mmonnin wrote:
> I didnt think some of the Dual core Pentium Ds had HT?

The 840 D is a 3.2 dual core, no HT.
The 840 EE is a 3.2 dual core with HT.

Though they are of the same family, technically you are correct, as it's not called a "D". :thumbsup:

Sledgehammer70
9 Jan 2006, 3:38pm
On my dual Xeon machine I get scores like this on Gromacs cores...

[15:35:39] Writing local files
[15:35:39] GB activated
[15:35:39] Extra SSE boost OK.
[15:35:39] Writing local files
[15:35:39] Completed 0 out of 3750000 steps (0)
[16:05:26] Writing local files
[16:05:26] Completed 37500 out of 3750000 steps (1)
[16:35:09] Writing local files
[16:35:09] Completed 75000 out of 3750000 steps (2)
[17:04:31] Writing local files
[17:04:31] Completed 112500 out of 3750000 steps (3)
[17:33:48] Writing local files
[17:33:49] Completed 150000 out of 3750000 steps (4)
[18:03:33] Writing local files
[18:03:33] Completed 187500 out of 3750000 steps (5)
[18:32:59] Writing local files
[18:32:59] Completed 225000 out of 3750000 steps (6)
[19:02:43] Writing local files
[19:02:44] Completed 262500 out of 3750000 steps (7)
[19:32:45] Writing local files
[19:32:45] Completed 300000 out of 3750000 steps (8)
[20:02:22] Writing local files
[20:02:22] Completed 337500 out of 3750000 steps (9)
[20:32:12] Writing local files
[20:32:12] Completed 375000 out of 3750000 steps (10)
[21:02:09] Writing local files
[21:02:09] Completed 412500 out of 3750000 steps (11)
[21:31:49] Writing local files
[21:31:49] Completed 450000 out of 3750000 steps (12)
[22:00:59] Writing local files
[22:00:59] Completed 487500 out of 3750000 steps (13)
[22:30:09] Writing local files
[22:30:09] Completed 525000 out of 3750000 steps (14)

Too long!!!!

[08:26:15] Completed 3412500 out of 3750000 steps (91)
[08:51:16] Writing local files
[08:51:16] Completed 3450000 out of 3750000 steps (92)
[09:16:35] Writing local files
[09:16:35] Completed 3487500 out of 3750000 steps (93)
[09:41:53] Writing local files
[09:41:53] Completed 3525000 out of 3750000 steps (94)
[10:07:07] Writing local files
[10:07:07] Completed 3562500 out of 3750000 steps (95)
[10:32:07] Writing local files
[10:32:07] Completed 3600000 out of 3750000 steps (96)
[10:56:16] Writing local files
[10:56:16] Completed 3637500 out of 3750000 steps (97)
[11:20:49] Writing local files
[11:20:49] Completed 3675000 out of 3750000 steps (98)
[11:45:43] Writing local files
[11:45:43] Completed 3712500 out of 3750000 steps (99)
[12:10:44] Writing local files
[12:10:44] Completed 3750000 out of 3750000 steps (100)
[12:10:44] Writing final coordinates.
[12:10:44] Past main M.D. loop
[12:11:44]
[12:11:44] Finished Work Unit:

Leonardo
9 Jan 2006, 4:21pm
DanG, I see you've discovered the Folding power of the D820. I too have adopted that CPU (http://www.short-media.com/forum/showthread.php?t=40521). You can get close to 800 points a day out of the dual Presshot. I assume you have two instances of Folding running. Set both clients with the flag -advmethods. Also make sure that the client configuration is set to "big packets - yes". On my Northwood HT computers (each running two F@H clients) and my 820 I've been getting 450-point QMDs exclusively for two weeks. The point accumulation is amazing. Here are pics of the 820 cores (2.8GHz @ 3.64GHz) and of all the HTs and the 820 cranking out QMDs.

Leonardo
9 Jan 2006, 4:24pm
dragonV8 wrote:
> The 840 EE is a 3.2 dual core with HT.

DragonV8 is correct. For the price of that processor, though, you could have two or three D820s. eBay has lots of the 820s; well, it did three weeks ago, at least.

profdlp
9 Jan 2006, 6:12pm
Leonardo wrote:
> ...Make sure also that the client configuration is set to "big packets - yes". On my Northwood HT computers (each two F@H clients) and my 820 I've been getting 450 point QMDs exclusively for two weeks...

Is the bigpackets/QMD route viable for A64 rigs?

I've got to do something - Leo is cleaning my clock... :bawling:

DanG
9 Jan 2006, 6:12pm
Leonardo wrote:
> ...Set both clients with the flag -advmethods. Make sure also that the client configuration is set to "big packets - yes"...


Damn, I forgot about advmethods. How do I add the flag for that if I'm running as a service with 5.02?

csimon
9 Jan 2006, 6:18pm
DanG wrote:
> Damn, I forgot about advmethods. How do I add the flag for that if I'm running as a service with 5.02?

If you're running as a service then probably the best way would be to use regedit... have you edited the registry before?

All you really need to do is find the "-svcstart" and place a space and "-advmethods" directly behind it. Just be sure to add it after each occurrence you find where it isn't already there.
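
The string edit described above can be sketched in Python as a pure function. This only illustrates the text change, it does not touch the registry. The function name add_advmethods and the example command line (including the path and executable name) are hypothetical; use whatever value regedit actually shows for your service.

```python
def add_advmethods(command, flag="-advmethods", anchor="-svcstart"):
    """Insert `flag` directly after `anchor`, unless `flag` is already present.

    The command is split on single spaces and re-joined the same way, so
    quoted paths containing spaces come back unchanged.
    """
    tokens = command.split(" ")
    if flag in tokens or anchor not in tokens:
        return command  # nothing to do, or no -svcstart to attach to
    position = tokens.index(anchor) + 1
    return " ".join(tokens[:position] + [flag] + tokens[position:])


# Hypothetical service command line, made up for this example.
example = r'"C:\FAH\client.exe" -svcstart'
updated = add_advmethods(example)
print(updated)

# Running it a second time changes nothing, which covers the
# "where it doesn't already exist" part of the advice.
print(add_advmethods(updated) == updated)
```

Back up the registry value before changing it by hand, and repeat the edit for each folding service entry you find.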

Leonardo
9 Jan 2006, 6:25pm
Sorry, don't know how to configure as a service. I've added the -advmethods flags under target in the Folding startup shortcuts.

profdlp wrote:
> Is the bigpackets/QMD route viable for A64 rigs?

I have no clue on this, Prof. Do the AMD64s have SSE3, or the AMD equivalent? I think that's where the efficiency is to be found. SSE2 works well also, of course, but I think SSE3 is even more efficient.

profdlp
9 Jan 2006, 6:31pm
Leonardo wrote:
> Do the AMD64s have SSE3, or the AMD equivalent?...

SSE3: No
3DNow+: Yes

Might be time for the ol' Prof to do a little experimentation. :cool:

Leonardo
9 Jan 2006, 6:37pm
3DNow+ is the approximation of SSE2, right?

DanG
9 Jan 2006, 7:00pm
csimon wrote:
> If you're running as a service then probably the best way would be to use regedit ...have you edited the registry before?
>
> All you really need to do is find the "-svcstart" and place a space and "-advmethods" directly behind it. Just be sure to put it after each find where it doesn't already exist.


Done and done.

Thanks.

Is advmethods only really beneficial to dual-core P4s, or to the entire P4 line and Xeons as well?

profdlp
9 Jan 2006, 7:19pm
Leonardo wrote:
> 3DNow+ is the approximation of SSE2, right?

You know, I was going by wcpuid which says SSE3 is not supported. I then noticed that it also thinks my Socket 939 MB is a Socket 754... :-/

Further research indicates that they are wrong about SSE3, too.

I'm going to give it a shot. :)

Leonardo
9 Jan 2006, 7:29pm
DanG wrote:
> Is advmethods only really beneficial to dual core P4's, or the entire P4 line and xeons as well?

Advmethods is highly beneficial to all P4s and PDs with instruction sets SSE2 and higher. I'm not sure about SSE prime (non-'2' or '3'). One caveat: if you are not running at least 1GB of RAM, do not have more than one Folding instance running with large work unit permissions enabled. Two QMDs or two Double Gromacs processing simultaneously require significant hardware memory resources.

GrayFox
9 Jan 2006, 7:37pm
Leonardo wrote:
> Sorry, don't know how to configure as a service. I've added the -advmethods flags under target in the Folding startup shortcuts.
>
> I have no clue on this, Prof. Do the AMD64s have SSE3, or the AMD equivalent? I think that's where the efficiency is to be found. SSE2 works well also, of course, but I think SSE3 is even more efficient.


The modern Athlon 64s have SSE3 (anything newer than the Newcastle).

Leonardo
9 Jan 2006, 8:36pm
That's good! If prices come down, I'd like to eventually migrate/upgrade at least three of my systems to dual core AMD.

profdlp
9 Jan 2006, 10:28pm
GrayFox wrote:
> The modern Athlon 64s have SSE3 (anything newer than the Newcastle).

Are you sure about the Newcastle part? If not, I'll be :banghead: :banghead: :banghead:

muddocktor
9 Jan 2006, 10:37pm
Prof, the Winchester A64s do have SSE2 but not SSE3. The newer A64 procs such as Venice, San Diego, Manchester, Toledo and so on have both SSE2 and SSE3. But an AMD won't get assigned a QMD WU regardless of the instruction sets on the proc, due to licensing issues Stanford has with the math libraries used for the QMD core. However, an A64 will fold them just fine without using SSE2, and your A64 will show a production increase over the regular Gromacs and Tinker work being handed out. But you have to download a QMD on a P4 rig, then move it to the A64 machine, which is a little bit of a PITA.

Donut
9 Jan 2006, 10:58pm
DanG wrote:
> Done and done.
>
> Thanks.
>
> Is advmethods only really beneficial to dual core P4's, or the entire P4 line and xeons as well?

The P4 and Xeon line as well.

Actually, when I still had some XPs running, they had the advmethods flag also.
It used to be the only way to get the 600-point Gromacs WUs; I don't know if that is still valid or not.

Folding flag descriptions: http://folding.stanford.edu/console-userguide.html

Donut
9 Jan 2006, 11:16pm
muddocktor,

I tried it once (sneakernetting QMDs), but like you said, it was a pain. What do you think about QMDs on a pair of 242s (1.6 GHz)? I honestly don't remember what the production was, and right now they are set to run Tinkers.

I have to try to close the gap with Prof somehow.

profdlp
9 Jan 2006, 11:17pm
muddocktor wrote:
> ...you have to download a QMD on a P4 rig, then move it to the A64 machine, which is a little bit of a PITA.

Especially when you don't have a P4 to begin with. :(

Thanks, Mudd. :)

muddocktor
10 Jan 2006, 3:24am
Donut, a QMD would probably average better points than a Tinker, even at 1.6 GHz. However, I tried running 2 QMDs on my X2 4400 at the same time and I saw much the same kind of points dropoff that the Xeon systems do with a pair of QMDs, with 1 client averaging around 330 points/day and the other at only 210 points/day. These QMDs are so bandwidth intensive that a pair of them will flood all the available memory bandwidth you have with a dual-proc or dual-core machine. The best (and most productive) mix is 1 QMD on one client and a DGro WU on the other client, with my DC Opti system averaging around 700 points/day with that. If you can't get a DGro, then put a Tinker on the 2nd client and you should see good production out of that Opti system.
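
To put numbers on that comparison, here is a small Python sketch using only the points/day figures quoted in the post above. The variable names are made up, and the figures are one poster's rough averages, not benchmarks.

```python
# Points/day figures as quoted in the post above.
two_qmds = 330 + 210   # two QMDs on the X2 4400, one per client
mixed = 700            # one QMD plus one DGro WU on the DC Opteron system

gain = (mixed - two_qmds) / two_qmds
print(f"Two QMDs:   {two_qmds} points/day")
print(f"QMD + DGro: {mixed} points/day ({gain:.0%} more)")
```

Note that the two figures come from different machines (the X2 4400 and the DC Opteron), so the difference is only a rough indication of what the mix is worth.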

Actually, sneakernetting the QMD's isn't too bad, since they are so big. You will only have to reload the client every 2 days or so.

Leonardo
10 Jan 2006, 4:41am
Mudd, what's the L2 cache size on the X2 4400's cores? I don't see any drop off with two QMDs simultaneously running on the 820 (each core 1MB L2). Is the "quad" FSB on the Prescotts better for this type of application?

muddocktor
10 Jan 2006, 4:53am
The L2 cache is 2 x 1MB on the 4400, just like the DC Opterons. I think it's mainly a memory bandwidth issue, though. I have to take a system down to get my other stick of VX 4000 out of it for the X2 system; right now I have 1 stick of value VX and the other stick is a VX4000, and the value VX is presently limiting it on RAM speed at 2-2-2-8 timing.

Donut
10 Jan 2006, 10:18am
Thanks muddocktor. I'll give it a try.