GPU Folding so far......
Krazeyivan
Newcastle, UK
Hi All
Just thought I would keep you up to date - early days - I am keeping the X1900XT at 2D clocks to start with. Cat 6.5 drivers and latest DirectX9.
Am running 1 GPU and 1 CPU - both cores are flat out - it seems the first core is spending all its time sending data to the card.
I am still not sure if EM3 works with this yet, but I can tell you that with project 2725 (Run 0, Clone 248, Gen 0) I have 5% complete in just under 30 mins.
No idea on points yet, just keeping you up to date.
Oh, I nearly forgot - GPU temps via ATItool report 55°C and 16.4A (this normally sits at 5.5A at 2D speeds).
Comments
No doubt the project is better off with the GPU folding instead of the other core, but the current point levels provide no motivation to go out and purchase an X1900XTX. But it's still a beta; who knows what's going to happen.
~FA
Seems at this stage that the GPU only needs about 30% of the CPU's time, but with the CPU having to wait on the card (not entirely sure what that wait is) the CPU usage still shows up at 100%.
Also check out this link - note the GPU.........and number!
http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats
OS - TFLOPS - active clients
Linux - 20 - 16936
GPU - 15 - 208
Wow, so roughly 280 GPUs would push more TFLOPS than 16,936 CPUs under Linux??? Good god....
How is that even possible? Has to be a mistake.
either:
- the GPU client has some serious optimizations
- GPUs have some huge architectural advantage for folding
- the GPU client is reporting incorrectly
- somebody's GPU is going to be a flame ball soon
I can't see how that much computing power can be contained in a GPU without serious heat issues...:wow2:
Since that chart shows TFLOPS (trillions of floating-point operations per second), it makes sense that GPUs have a much higher flops/processor ratio.
In this case, 'massively' comes out to roughly 72 billion flops per GPU (15 TFLOPS across 208 active GPUs). That's a lot of flops.
I don't know how efficient these are at floating point, or how they're being used, but the flops numbers probably aren't too inaccurate.
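For anyone who wants to check the arithmetic, here's a quick back-of-the-envelope Python sketch using just the two rows quoted above (a snapshot from this thread, not live stats):

# Snapshot of the osstats rows quoted earlier in the thread, not live data.
stats = {
    "Linux": {"tflops": 20, "active_clients": 16936},
    "GPU":   {"tflops": 15, "active_clients": 208},
}

for name, s in stats.items():
    flops_per_client  = s["tflops"] * 1e12 / s["active_clients"]
    clients_per_tflop = s["active_clients"] / s["tflops"]
    print(f"{name}: {flops_per_client / 1e9:.1f} GFLOPS per client, "
          f"{clients_per_tflop:.0f} clients per TFLOP")

That works out to roughly 72 GFLOPS per GPU versus a little over 1 GFLOPS per Linux CPU, which is exactly where the per-TF ratios below come from.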
Mac: 1913 per TF
Windows: 1056.3 per TF
Linux: 847 per TF
GPU: 14 per TF
So what exactly does that mean? The GPU client is the most efficient followed by Linux, Windows, etc? Or does it mean the GPUs are most powerful, then the processors of people running Linux etc?
But there is one thing you're missing, people, especially in statements like "It is like a 60 times performance increase". The CPU gauge is counting processors from as far back as anyone has reported work units, so you're not just comparing against the latest Core Duo or Athlon 64 FX/X2 - Pentium 90s and K6-2s are all averaged in too, so the insane performance increase may not be as phenomenal as you think. In short, you're comparing the average of all processors against the most powerful GPU ATI currently has.
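To make that concrete, here's a toy Python sketch - the client counts and per-CPU GFLOPS figures below are completely made up for illustration, not from Stanford's stats - showing how a population full of older machines drags the per-CPU average down, so a ~72 GFLOPS GPU looks much better against the average than against a current CPU:

# Hypothetical population - every figure below is invented for illustration only.
population = [
    # (label, number of clients, sustained GFLOPS each while folding)
    ("Pentium 90 / K6-2 era", 3000, 0.05),
    ("Pentium 4 era",         8000, 0.8),
    ("Athlon 64 / Core Duo",  6000, 2.5),
]

total_flops = sum(count * gflops for _, count, gflops in population)
total_cpus  = sum(count for _, count, _ in population)
average     = total_flops / total_cpus

print(f"Average per CPU: {average:.2f} GFLOPS")                  # ~1.27 GFLOPS
print(f"72 GFLOPS GPU vs the average CPU: {72 / average:.0f}x")  # ~57x
print(f"72 GFLOPS GPU vs the newest CPU:  {72 / 2.5:.0f}x")      # ~29x

So the same GPU can look "60 times faster" or "30 times faster" depending entirely on which CPU you compare it against.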
Also I'm curious what this means: "*TFLOPS is actual flops from the software cores, not the peak values from CPU specs."
It means they calculate it from empirical data - what the software cores actually get done - rather than from AMD's claim that their processor can do X flops.
True, but I'm sure there are still quite a few underperformers of different sorts in there dragging the average down a lot.
Isn't there any way to benchmark, er... 'terrafloppage' on a processor?
The reason that GPUs can put up such numbers is strictly architecture. A CPU (C2D) may have 300M transistors, but how much of that is tied up in 4MB of cache and other overhead functions? Remember the P4 has over 200M transistors and it couldn't do enough math to save its name.
In a GPU you have almost 400M transistors, the bulk of which are simply for crunching numbers.
Oh, and temps have gone up a bit: the PWM is up 1°C (to 41°C) and the chipset is up 3°C (to 43°C) with the GPU running all the time.
No I just can't justify purchasing an X1950....
[18:54:08] *
[18:54:08] Folding@Home GPU Core - Beta
[18:54:08] Version 0.06 (Tue Oct 3 07:59:02 PDT 2006)
[18:54:08]
[18:54:08] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86
[18:54:08] Build host: CYGWIN_NT-5.1 vishal-gpu 1.5.19(0.150/4/2) 2006-01-20 13:28 i686 Cygwin
[18:54:08] Preparing to commence simulation
[18:54:08] - Assembly optimizations manually forced on.
[18:54:08] - Not checking prior termination.
[18:54:08] - Expanded 83063 -> 443705 (decompressed 534.1 percent)
[18:54:08]
[18:54:08] Project: 2723 (Run 0, Clone 305, Gen 0)
[18:54:08]
[18:54:08] Assembly optimizations on if available.
[18:54:08] Entering M.D.
[18:54:19] Completed 0
[18:54:19] Starting GUI Server
[19:01:33] Completed 1
[19:08:47] Completed 2
[19:16:01] Completed 3
[19:23:14] Completed 4
[19:30:28] Completed 5
[19:37:42] Completed 6
[19:44:56] Completed 7
[19:52:10] Completed 8
[19:59:24] Completed 9
[20:06:37] Completed 10
[20:13:52] Completed 11
[20:21:07] Completed 12
[20:28:25] Completed 13
[20:35:42] Completed 14
[20:43:03] Completed 15
[20:50:23] Completed 16
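If anyone wants to track the rate without eyeballing the timestamps, here's a minimal Python sketch that pulls the "Completed N" lines out of a log like the one above and estimates the frame time - the "FAHlog.txt" file name and the assumption that a work unit is 100 frames are my own placeholders, not something from the client:

import re
from datetime import datetime

# Minimal sketch: estimate the frame time from "[HH:MM:SS] Completed N" lines.
# Assumes the run does not cross midnight; "FAHlog.txt" is a placeholder name.
pattern = re.compile(r"\[(\d{2}:\d{2}:\d{2})\] Completed (\d+)")

frames = []
with open("FAHlog.txt") as log:
    for line in log:
        m = pattern.search(line)
        if m:
            frames.append((datetime.strptime(m.group(1), "%H:%M:%S"), int(m.group(2))))

if len(frames) >= 2:
    (t0, n0), (t1, n1) = frames[0], frames[-1]
    per_frame = (t1 - t0) / (n1 - n0)
    print(f"~{per_frame} per frame, ~{per_frame * 100} for a 100-frame work unit")

On the log above that comes out to roughly 7 minutes 15 seconds per frame, so somewhere around 12 hours for the whole work unit at 2D clocks.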
Krazeyivan, yes, please keep us updated. This is a major event for Folding@Home and for the future of GPUs.
And if you don't own an X1900-class vid card right now, don't go out and spend four big ones to get one right away. Like has already been said, the points return presently isn't worth the investment if you are primarily folding for the points and not the science. But this is a rough beta client and they do need GPUs folding to iron out the bugs. Plus, I'm sure there will be some kind of adjustment to the points values in the future, as well as Stanford eventually letting lesser ATI vid cards process work too, such as the X1650 and X1800 series.
And Leo, from all I've read on the next gen high end vid cards, the card itself will just be part of the cost. Both Nvidia's and ATI's next gen cards look to be drawing some atrocious power, at least double to triple the CPU's power draw. That will mean ridiculous heat levels to deal with, and if they don't come with an external PSU to drive them, you'll have to upgrade to a $400-500 PSU just to feed them.