Flash of weirdness - GPU used for processing?
primesuspect
Beepin n' Boopin · Detroit, MI · Icrontian
Sitting here thinking about computers as usual.
Thinking about the amazingly powerful CPU we have - even the lowest end computers are amazing machines if you step back and think about it.
Thinking about our GPUs - most people have high-end GPUs in their computers (at least on this site) - GeForce or Radeon parts.
Thinking about folding - how cool it is that it utilizes unused processor cycles.
Thinking about ... DING Would it be possible to utilize unused GPU PROCESSOR CYCLES for folding/other DC projects? How much would that increase output? How could it even be done?
Think about how underutilized our GPUs are during everyday tasks (i.e. everything EXCEPT games and rendering)
:confused2
Comments
It would have the potential to increase a rig's folding power by at least 10-20%. Perhaps, Prime, you should pitch the idea to Stanford.
I agree, it would be damn sweet.
An extremely simplified version of the 3D pipeline follows:
Step 1: Scene Database Management
Scene database management includes many application-level tasks (meaning they are done by the 3D application) such as knowing which objects should be in the scene and where they should be relative to other objects. The application is responsible for sending the necessary information about objects on the screen to the software driver for the GPU. Once the information has been sent to the GPU’s driver, it can be thought of as having entered the 3D graphics pipeline and will proceed through the following steps. The driver then sends the information to the graphics hardware itself.
Step 2: Higher Order Surface Tessellation
Most objects in a 3D scene are constructed of triangles because triangles are easy for GPUs to process. Some other primitive types, such as lines or quadrilaterals, can be used, but triangles are the most common. Some objects are defined using curved lines. These curved lines can be very complex mathematically because they require high-order formulas to describe them. A high-order formula is one in which a variable is raised to a power, such as x². Examples of linear formulas would be y = x+1 or y = 2x+1; a similar example of a high-order formula would be y = x²+1. Objects that are defined by high-order surfaces must be broken down into triangles before they can be sent to the next functional unit in the GPU. For this reason, the Surface Engine in a GPU is the first hardware function. Its purpose is to break higher-order lines and surfaces down into triangles.
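To make the idea concrete, here's a rough sketch (in Python, function names are just illustrative) of tessellating the high-order curve y = x²+1 from the text into straight line segments - the same basic idea a GPU's surface engine applies when breaking curved surfaces into triangles:

```python
def tessellate(f, x_start, x_end, segments):
    """Sample f at evenly spaced points and return the line segments
    (pairs of points) that approximate the curve between them."""
    step = (x_end - x_start) / segments
    points = [(x_start + i * step, f(x_start + i * step))
              for i in range(segments + 1)]
    return list(zip(points, points[1:]))

# The high-order formula from the text: y = x^2 + 1
curve = lambda x: x * x + 1

# More segments -> a closer approximation of the true curve.
coarse = tessellate(curve, 0.0, 2.0, 4)
fine = tessellate(curve, 0.0, 2.0, 64)
print(len(coarse), len(fine))  # 4 64
```

The trade-off is the same one the hardware faces: more triangles means a smoother surface but more work for every later pipeline stage.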
Step 3: Vertex Processing (Transform & Lighting)
Once an object is defined as a set of triangles (triangles are defined by specifying their vertices or the corners), the Vertex Shader function of the GPU is ready to do its job by applying custom transform and lighting operations.
Transform. As objects move through the 3D pipeline, they often need to be scaled, rotated or moved (translated) to make them easier to process or simply to put them in the right place relative to other objects. The transform engine mathematically performs these scaling, rotation and translation chores using matrix multiplication.
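For anyone curious what "matrix multiplication" means here, a minimal sketch (assuming the common column-vector convention, with a vertex in homogeneous coordinates (x, y, z, 1)):

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (given as a list of rows) by a 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(tx, ty, tz):
    """Matrix that moves a point by (tx, ty, tz)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def scale(s):
    """Matrix that scales a point uniformly by s."""
    return [[s, 0, 0, 0],
            [0, s, 0, 0],
            [0, 0, s, 0],
            [0, 0, 0, 1]]

vertex = [1.0, 2.0, 3.0, 1.0]
# Scale by 2, then translate by (10, 0, 0):
moved = mat_vec(translation(10, 0, 0), mat_vec(scale(2), vertex))
print(moved)  # [12.0, 4.0, 6.0, 1.0]
```

The GPU's transform engine does exactly this kind of arithmetic, just in dedicated hardware and for millions of vertices per second.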
Lighting. The lighting step is the calculation of lighting effects at each vertex of each triangle. This includes the color and brightness of each light in the scene and how it reacts with the color and specularity (glossiness) of the objects in the scene. These calculations are performed for every vertex in the 3D scene, so they are sometimes referred to as “vertex lighting.”
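A toy version of that per-vertex lighting calculation (a hedged sketch using Lambert's cosine law for simple diffuse light; the function names are illustrative, not from any real graphics API):

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def diffuse(vertex_normal, light_dir, light_color, surface_color):
    """Brightness at a vertex falls off with the angle between its
    normal and the light direction; clamped at zero when the surface
    faces away from the light."""
    intensity = max(0.0, dot(normalize(vertex_normal), normalize(light_dir)))
    return [l * s * intensity for l, s in zip(light_color, surface_color)]

# A vertex facing straight up, lit by a white light from directly above:
print(diffuse([0, 1, 0], [0, 1, 0], [1.0, 1.0, 1.0], [0.8, 0.2, 0.2]))
# [0.8, 0.2, 0.2] - the surface shows its full color
```

The hardware runs this kind of calculation for every light against every vertex, which is why scenes with many lights get expensive fast.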
Step 4: Triangle Setup
Triangle setup takes vertices and triangles and breaks them down mathematically into pixels or fragments. Note that a fragment can be a pixel or smaller than a pixel. The sole purpose of this function is to take data as it comes out of the transform and lighting engine and convert it mathematically so the pixel shading engine can understand it.
Step 5: Pixel Shading and Rendering (including Texturing)
Pixel shading and rendering include all of the complex pixel-level calculations to determine what the final color of each pixel should be. Information from the transform and lighting engine is used to determine what the pixel color should be based on the object color and the various lights in the scene. Next, the pixel shading and rendering functions must consider the additional changes to the pixel color based on what textures should be applied. These textures can describe color changes, lighting changes, reflections from other objects in the scene, material properties, and lots of other changes. The final task of the pixel rendering engine is to store the pixel in the frame buffer memory.
Step 6: Output To Display
For the last stage in the 3D graphics pipeline, the display controller reads the information out of the frame buffer and sends it to the driver for the selected display (CRT, television display, flat-panel display, etc.).
What does this tell us? The only way that work-unit data could be processed by the VPU would be to find a way to convert the work-unit data to vertex information that the VPU could interpret and transform to something else. However, I don't think the VPU could do anything USEFUL with the data, at least in the way that the CPU can perform intensive looped mathematical calculations on it.
Short answer? IMHO, I don't think it can be done.
Storing something in remote RAM is not nearly as difficult as trying to get a foreign data type (i.e., work-unit data) processed by a piece of hardware that only knows how to manipulate vertex information.
Imagine a version with a subroutine that runs on the video card. The real trick would be balancing the workload between the CPU and GPU.
AGP was designed to alleviate 3 main problems:
1) Saturating the PCI bus with video data.
2) Slow main system RAM access.
3) Reducing the cost of video adapters.
When AGP was first deployed on the 440LX chipset for the Pentium II (Klamath) CPUs, it was supposed to usher in a new era in graphics, as video cards would no longer need their own VRAM included on the boards. When memory storage was required, the video card would simply use the main system RAM. With the advent of the AGP bus and its direct connection to the system MCH, this could be done, but not at a rate nearly fast enough to permit efficient graphics rendering. That's why we see video cards with so much VRAM installed on them today. Hence, AGP was already outdated by the time it was implemented.
Remember the Intel i740 graphics adapter? We all know how much that sucked, mainly because it relied on the use of system RAM instead of the use of dedicated VRAM mounted on the video card PCB.
At most, running today's Windows XP desktop @ 1280x1024 and 32-bit color uses a max of 30 MB (I'm being generous). There would be minimal performance degradation from using excess VRAM for data storage if the system had PC2100 DDR SDRAM or slower main system memory. With a 128 MB DDR accelerator in your system, why not put the other 98 MB of "extra RAM" sitting around to good use? For people who don't have the money to purchase more system RAM, but want a little extra free memory, why not use the extra RAM sitting on the video card?
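A quick back-of-the-envelope check of that 30 MB figure (a sketch; the exact number depends on buffering and how many off-screen surfaces the desktop keeps around):

```python
# One frame of a 1280x1024 desktop at 32-bit color:
width, height, bytes_per_pixel = 1280, 1024, 4  # 32 bits = 4 bytes
frame_bytes = width * height * bytes_per_pixel
print(frame_bytes / (1024 * 1024))  # 5.0 MB per frame
# Even with double buffering and a pile of off-screen surfaces,
# 30 MB really is a generous ceiling for a 2D desktop, leaving
# most of a 128 MB card's VRAM idle.
```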
Sounds plausible to me.
But then, it's plausible to use the video card RAM since it's in your machine anyway. So there's no need to buy extra RAM, because the RAM already in your machine, in the form of video RAM, is being wasted.
so you're both right...
I'm confused :-S
~Cyrix
The RAM on my video card is SDR and I think it runs at like 140MHz or something close to that (RADEON 7200), so my DDR400 is faster. I used some overclocking utility way back when but I could only up the RAM a few MHz before I had artifacts on my desktop. But that's what I get for trying to save a few $.
"The kind of graphics applications include production grade video editing, massive monitor array architecture, "hot plugging" of graphics, and using graphics as a co-processor are on the cards, Cheng will say."
Quoted from this article, which is about the new PCI Express video bus standard.
It is like modern projectors: the only things you can talk to them about are their built-in menus, using a remote to tune what video they are actually receiving and to adjust for better colors.
You have to HAVE both I and O simultaneously to get a CPU to communicate properly.
Folding need not cost much; some of the assignment servers at Stanford for folding are PIII 733s. They probably run an interlocked star structure, not a pure grid where all are equally capable peers except for a grid controller.
What a burden to bear.
What it took was the introduction of video cards with simple, identical, programmable, massively parallel processor pipelines.
Well, what's next?
check your math, step 1
oh crap... THAT's why I cannot help my daughter with her math....
Seriously, that was prescient of you, especially considering you brought this up nearly four years ago, before Vijay Pande and ATI announced work on the GPU research.