AMD in 2010, part 3: GPU compute

Following our look at AMD’s upcoming platform and processor plans, this third installment of a four-part series digesting AMD’s 2009 Financial Analyst Day takes a look at the company’s efforts in GPU computing.

For years, the industry has worked toward GPGPU (general-purpose computing on graphics processing units), in which a video card handles tasks traditionally assigned exclusively to the processor. Developing the hardware and software to serve this goal has taken some time, but we are finally at the point where the framework is set. The advent of DirectX 11, compatible hardware, and related APIs have created a standardized way for developers to write programs that take advantage of a GPU’s significant potential.

We have just begun to unlock the combined power of the CPU and GPU.

Going forward, each GPU will be designed according to its manufacturer’s philosophy and goals. That is to say, while all GPUs will share enough similarities to execute programs written to the standard model, each manufacturer will optimize its architecture to emphasize certain tasks over others. This is no different from modern desktop processors, which standardize what can be executed around x86 while leaving how that code is executed to chip designers and their performance goals.

AMD, for its part, has outlined a performance goal centered on developing the nascent consumer market (though the company has not said so explicitly). Its many initiatives to this end are collectively known as “heterogeneous computing,” a term which implies disparate execution engines (the GPU and the processor) working in concert to handle a given workload optimally. Put another way, AMD envisions a future wherein each of a program’s routines is always sent to the best hardware for the job. This model makes particular sense for AMD, which stands apart from NVIDIA and Intel with a technology portfolio that includes both performance GPUs and competitive processors.
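To make the idea of sending each routine to “the best hardware for the job” concrete, here is a minimal Python sketch of a dispatcher. It is not AMD’s scheduler or any real runtime: the `Task` type, the `choose_device` function and the `GPU_MIN_ELEMENTS` cutoff are all hypothetical, and the rule itself (large, data-parallel work goes to the GPU; small or serial work stays on the CPU) is just one plausible heuristic.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    data_parallel: bool  # can the work be split into many independent pieces?
    element_count: int   # how many independent pieces there are


# Hypothetical cutoff: below this size we assume the cost of handing work
# to the GPU outweighs its throughput advantage.
GPU_MIN_ELEMENTS = 10_000


def choose_device(task: Task) -> str:
    """Route a task to 'gpu' or 'cpu' using a deliberately simple heuristic."""
    if task.data_parallel and task.element_count >= GPU_MIN_ELEMENTS:
        return "gpu"
    # Serial, branchy, or small jobs stay on the CPU.
    return "cpu"


if __name__ == "__main__":
    jobs = [
        Task("video upscale", data_parallel=True, element_count=1920 * 1080),
        Task("parse config file", data_parallel=False, element_count=1),
        Task("tiny vector add", data_parallel=True, element_count=64),
    ]
    for job in jobs:
        print(f"{job.name}: {choose_device(job)}")
```

Running it prints one routing decision per job. A real heterogeneous runtime would also weigh memory transfer costs, device load and driver support, none of which this sketch models.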

Heterogeneous computing

Your average GPU is a specialized hunk of silicon optimized to crunch floating-point arithmetic. Without diving into a face-melting exposé on what that is, it’s easier just to remember some of the applications that rely on it: video processing (encoding, upscaling, playback), cryptography, hardware physics, 3D rendering and audio processing.

The number of apps and vendors supporting GPU acceleration is growing.

The common thread among all of these applications is that each is, at heart, a high-level embodiment of complex mathematical formulas, and modern GPUs can run through them more than eight times faster than today’s fastest processors. That’s a lot of reserve potential, but uncorking it in a way that’s transparent and meaningful to consumers will require further development of three key technologies over the next three years.

Programmable GPUs

GPUs of yesterday had a “fixed function” pipeline, which means that they were designed with a specific and unchanging set of capabilities. In other words, if a developer wanted to execute a graphical effect or computational process, it could not be performed unless the GPU specifically supported what the developer was attempting. It simply wasn’t possible to invent and then implement new special effects, or new ways to use the GPU. Thankfully, today’s GPUs are horses of a different color.

Modern video cards have “programmable pipelines”: they abandon fixed functions in favor of customizable hardware units known as shaders. Shaders provide an interface on which developers can run arbitrary code, and that makes the contemporary GPU very much like a processor. As an example, NVIDIA’s upcoming Fermi chip is designed to run programs written in C++ (compiled for the GPU through NVIDIA’s CUDA toolchain), the same language used to write many of the CPU-centric applications you use every day.
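The difference between the two pipeline styles can be sketched in a few lines of Python. This is an analogy rather than real GPU code: `fixed_function_pipeline` only accepts effects from a built-in menu, while `programmable_pipeline` runs any per-pixel function the developer hands it, which is roughly the role a shader plays. All names here are hypothetical; the grayscale weights are the standard ITU-R BT.601 luma coefficients.

```python
from typing import Callable

Pixel = tuple[float, float, float]  # RGB, each channel in [0.0, 1.0]

# Fixed-function model: the "hardware" ships with a closed menu of effects.
FIXED_EFFECTS: dict[str, Callable[[Pixel], Pixel]] = {
    "identity": lambda p: p,
    "invert": lambda p: (1.0 - p[0], 1.0 - p[1], 1.0 - p[2]),
}


def fixed_function_pipeline(image: list[Pixel], effect: str) -> list[Pixel]:
    """Apply a built-in effect; anything outside the menu is rejected."""
    if effect not in FIXED_EFFECTS:
        raise ValueError(f"effect {effect!r} is not supported by this hardware")
    return [FIXED_EFFECTS[effect](p) for p in image]


def programmable_pipeline(image: list[Pixel], shader: Callable[[Pixel], Pixel]) -> list[Pixel]:
    """Run a developer-supplied per-pixel function (the 'shader') on every pixel."""
    return [shader(p) for p in image]


def grayscale(p: Pixel) -> Pixel:
    # Luma weights from ITU-R BT.601.
    y = 0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2]
    return (y, y, y)
```

The point is that `grayscale` did not exist when the “hardware” was built, yet `programmable_pipeline(image, grayscale)` runs it, while `fixed_function_pipeline(image, "grayscale")` raises an error.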

Moving towards "you can dream it, you can do it" programmability. We're not there yet.

The industry is slowly moving towards direct architectural communication.

As we’ve intimated, however, the era of the programmable GPU is just beginning. The industry currently sits at the “OpenCL/DX driver-based programs” level in the above diagram. At this level, routines which attempt to leverage GPU resources, like Folding@Home and DirectX video acceleration, must have driver-level support. This model is sufficiently robust for methods that the driver supports, but further cultivation of APIs can push us toward system-level programmability, which will let developers communicate with and control the hardware directly. Speaking directly to the GPU’s architecture will unlock a new level of optimized performance and finally put the GPU on par with the CPU in terms of arbitrary code execution.

APIs

An API, or application programming interface, is a defined set of functions and conventions that a developer uses to communicate with hardware or software without needing to know how that hardware or software works internally. Think of it like the user interface in Windows: you understand scroll bars, text boxes, buttons and prompts, all of which abstract the core functions of Windows into an understandable presentation.

In that regard, the industry has come into its own with a pair of APIs which abstract GPU hardware in a meaningful way: OpenCL and DirectCompute. OpenCL, for its part, is overseen by the Khronos Group, which counts firms like NVIDIA and Apple amongst its members. DirectCompute, in turn, is a component of Microsoft’s DirectX 11.

OpenCL and DirectCompute enable developers to write applications that are accelerated by both AMD and NVIDIA hardware with few or no changes to the body of code. And though the two APIs have similar goals, they’re finding very different outlets: DirectCompute is being used almost exclusively for games, while OpenCL is at home in non-game applications.
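The “write once, run on either vendor’s hardware” promise boils down to coding against an interface rather than a specific device. The Python sketch below illustrates that pattern only; it is not the OpenCL or DirectCompute API. `ComputeBackend`, `launch`, `SerialBackend` and `PooledBackend` are hypothetical stand-ins for two vendors’ implementations, and the kernel is SAXPY (`a * x + y` per element), a common data-parallel example.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


class ComputeBackend(ABC):
    """Hypothetical vendor-neutral interface: run `kernel` once per index 0..n-1."""

    @abstractmethod
    def launch(self, kernel: Callable[[int], float], n: int) -> list[float]:
        ...


class SerialBackend(ComputeBackend):
    # Stand-in for one vendor's implementation: a plain loop.
    def launch(self, kernel, n):
        return [kernel(i) for i in range(n)]


class PooledBackend(ComputeBackend):
    # Stand-in for a different vendor: spreads indices over worker threads.
    def __init__(self, workers: int = 4):
        self.workers = workers

    def launch(self, kernel, n):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # map() returns results in index order, so output order is stable.
            return list(pool.map(kernel, range(n)))


def saxpy(backend: ComputeBackend, a: float, x: Sequence[float], y: Sequence[float]) -> list[float]:
    """Application code: written once against the interface, never against a backend."""
    return backend.launch(lambda i: a * x[i] + y[i], len(x))
```

Because Python threads share one interpreter lock, `PooledBackend` will not actually speed up this arithmetic; what matters is that `saxpy` runs unchanged on either backend.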

Aside from the programmability limits described in the previous section, compute APIs are currently hampered by plain and simple unfamiliarity. Developers have not yet discovered all the best practices for squeezing the most out of a GPU, but those will come with time and the additional tools outlined in the above image.

Fusion

Perhaps the most compelling stage of AMD’s push to unite a PC’s disparate execution resources is the firm’s Fusion initiative, which plans to pack GPU and CPU hardware into single chips known as APUs, or Accelerated Processing Units. Though AMD has hit a series of stumbles that have pushed products originally planned for 2008/2009 into 2010/2011, the company’s portfolio of performance GPUs and competitive CPUs uniquely positions it to dominate both the performance and price/performance metrics when the time comes. This is especially true now that Intel has indefinitely delayed its own competitive GPU and NVIDIA offers no CPU.

The first of many such chips to follow this model is currently known as Llano, and it is set to arrive at the beginning of 2011, primarily in OEM PCs from the likes of Dell and HP. Based on the die shots provided during AMD’s Financial Analyst Day, the chip strongly resembles a shrunken Propus (Athlon II X4) die.

This would make sense given that Llano and Propus are both aimed at the mainstream, though Llano is built on a 32nm process while Propus is 45nm. Shrinking an existing architecture onto a new process will be much easier for AMD than starting fresh with a new architecture on a new process.

Let’s put them side by side for comparison:

Propus (Left) and Llano (Right)

The resemblance is fairly uncanny, don’t you think?

It should be noted that the Llano die shot is not complete; the bottom section of the chip has been cut off in press materials, meaning there’s even more silicon at play than we can see at this time.

Still, judging from what we can see, the Llano APU appears to feature 1MB of L2 cache per core, no L3 cache, and six Evergreen SIMD engines of 80 stream processors each, for a total of 480 stream processors.

In effect, Llano is shaping up to be an Athlon II X4 with roughly two-thirds of a Radeon HD 5750 on board. That should be more than enough to dominate Intel’s Clarkdale and Arrandale parts, which pair a Nehalem-derived CPU with a simple Intel GMA graphics core on the same package.
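The comparison with the Radeon HD 5750 comes down to counting stream processors. The short Python check below assumes the Evergreen layout of 80 stream processors per SIMD engine (16 thread processors with five ALUs each) and the HD 5750’s nine SIMD engines; the six-engine figure for Llano is the die-shot estimate above, not a confirmed specification.

```python
from fractions import Fraction

# Evergreen SIMD engine: 16 thread processors x 5 ALUs = 80 stream processors.
SP_PER_SIMD = 16 * 5


def stream_processors(simd_engines: int) -> int:
    """Total stream processors for a given number of Evergreen SIMD engines."""
    return simd_engines * SP_PER_SIMD


llano = stream_processors(6)   # six SIMD engines visible in the die shot
hd5750 = stream_processors(9)  # Radeon HD 5750: nine SIMD engines
share = Fraction(llano, hd5750)

print(llano, hd5750, share, f"{float(share):.1%}")  # 480 720 2/3 66.7%
```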

Final thoughts

As we have suggested, the GPU computing industry is still very young: DirectCompute can hardly be considered mainstream, OpenCL only recently gained support from both GPU vendors, DirectX 11 hardware is still in its infancy, and we have not yet tapped the full potential of the DirectX 11 hardware we do have.

Put in those terms, the situation certainly sounds grim, but that couldn’t be further from the truth; the field is wide open to the ingenuity and expertise of everyone from the solo developer all the way up to the industry giants. Over the next few years, these parties will certainly work to unlock deeper levels of hardware acceleration and open more programs to the power of the GPU. Indeed, who expected that antivirus applications would be GPU-accelerated? How about browsers or supercomputers? While two of those three examples are NVIDIA-powered, AMD hardware is every bit as capable, if not more so given that NVIDIA is months late to board the DirectX 11 train.

In all, the discussions we’ve had with AMD over the past year make it clear that the firm is paying particular attention to the needs of consumers like you. Across games, movies, music, user interfaces, browsers, distributed computing projects, languages and many as-yet-unimagined applications, the PC has never before been on the cusp of such a large performance jump in so short a time, and AMD is determined to make that jump with a running start.

Comments

  1. GooD
    Great article :)

    Thankx Thrax
  2. ardichoke
    I used to think AMD was making a huge mistake buying ATI. Why would they want to compete directly with two of the biggest players in the computer industry at once I thought? Only now do I see the potential sheer brilliance of that move.
  3. lordbean
    I have to agree, it's looking like purchasing ATI was a good move. AMD is working their tech towards a field that neither Intel or NVIDIA look to be competitive in.

    There's still some hope for Larrabee... but we'll see.
  4. Zuntar
    Yea baby, keep it coming!!!!
  5. Ananth
    Does anyone have any more info on the Llano GPU cores at this time? Will they be able to run native kernels (OpenCL)? Will they have full access to main memory?
