Quad Core Processing: Over-simplified, demystified, and explained


IBM struck first, making dual core processing a mass-market reality in
the Power Mac G5 (with the PowerPC 970MP). It was soon followed by the release of Intel’s Pentium D (8xx series)
and AMD’s Opteron (x60, x65 and x70), spreading dual core outside the Mac world.
This time, a little over a year later, Intel is poised to strike first, preparing to release the Kentsfield, the first quad core desktop processor. We’re going to explain what quad core is, how it’s being implemented
by the major chip manufacturers, what benefits it’ll bring, and, most importantly,
why we need it.

What is Quad Core?

A processor is made up of three things. The first
is the core, which is what actually executes processing tasks.
The core sits inside the die, which is the silver tab you may have seen
on the top of a processor. All of this is fitted inside the package, which
is the green material capped by the heatspreader on top and laced with pins
on the bottom. It is important to know that the number of cores can scale
independently of the number of dies.

As an example, Athlon XPs had one die
and one core. The first dual core chips had two dies with one core in each
(see the diagram below). Today’s dual core chips have two processing
cores fitted on one die. The quad core chips that will first replace
them either merge two dual core dies into a two-die, four-core package or put
four cores on one die. Quad core is both an evolution and a continuation of
a trend.

It is important, though, to know that quad core is a result, not a method. Particularly
in computer engineering, the goal does not necessarily dictate the means. For example, NVIDIA and AMD / ATi
both render DirectX 9 with their products, but each approaches it by a different route.
Quad core is the same way; there are two primary methods by which to achieve
the goal: the inelegant way and the native way.

How are Quad Core chips made?

The first implementation, the inelegant one, comes from placing
two processor dies in the same space that one used to occupy. This
is made possible by shrinking the manufacturing process of the dies that are
being placed together in the single package. The advantage for the manufacturer
is three-fold.

  1. They’re able to crow about being first to the market
    with the “wave of the future” processor.
  2. It furthers a chicken-and-the-egg
    philosophy (we’ll touch on this again later) by both creating a demand that did not previously
    exist and establishing a new direction for processor-intensive content.
  3. It allows manufacturers to produce multi-core chips inexpensively. Throwing away one dud die with two processing engines is better than throwing away one
    dud die with all four processing engines on the same slice of silicon.

The main disadvantage of this technique is that heat output roughly doubles when two dies are placed in the same package.
For example, the Core 2 Duo chips dissipate about 65 watts of heat.
The Kentsfield, which crams two Core 2 Duo dies together, dissipates
between 125 and 132 watts. That’s a lot! CPUs haven’t put
out that much heat in a long time.

The other disadvantage, the one that
matters to most of us, is the logic used to price the chips. If one Core 2 Duo costs about $400 USD, then one Kentsfield should
cost about $800! The bleeding edge is an expensive place to buy in and a
lucrative place to sell in.
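
Both estimates above come from the same naive rule: two dies, twice of everything. As a rough sketch in Python (the function name `scale_by_dies` is our own invention, and real heat and pricing won't scale this cleanly), the arithmetic looks like this:

```python
# Back-of-the-envelope scaling for a two-die package,
# using the approximate figures quoted in the article.
CORE2_DUO_WATTS = 65        # approximate heat output of one Core 2 Duo
CORE2_DUO_PRICE_USD = 400   # approximate price of one Core 2 Duo
DIES_PER_PACKAGE = 2        # Kentsfield pairs two dual core dies


def scale_by_dies(per_die_value, dies=DIES_PER_PACKAGE):
    """Naive estimate: every die contributes the same amount."""
    return per_die_value * dies


estimated_watts = scale_by_dies(CORE2_DUO_WATTS)
estimated_price = scale_by_dies(CORE2_DUO_PRICE_USD)

print(f"Estimated heat:  {estimated_watts} W (quoted range: 125 to 132 W)")
print(f"Estimated price: ${estimated_price}")
```

Doubling 65 watts gives 130, which sits inside the 125 to 132 watt range quoted for the Kentsfield.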

Here we can see the progress: one larger die is shrunk and placed with a twin on the same package. Then a single die with two cores is produced, and the cycle repeats ad infinitum with more cores per die.

The other way to make quad core chips, also shown in the diagram
above, is known as the native way. Native quad core designs
incorporate four individual cores on a single processor die. The benefit for the
manufacturer in this case has three parts.

  1. They get to state that they have the world’s first “true” quad core chip.
  2. These chips should run cooler than two-die, four-core solutions, since
    a single die generally puts out less heat
    than two in the same package. The manufacturer can then bundle less expensive coolers with their chips, and the savings are
    passed on to us as consumers.
  3. Perhaps the most important benefit
    is that a native quad core processor can have all four cores talk to one
    another at the speed of the processor itself, rather than sending communication
    from cores 0 and 1 out across the motherboard to reach cores 2 and 3.

[Diagram: native vs. inelegant quad core]

In this scenario, we see that the cores on the “native” QC chip are capable of talking directly to one another. The “inelegant” solution, while still quad core, must endure the lag of sending information across the system bus if one half of the chip needs to talk to the other.

How substantial the lag is with the inelegant solution is hard
to measure, because we’ll probably never see a native quad core design that
is otherwise identical to its inelegant brother, save the number of dies.
It just won’t happen; technology doesn’t progress like that. What
matters in the end, however, is that it’s a quad core chip; there are four
cores, they’re all real, and they all can bear a processor load.


Intel has opted to pursue the inelegant solution for now. As a result, it is almost eight months ahead of AMD in the race to go quad. Intel
is also preparing to release a significantly cheaper version of the Kentsfield,
the QC6600, once the novelty of the Extreme Edition has worn off. This means that Intel’s “dirty” and “inelegant”
approach to quad core will be available to the masses at the price of today’s
Core 2 Duos around March 2007. This will beat AMD by four to six months.

AMD’s approach, on the other hand, comes partly by necessity and partly by
will. The mean, green underdog is going straight from native dual core to
native quad core in time for the end of summer 2007. AMD, at the time of writing, has just introduced its 65nm processors while
Intel has been shipping them for a year or more. It makes sense that AMD
would catch up by making its native quad core chip the crowning achievement
of its move to 65nm. As it stands, AMD was just too far behind on 65nm to
introduce an inelegant quad core chip without shooting its future plans
for Barcelona in the foot.

How Quad Core works

Quad core requires the union of many elements
to make it work well. The easiest hurdle to clear at this stage
is hardware support. Motherboards, with the right BIOS updates, need
to be able to recognize and run these chips. Intel already has many boards that support all three of their last major chips,
and AMD is going to do the same.

The next hurdle to overcome is
the operating system, which must be capable of recognizing more than
one core inside a chip. Thankfully, Microsoft and virtually all
Linux distributions have fully embraced multi-core chips, allowing
today’s versions of Windows and Linux to support chips with two, four, eight,
or even sixty-four cores.
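
If you’re curious how many processors your own operating system sees, a short Python sketch can ask it. This uses the standard library’s `os.cpu_count()`, which reports logical CPUs, so on some chips the number may be higher than the count of physical cores.

```python
import os

# os.cpu_count() returns the number of logical CPUs the operating
# system reports, or None if the count cannot be determined.
logical_cpus = os.cpu_count()

if logical_cpus is None:
    print("The OS did not report a CPU count.")
else:
    print(f"The OS reports {logical_cpus} logical CPU(s).")
```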

The last step on the road to quad is the
linchpin of its success: software that runs on the chips.

Software is what makes or breaks multi-core chips. The reason lies in a concept called threading. Threading is how a program
divides its workload across multiple cores or multiple physical
processors. A thread is a stream of instructions related to a task the
program is managing (like game physics, sound reproduction, or 3D rendering).
Relatively speaking, desktop programs that divide their labor, called “multi-threaded applications”, are very new. In 20-odd
years of mainstream desktops, only in the last three have we seen the
shift towards multiple processors, multiple cores, and applications that recognize
them. As the number of cores increases, it gets harder for developers to divide
the labor.
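
To make the idea of a thread concrete, here is a minimal Python sketch that starts one thread per task. The task names come from the examples above; the `run_task` function and the work sizes are placeholders we made up. One caveat: in the standard CPython interpreter, a global lock generally keeps threads from running Python code on several cores at once, so treat this as an illustration of the structure rather than of the speedup.

```python
import threading

results = {}
results_lock = threading.Lock()


def run_task(name, work_units):
    """Stand-in for one stream of work, such as physics or sound."""
    total = sum(range(work_units))  # placeholder computation
    with results_lock:              # protect the shared dictionary
        results[name] = total


# Hypothetical tasks and work sizes, for illustration only.
tasks = {"physics": 1000, "sound": 500, "rendering": 2000}

threads = [
    threading.Thread(target=run_task, args=(name, units), name=name)
    for name, units in tasks.items()
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread to finish

print(sorted(results))
```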

The frequent problem with threading is that while an application may
be performing many tasks at once, each task relies on another to make
it work. If a video game is producing 3D graphics, there are many other things happening (like sound,
AI, and physics), but all of those are ultimately beholden to what the 3D renderer
is doing.

Consider walking into a room with water dripping from the ceiling.
There are three things occurring here.

  1. The computer must render
    the room and the water.
  2. The computer must figure out how the drop of water
    changes mid-flight, and how the puddle changes when the drop impacts it.
  3. On top of all of that, the computer must produce the sound effects of the
    water hitting the puddle.

Notice that each task is dependent on the task before
it? Without the scene, there would be no physics to do, and with no physics to
do there can be no sound effect. So how does a developer split it up?
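
The chain of dependencies above can be written down as a small graph. The sketch below uses Python’s standard `graphlib` module (Python 3.9 and later) to group tasks into batches that could run at the same time; the task names are our own labels for the three steps.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks it depends on.
dependencies = {
    "render_scene": set(),
    "water_physics": {"render_scene"},
    "splash_sound": {"water_physics"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

batches = []  # each batch holds tasks that could run side by side
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose dependencies are done
    batches.append(ready)
    sorter.done(*ready)

print(batches)
```

Every batch holds exactly one task, which is the problem in a nutshell: at least as this example describes the scene, nothing in it can run alongside anything else.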

With broad threading, the developer generally divides the workload
between two cores, and tells them to stay synchronized with the voodoo
that they do. This is an inelegant solution, as it is really only designed
to keep two processors busy, whether they are two cores in one chip or two physical processors.
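
Here is a minimal sketch of broad threading in Python, assuming a workload that splits cleanly in two. The workload and the `process_half` function are invented placeholders, not code from any real game.

```python
import threading


def process_half(items, out, slot):
    """Each thread handles one fixed half of the workload."""
    out[slot] = sum(x * x for x in items)  # placeholder computation


workload = list(range(1, 101))  # placeholder work items
mid = len(workload) // 2
halves = [workload[:mid], workload[mid:]]

partial = [0, 0]  # one result slot per thread
threads = [
    threading.Thread(target=process_half, args=(half, partial, i))
    for i, half in enumerate(halves)
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # the synchronization point: wait for both halves

total = sum(partial)
print(total)
```

However many cores the machine has, this program only ever creates two threads.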

In a quad core world, this doesn’t work so well. There is more processor horsepower
than broad threading can use, so all those unused clock cycles are wasted.
Benchmarks around the Internet are showing the inherent weakness of broad
threading as higher-clocked dual core processors surpass quad cores in performance
despite the vastly greater potential under the hood of quad core machines.
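
A crude model helps explain those benchmark results. Assume (and this is a big simplification) that performance is proportional to the number of busy cores times the clock speed. The clock speeds below are hypothetical, not measurements of any real chip.

```python
def busy_cores(threads, cores):
    """A program can keep at most one core busy per thread."""
    return min(threads, cores)


def relative_throughput(threads, cores, clock_ghz):
    """Crude model: throughput is proportional to busy cores times clock."""
    return busy_cores(threads, cores) * clock_ghz


# Hypothetical clock speeds, chosen only for illustration.
dual = relative_throughput(threads=2, cores=2, clock_ghz=3.0)
quad = relative_throughput(threads=2, cores=4, clock_ghz=2.5)
quad_utilization = busy_cores(threads=2, cores=4) / 4

print(f"Dual core, two threads: {dual}")
print(f"Quad core, two threads: {quad}")
print(f"Quad core utilization:  {quad_utilization:.0%}")
```

With only two threads to run, the quad core leaves half its cores idle, so the faster-clocked dual core comes out ahead in this model.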

[Diagram: broad threading on a dual core processor]

Here we see the program as the red and blue lines entering
the dual core processor. It’s a broad-threaded
multi-threaded application, so the processor workload is simply split in half. Each
core takes one half.

With fine-grained threading, the developer carefully
analyzes everything the program can possibly be doing at
any one time - a task some regard as the equivalent of predicting chaos - and then writes
the program so that each possible task knows which part of the processor it’s
going to, in order to truly maximize the horsepower of the processor. The weakness of broad-threaded apps is that cores in a processor can often go underutilized; there isn’t always
a way to keep that second thread alive and kicking in a multi-threaded scenario.
Fine-grained threading ensures that there’s always something to do for each
of the cores.

[Diagram: fine-grained threading on a dual core processor]

A fine-grained multi-threaded process generates hundreds of individual tasks, each one directed specifically to an underutilized core to guarantee the full horsepower of the CPU is being used.
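
One common way to approximate fine-grained threading is to chop the work into many small tasks and let a pool of workers pull from the pile as they become free. This is a stand-in technique, not the hand-tuned task assignment described above, and `small_task` is a placeholder for real work. The sketch uses Python’s standard `concurrent.futures` module.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def small_task(n):
    """One small, independent unit of work (placeholder computation)."""
    return n * n


# Hundreds of small tasks instead of two big halves.
task_inputs = range(1, 301)

# One worker per reported CPU; fall back to 4 if the count is unknown.
workers = os.cpu_count() or 4

with ThreadPoolExecutor(max_workers=workers) as pool:
    # The pool hands each task to whichever worker is free next,
    # and map() returns the results in input order.
    results = list(pool.map(small_task, task_inputs))

total = sum(results)
print(len(results), total)
```

Because the worker count follows the machine rather than being fixed at two, the same program would spread its tasks over four workers on a quad core without any rewriting.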

Unfortunately, developers have only recently become
very good at fine-grained threading for dual core, and now quad core
is on its way. Imagine the complexity of the processor above doubled!
Each half of the processor would be receiving two sets of red and blue arrows with
a lattice of processes, but worse still, the programmer would have to know
how to code such a beast. This is both the biggest blessing and the biggest
curse of quad core: the potential for massive processor ability lies right in that
square of silicon, but programming for it is an absolute work of art.

What benefits will Quad Core bring?

Right now, there are only a few applications that truly benefit
from quad core computing. These applications are the sort that have always
benefitted from more processor horsepower, no matter how that was obtained.
Examples include rendering 3D graphics for still scenes, compressing DVDs into portable
movie files, compressing CDs into MP3s, and anything else that is just
a massive chunk of mathematics. In the aforementioned cases, the processors can
work in tandem to crunch large sets of numbers because they’re predictable,
not terribly dynamic, and do what a processor does best: compute. On the other hand,
games (right now) will receive little to no benefit from a quad core processor because
they only recently began to really harness the power of dual core offerings.
The benefit of quad core is not what it does now, but what it will do.

When AMD and Intel released dual core processors, everyone wondered why we needed
that much processing power. Games often failed to fulfill the maximum load
of a single processor and game developers complained that it would be difficult
for them to split what was happening in a game across multiple processors. We see
today, though, that the industry has risen to meet the challenge. Developers
and companies that once squabbled over the advancements that dual core would
bring are now on the dual core bandwagon with powerhouse top-ten titles. We are all reaping the benefits of a technology that ushered in the change.
Like the old chicken and the egg scenario we mentioned earlier, we needed
the software to drive the preeminence of dual core, and we needed dual core
chips to drive the software.

We have no doubt today that the quad core scenario is entirely
similar. Perhaps one day we’ll reach the point where the benefit does not
exceed the cost of adding additional cores, but looking at titles like Alan Wake
and Valve Software’s experiments with fine-grained threading, we don’t think that
the quad core chip’s cost exceeds its benefit. We think there is a bright
future for quad core, and that the value of the chip should not be measured
in what it does today, but what it will do for us in the middle of 2007.

There are four primary tasks that every
game must perform, and the processor must assist with each of them.

  1. 3D rendering
  2. Physics
  3. Sound
  4. Computer AI

Right now, assisting the graphics card’s processor with 3D rendering and physics can take up more
than one core each. Sound and AI can fill up the rest, pushing both of your
cores to the limit. Imagine, though, if you could dedicate an independent
core to physics, one to assisting the graphics card, a third to physics/graphics
spillover, and the last one to sound and AI. Developers would receive an
unprecedented increase in versatility because, at last, a whole core could
in effect be donated to each process that used to be ram-rodded into one core.
It’s like moving 40 gallons of water through four hoses at once, rather than
waiting for all 40 to filter down the same hose. Clearly the four hoses
are going to move it faster!
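
The hose arithmetic is simple enough to check. In this sketch the flow rate is a made-up number, and we assume the water (the work) divides evenly with nothing waiting on anything else, which real games rarely manage.

```python
def drain_time(gallons, hoses, gallons_per_minute_per_hose):
    """Minutes needed to move the water if every hose flows at the same rate."""
    return gallons / (hoses * gallons_per_minute_per_hose)


RATE = 2.0  # hypothetical flow rate per hose, in gallons per minute

one_hose = drain_time(40, 1, RATE)
four_hoses = drain_time(40, 4, RATE)

print(f"One hose:   {one_hose} minutes")
print(f"Four hoses: {four_hoses} minutes")
```

Four hoses finish in a quarter of the time, which is the best case four cores can hope for.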

Why do I need Quad Core?

Quad core is the future, and it’s a future that’s coming faster
than the dual core one did. While few developers reacted quickly to the introduction
of dual core CPUs (perhaps unready for such a drastic change), developers have begun acting on quad core
even before there was a chip to test it on. Valve Software (of Half-Life 2 fame)
and Remedy Entertainment (of Alan Wake fame) both began work on optimizing code for quad
core before Intel had even finished producing the very first Kentsfield. This time, developers have had fair warning, and the good
sense to read the writing on the wall.

If you are the owner of a socket AM2
motherboard or an LGA775 motherboard that’s Core 2 Duo compatible, you’re
set for the introduction of quad core. With the QC6600 from Intel and the Barcelona chips from AMD, you’ll be able to drop a brand new chip
in your motherboard and double your processing capacity.

We need quad core to drive the future and to give the future
a platform to come to. While it may mean you have some idle cores on your
processor for the time being, you’ll have more idle strength in your processor than anyone
with an average desktop computer has ever enjoyed. Quad is the power of tomorrow.
