Pipelining Explained
The word “pipelining” with
respect to a processor may be a familiar word to enthusiasts but many may not
know much beyond the term or even what it means. Learn now what pipelining
is and if it is something that will benefit your computing experience or not.
In the most basic terms pipelining is the ability of the processor to “cue”
data it may require while the processor is completing an instruction. Think
of it like an assembly line inside the processor except this assembly line
sees several parts of the finished product happen simultaneously. Pipelining
works as long as the right tools are used and the right parts are available
when needed. Conversely, the assembly line can experience slowdowns when the
wrong tool or wrong part is thought to be needed…when it isn’t.
The parts equal the sum of the Pipeline.
Instructions
All programs can be broken down into a fixed number of instructions, known
as the Instruction Set. These instructions include very basic arithmetic,
memory access, jumping within the program (branching) and other instructions.
Each of these instructions has a given path that are taken through a processor.
A very simple way of completing a program would be to figure out what instruction
the processor must complete, complete that instruction and then move on to
the next. The following is a simple example of a few instructions being completed
linearly.
|
Note the total number of cycles to complete the instruction is: Average Cycles
Per Instruction (x) * Number of Instructions.
This manner of completing instructions, while very simple, is rather inefficient
because several parts of the processor remain inactive while one instruction
is being completed. Wouldn’t it be great if we could use most of the
chip at the same time eliminating dormant parts of the hardware? This is
the purpose of pipelining.
Here is a brief overview of what makes pipelining
possible:
Every instruction has five basic divisions: Fetch, Decode, Execute, Memory
Access, and Store. “Fetching” involves accessing
memory that contains the program. The next part, Decode,
involves looking at the data obtained in the Fetch to determine what kind
of instruction is used, and what data will be necessary to complete the
instruction. Every instruction has the same Fetch and Decode because until
Decode has completed, there is no way of telling what the instruction is. Execution
is where any ALU (Arithmetic and Logic Unit)
operations (like Add, Multiply, AND, OR, etc) are performed. Memory
Access is only used by instructions that store or
load data to/from memory. The final part of an instruction is Writing the
results (if applicable) to the registers within the processor. Each of these
divisions uses separate parts of the processor, and this is the basis for
pipelining. Here is an instruction broken into these five divisions with
some of what happens in each part.
|
Pipelining
Pipelining is accomplished by adding registers in between the stages to hold
the results of each individual stage. After the first instruction completes
the Fetch stage it moves to the Decode stage and then the second instruction
starts the Fetch stage. The following is a picture to help illustrate the
process:
|
Note that now all the instructions have a fixed number of cycles to complete
which is equal to the number of stages. As the number of stages increases,
the amount of work that the processor does in a stage decreases along with
the time required for a stage. This allows the frequency of the processor
to increase. The time to execute N instructions with a pipeline of x stages
is t = x + (N
- 1)
Advantages of Pipelining:
- More efficient use of processor
- Higher frequencies possible.
- Quicker time of execution of large number of
instructions
Disadvantages:
- Pipelining involves adding hardware to the chip
- Pipelining requires a large
instruction cache to keep the pipeline “full” and
minimize stalling
Branching
Since branches are conditional there are two possible instructions that could
follow a branch instruction. If the branch is not taken the next instruction
is immediately after the branch. However, if the branch is taken, a different
instruction would follow the branch. This means that every branch that is
predicted incorrectly requires a flush of all the instructions that were fetched
after it. This is not a huge performance hit on a short pipeline or one that
has the branch determined early in the pipeline. If this is not the case
then a large number of instructions will be flushed, causing a large performance
hit. These bad predictions can be minimized using one of several branch prediction
techniques which can reach around 98% accuracy.
Dependencies
Consider executing the following two equations:
x = 5 + 6
y = x + 7
You cannot start to compute ‘y’ until ‘x’ has been
stored. Storing ‘x’ occurs in the final stage of the pipeline but
calculation of ‘y’ has already started. Therefore, the pipeline
must be stalled to allow the value of ‘x’ to be written, and then
calculation of ‘y’ can finish. These data dependencies occur frequently
in instructions, so special forwarding hardware is added to processors. This
minimizes the impact of these dependencies and often removing the need to stall
at all.
Pipelining: Good or Bad?
Pipelining, like most things in life, is good in moderation. Making a processor’s
pipeline too short causes a longer minimum clock period which hinders the
manufacturer’s
ability to ramp up the clock speed. Making the pipeline very long allows faster
clock speeds however it also increases the cost of stalls and flushes which
negatively affects performance and also increases the amount of resources
required to pipeline the processor. A pipeline that allows for a balance between
these two; a decent clock speed without severe performance hits from stalls
or flushes is the optimal design.
Works Cited
Patterson & Hennessy. Computer Organization & Design: The Hardware/Software
Interface, 2nd Edition.
McFarling, Scott. WRL Technical Note TN-36: Combining Branch Predictors, June
1993.








