Out with the old, in with the new... as fast as humanly possible. That's Intel's take on the age-old adage as they quickly accelerate the deployment of the fastest x86 chip on planet earth. According to Intel marketing, the new Core 2 Duo architecture is the wave of the future, and we have no doubt that this capable CPU design is here to stay. Released on July 27th, 2006, the Merom, Allendale, Conroe, Conroe XE and Woodcrest series of CPUs jump-started an ailing Intel Corporation and sparked Intel's return to the performance race in a big way; claiming a 40% power reduction and speed improvement over Netburst-based Pentium 4 derivatives, studies concluded that not only did Intel trump themselves, they handed AMD a heavy blow.
The power, thermal and speed enhancements that helped the "Core Microarchitecture" rocket past its predecessors also helped chips based on the Core silicon leave their AMD counterparts in the dust. In each market segment, be it mobile, desktop, workstation or server, Core-based components are faster than the AMD parts which once held the speed crown, and there seems no sign of slowing down.
Make no mistake, when AMD released their K8 in late September of 2003, Intel was caught off-guard. Coming off an oscillating battle between the Pentium 4 and the Athlon XP series of CPUs, Intel's heavy investment in the gigahertz race left them unprepared for the savage beating that AMD would deal to the gentlemen in Santa Clara, CA with the Hammer and its successors.
Unlike Intel, which opted to produce exceptionally warm-running CPUs with very error-prone pipelines of absurd lengths, AMD took the route of efficiency, and produced CPUs which could do more per megahertz of clockspeed than Intel had dreamed of. Despite attempts at fixing the Netburst design, regarded as broken in its conception, with large amounts of cache, Hyper-Threading and even longer pipelines, the sole hope that Intel could compete with AMD in the desktop and server arenas came in the form of a little mobile CPU known as the Banias.
The Banias CPU was explicitly designed from the ground up with mobile applications in mind, and was released in March of 2003 under the Centrino name. The Centrino, which has since become the title for the package of technologies that comprise Intel's flagship mobile offerings, has thus far featured a series of four generational CPUs with extremely low voltage requirements, but abnormally high levels of productivity and floating-point strength. When the original Banias was released in March of '03, enthusiast websites began to notice that the CPU was a real workhorse. As the K8 could surpass a Netburst-based CPU even with an 800-1000MHz clock deficit, Banias-based Pentium M CPUs were quickly recognized as being able to blow past K8s even with a 400-600MHz clock deficit. As time progressed, the clamor for porting the Banias to the desktop grew stronger, and came to a head when the Banias bowed out to the newly-fabbed 90nm Dothan. It was with the Dothan in May of 2004 that a company called FPU debuted an ATX motherboard that brought desktop amenities, like dual channel memory and faster frontside bus speeds; apples-to-apples benchmarks concluded that the Dothan was a screamer, handing numerous defeats to the vaunted K8 architecture.
There was, however, a problem with the conclusion drawn by enthusiast websites at the end of 2005's summer months: Intel was sternly entrenched in that gruesome gigahertz race, dumping billions of dollars in R&D into a performance-hemorrhaging line of desktop CPUs. Since 2001, Intel had hammered the idea of high clockspeeds into the minds of John Q. Public. Bigger numbers were better, more cache was better, virtual cores were better. More, more, more! More of everything but efficiency. Unfortunately for Intel, the Opteron kept running circles around any and all flavours of Xeons. It completely shamed the Itanium name by offering 64-bit computing backed by the power of x86 knowledge and compatibility. But perhaps the most damning example of the Pentium 4's utter failure to perform was in a 2005 study that concluded it would take a whopping 5.2GHz Pentium 4 based on the Prescott core just to rival the Athlon 64 in enthusiast applications. We estimate that around that time, Intel hit the breaking point; recognizing that they had perhaps lost it all with the abysmal performance of Netburst-based Pentium 4 and Xeon chips, Intel called its engineers to the table and quietly acknowledged AMD's engineering principles as the right engineering principles. By July of 2006, Intel would unify the architectures of their desktop and laptop offerings for the first time since the Coppermine Pentium III.
To bring about this unification, Intel had to reverse gears and set their sights on an ultra-efficient CPU that ran at low temperatures and required very low voltages. With this in mind, the first test of Intel's new product goals came in the form of Yonah, the first product designed on the Core series of architecture which would evolve into the Core 2 design we use today. Replacing the Dothan under the Centrino label, the Yonah-class CPU came in single and dual-core variants with a 65nm fabrication technique, and would firmly make its mark on the mobile workspace. The chip was everything laptops needed: Fast, low-voltage and cold. Witnessing their master plan unfolding with the wild success of the Core Solo and Core Duo chips, led solely by the Yonah, Intel continued to beaver away at bringing the powerhouse technology to the desktop. Perhaps by the time of the Core architecture's release with the Yonah chip in January of '06, Intel had finally envisioned its successor by the name of Core 2, the umbrella name for a whole host of CPUs designed for every market segment.
Taping out and entering initial fabrication in the spring of 2006, the Core 2 processors were unleashed to the masses in July of this year. The Core 2 technology itself was designed explicitly to supplant the Pentium 4, bidding a farewell to the Pentium as Intel's primary brand name since 1993. Owing its heritage to the successes in the Yonah, and tracing its roots back to codename P6 -- the Pentium Pro, the Core 2 series of processors is the culmination of nearly three years in mobile engineering to produce Intel's broadside volley at AMD's domination of the performance charts.
What makes the Core 2 Duo so Fast?
Boosting Efficiency
The Core 2 series of chips, in representing a significant departure from the Netburst-based chips of old, are designed to maximize Instructions Per Cycle (IPC), or the number of tasks the CPU can perform per cycle of the clock (a 1MHz clock delivers one million cycles per second). The Core 2 chips are estimated at four IPC, to the K8's three, to the Pentium 4's two. While IPC is not a precise measurement of a processor's speed, it has a significant impact, and we can give rough theoretical numbers: The Core 2 series of CPUs are approximately 100% faster than Pentium 4s at the same speed, and 33% faster than Athlon 64s at the same speed. Practical measurement shows these theoretical figures to be about 10-15% too high, and this is due to the various optimizations the respective chips have received.
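The same-clock arithmetic can be sketched in a few lines. This is only a rough illustration that takes the estimated IPC figures at face value; real chips rarely sustain their peak IPC, so treat the results as upper bounds. The function names are our own and purely illustrative.

```python
# Estimated peak instructions per cycle, as quoted in the article.
# These are estimates, not measured figures.
IPC = {"Core 2": 4, "K8": 3, "Pentium 4": 2}


def theoretical_speedup(chip, baseline, ipc=IPC):
    """Fraction by which `chip` outpaces `baseline` at the same clock speed."""
    return ipc[chip] / ipc[baseline] - 1.0


def clock_to_match(chip, baseline, baseline_mhz, ipc=IPC):
    """Clock speed (MHz) `chip` needs to equal `baseline` at `baseline_mhz`."""
    return baseline_mhz * ipc[baseline] / ipc[chip]


if __name__ == "__main__":
    for rival in ("K8", "Pentium 4"):
        gain = theoretical_speedup("Core 2", rival)
        print(f"Core 2 vs {rival}: {gain:.0%} faster at the same clock")
```

Knocking 10-15% off each result gives a figure closer to what practical measurement suggests.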
As far as architectures are concerned, let's quickly sum up the differences between the latest generations (All dual-core) of Netburst, K8 and Core 2 from a sky-high perspective of general design:
| | Intel Core 2 | AMD K8 | Intel Netburst |
| Fabrication Technique (nm) | 65 / Conroe | 90 / Windsor | 65 / Presler |
| Socket | LGA 775 ("Socket T") | Socket AM2 (940 Pins) | LGA 775 ("Socket T") |
| L1 Cache | 64k Exclusive Per Core | 128k Exclusive Per Core | 24-32k Exclusive Per Core |
| L2 Cache | 4MB Shared | 512k/1MB Exclusive Per Core | 2MB Exclusive Per Core |
| Bus Speed | 1066MHz - PC2-4200 | 800MHz - PC2-6400 | 800MHz - PC2-6400 |
| Pipeline Length | 14 Stages | 12 Stages | 31 Stages |
| SSE Engine Width (In Bits) | 128 | 64 | 64 |
| Max Memory Bandwidth to CPU | 10.6GB/s | 6.4GB/s | 8.5GB/s |
| L2 Cache Addressing Width | 256 bits | 128 bits | 256 bits |
| L1+L2 Cache Latency | ~11-14 Cycles (L1 = 3 Cycles) | 12 Cycles (L1 = 3 Cycles) | ~16 Cycles (L1 = 4 Cycles) |
Going from the table, the Core 2 clearly has massive bandwidth between the processing cores and the cache in the form of a 256 bit cache width with a median access time of thirteen cycles. What this means is that the Core 2 can access twice as much cache information per access, at an average of one cycle slower, when pitted against the newest Windsor-class Athlon 64 X2 chips. In the real world, the Core 2 delivers two times as much information to the CPU cores as the Windsor, while the L2 cache itself is about two and a half times faster thanks to the Core 2's ingenious cache design. When placed against the Presler, the Core 2 can access the same amount of data but do it up to 25% faster.
Supercharging Common Tasks
Another advantage of the Core 2 comes in the form of the width of the SSE engine. Many applications today make use of SSE to do complex mathematical tasks for media encoding, gaming, 3D rendering, audio, and a whole raft of enthusiast, prosumer, and even enterprise-class processing tasks. SSE is a set of instructions that streamlines commonly used algorithms by working on wide registers, each of which holds several values that are processed at once; in practice, SSE simplifies multimedia tasks which would otherwise gobble up large chunks of CPU time to do things we consider very simple, like video encoding. Without delving deeply into the intricacies of media encoding, it is a purely mathematical task which analyzes, resizes and compresses thousands of sequential images; without things like SSE, these tasks would take days, maybe weeks, not hours. Getting back to the Core 2's implementation of the SSE engine, we see that it is 128 bits wide. This width is significant because SSE registers are 128 bits in length. Prior to the advent of the Core 2, each operation on an SSE register was spread across two clock cycles, meaning that the maximum number of SSE operations a CPU could complete per second was one half of its clock speed. In the Core 2's case, it can gobble up SSE registers twice as fast as its competitors, able to process a full register on every clock cycle. In the real world, this means that applications that heavily rely on SSE1/2/3 could be accelerated by as much as 50%, on top of the boost in speed granted by cache speed and bandwidth.
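To put numbers on the width argument, here is a minimal sketch of a toy model in which a 128-bit SSE operation simply takes as many cycles as it takes to push 128 bits through the engine. It ignores everything else in the pipeline, and the function names are our own.

```python
import math

SSE_REGISTER_BITS = 128  # width of one SSE register


def cycles_per_sse_op(engine_width_bits):
    """Clock cycles needed to push one 128-bit SSE operation through the engine."""
    return math.ceil(SSE_REGISTER_BITS / engine_width_bits)


def peak_sse_ops_per_second(clock_mhz, engine_width_bits):
    """Peak 128-bit SSE operations per second for one execution unit (toy model)."""
    return clock_mhz * 1_000_000 / cycles_per_sse_op(engine_width_bits)


if __name__ == "__main__":
    for width in (64, 128):
        ops = peak_sse_ops_per_second(2400, width)
        print(f"{width}-bit engine at 2400MHz: {ops:,.0f} SSE ops/s peak")
```

Under this model a 64-bit engine needs two cycles per operation and a 128-bit engine needs one, so the peak rate doubles at the same clock. Real applications spend only part of their time in SSE code, which is one reason the practical gain is smaller than the peak figure.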
But it only gets better: while the K8's Hypertransport architecture previously provided unprecedented levels of memory bandwidth to the CPU, even AMD's most recent 1GHz implementation of Hypertransport is not enough to stave off Intel's 40% better memory bandwidth. Additionally, while AMD has long dominated CPU <-> memory latency to the tune of 47 nanoseconds (The Pentium 4 is 100% slower), we are beginning to understand that this lead has shrunk to only about 17% thanks to the Core 2's design. This is not the only design improvement Intel has made to CPU-to-RAM communication; the second comes in the decision to use PC2-4200 instead of a PC2-6400-driven bus. By design, PC2-4200 is much slower in operating frequency at 533MHz DDR, compared to PC2-6400's 800MHz DDR. Because PC2-4200 runs at a lower clockspeed, the memory chips can use a lower latency, and can therefore load, read, process and dump information from memory faster than PC2-6400 is capable of doing. This design decision further narrows the gap between the onboard memory controller featured in the K8 and the northbridge-tied memory controller in the Core 2. In fact, the gap is so narrow as to be negligible.
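The trade-off between memory clock and latency can be worked through with a short sketch. The module names follow directly from the arithmetic (transfer rate times eight bytes per 64-bit channel), but the CAS timings used in the example are hypothetical values chosen for illustration, not figures from any particular memory kit.

```python
BUS_WIDTH_BYTES = 8  # one DDR2 channel is 64 bits wide


def peak_bandwidth_mb_s(transfers_mt_s, channels=1):
    """Peak bandwidth in MB/s; this is the number behind names like PC2-6400."""
    return transfers_mt_s * BUS_WIDTH_BYTES * channels


def cas_latency_ns(cas_cycles, transfers_mt_s):
    """CAS latency in nanoseconds.

    DDR memory transfers twice per clock, so the real clock is half the
    quoted transfer rate.
    """
    clock_mhz = transfers_mt_s / 2
    return cas_cycles * 1000 / clock_mhz


if __name__ == "__main__":
    # Hypothetical timings: CAS 3 for the slower part, CAS 5 for the faster one.
    for name, rate, cas in (("PC2-4200", 533, 3), ("PC2-6400", 800, 5)):
        print(name, peak_bandwidth_mb_s(rate), "MB/s,",
              round(cas_latency_ns(cas, rate), 2), "ns CAS")
```

With these assumed timings the slower module answers a request sooner even though it moves less data per second. The result depends on the timings chosen: a PC2-6400 module with tighter timings would narrow or reverse the latency advantage.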
Making a Better Pipeline
Next on the list is a technique that Intel calls Macro-Ops Fusion. While this is a very fancy name, it allows Intel to do something very remarkable with their CPU, and that's combining complex processing tasks (Each processing task is known as an instruction) so a single processor cycle can compute them. To elaborate, let's say a user calls a function in a program, like opening a picture; beneath the goal of opening the picture is a programming language that drives the task, and the programming language itself is ultimately decoded by the CPU. In this case, the CPU receives the user's request to open a picture via the underlying code of the program, and the CPU must then compute the proper instructions to make that picture happen. Sometimes the code powering the task requires multiple instructions to be run, for example one instruction sequence to decode the image, another instruction sequence to process the menu's style and function. All of these instructions enter an instruction queue, and it is the job of a piece of the CPU called an x86 decoder to understand what queued instructions are being called by the programs, and to translate those into strings of instructions that can be processed more efficiently -- these efficient strings are called micro-ops.
Not only does the Core 2 have the most x86 decoders in the history of desktop computers at four (Compare this to the Presler's one and K8's three), it has the capability to fuse certain pairs of x86 instructions, which Intel calls macro-ops, into a single micro-op. Traditionally speaking, there can be one instruction per decoder per cycle, but Intel has given the Core 2 the ability to recognise instruction pairs that can be fused together (the classic case is a compare followed by a conditional jump), emit them as a single output of an x86 decoder, and send that one op down the pipeline for processing -- your picture opening. While this may seem insignificant, it's one of the most crucial keys to the Core 2's design prowess. Fusing two instructions into a single op gives the CPU the effect of a fifth x86 decoder 10% of the time, according to Intel. This too may seem insignificant, but consider that it's a 10% throughput advantage that other CPUs just don't have, on top of a fourfold improvement in decode width over its immediate predecessor.
When a fused op enters the pipeline, it has a two-fold benefit for processing time: The first is that an entity known as the Out of Order Buffer, or a section of the CPU that corrects mistakes in the order in which micro-ops are entering the pipeline, has one less micro-op to reorder if necessary. The second advantage is lower overhead in a section of an x86 CPU called the backend, or the part of the pipeline that determines precisely when an op is entering the pipeline, shoving it into the pipeline, getting it processed, and moving it out of the line. Like the reduction in the OoO Buffer's overhead, the scheduler suddenly has one less instruction to keep track of because it's been combined with a brother. The time savings are enormous.
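Intel's published descriptions of macro-op fusion centre on pairing a compare or test instruction with the conditional jump that immediately follows it. The sketch below is a toy model of that idea only: the mnemonic sets are illustrative rather than Intel's exact fusion rules, and the `fuse` function is our own invention.

```python
# Toy model of macro-op fusion: a compare/test followed immediately by a
# conditional jump is emitted as one fused op instead of two.
# The mnemonic sets are illustrative, not Intel's exact fusion rules.
COMPARES = {"cmp", "test"}
COND_JUMPS = {"je", "jne", "jl", "jle", "jg", "jge"}


def fuse(instructions):
    """Return the op stream after fusing adjacent compare + jump pairs."""
    fused = []
    i = 0
    while i < len(instructions):
        cur = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if cur in COMPARES and nxt in COND_JUMPS:
            fused.append(f"{cur}+{nxt}")  # two instructions, one op
            i += 2
        else:
            fused.append(cur)
            i += 1
    return fused


if __name__ == "__main__":
    stream = ["mov", "add", "cmp", "jne", "mov", "test", "je", "add"]
    result = fuse(stream)
    print(len(stream), "instructions ->", len(result), "ops:", result)
```

Every fused pair is one less entry for the out-of-order machinery and the scheduler to track, which is where the savings described above come from.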
One of the last big features, aside from some generic improvements we'll touch on in a moment, is something called micro-ops fusion. To put it much more simply than macro-ops fusion, micro-ops fusion allows very long and complex instructions to be shuffled to other parts of the CPU and processed in one micro-op. The effect is simple: Tasks which would require two micro-ops have been designed as one micro-op since the days of the Banias -- the Core 2's heritage reveals itself! These micro-ops give the backend and the OoO less of a headache and increase CPU efficiency. This is a jovian accomplishment, as such a task would previously destroy a CPU's potential upper headroom. And while the description we have given here is a gross simplification of the real effect, it serves its purpose to illustrate the point.
In the K8, on the other hand, it's a bit of a tug-of-war. As we mentioned, the K8 has three x86 decoders known as complex decoders compared to the Core 2's three simple and one complex. A complex decoder handles x86 instructions that produce multiple micro-ops, and a simple decoder handles x86 instructions that produce a single micro-op. The advantage of the K8's three complex decoders is that at any one time, it can handle three times as many complex x86 instructions as the Core 2 chip, but each complex-decoded instruction must be passed to a sequencer which leads to computational overhead and delayed processing time. So, in effect, the K8 can handle more of the complex tasks, but at the expense of speed. The result is that the K8 is faster in the presence of extreme amounts of complex instructions, but when the complex instruction queue is shallow, the Core 2 blows past it by chunking simple instructions through simple decoders without overhead. Unfortunately for AMD, however, nothing like Intel's macro-op fusion exists in current AMD chips, and Intel's implementation of micro-op fusion is faster than AMD's.
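The tug-of-war between the two decoder layouts can be illustrated with a deliberately simple model. It assumes in-order decoding, counts only decode cycles, and does not model the K8's sequencer overhead or any fusion; the function and the 'S'/'C' stream notation are our own.

```python
def decode_cycles(stream, width, max_complex_per_cycle):
    """Count cycles to decode `stream` in order.

    stream: string of 'S' (simple) and 'C' (complex) instructions.
    width: most instructions decoded in one cycle.
    max_complex_per_cycle: most complex instructions accepted in one cycle.
    """
    if width < 1 or max_complex_per_cycle < 1:
        raise ValueError("decoder counts must be at least 1")
    cycles = 0
    i = 0
    while i < len(stream):
        slots = 0
        complex_used = 0
        while i < len(stream) and slots < width:
            if stream[i] == "C":
                if complex_used == max_complex_per_cycle:
                    break  # no complex decoder left this cycle
                complex_used += 1
            slots += 1
            i += 1
        cycles += 1
    return cycles


def core2(stream):
    # Three simple decoders plus one complex decoder.
    return decode_cycles(stream, width=4, max_complex_per_cycle=1)


def k8(stream):
    # Three complex decoders, each able to take any instruction.
    return decode_cycles(stream, width=3, max_complex_per_cycle=3)


if __name__ == "__main__":
    for label, stream in (("all simple", "S" * 12),
                          ("all complex", "C" * 12),
                          ("mostly simple", "SSSC" * 3)):
        print(label, "- Core 2:", core2(stream), "cycles, K8:", k8(stream), "cycles")
```

In this model the four-wide layout wins whenever complex instructions are sparse, and the three-complex layout wins when they dominate, which mirrors the behaviour described above.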
Last, and certainly least, the Presler comes in dead last with only one complex decoder, coupled to sequencer overhead, which dumps into a backend that can't possibly hope to fill the 31 stage pipeline. The unfortunate result of this boneheaded CPU design is a CPU that only benefits from consistent and predictable input, like media-encoding, which is a constant stream of the same functions over and over for hours. In the grander scheme of things, the Presler is just choking from a lack of data to keep itself going. Attach this abysmal bottleneck to a pipeline which is so long that it often has to abort decoded ops due to computational errors or mispredictions in what the user was going to do next, and you have a dud of a chip that finally won in the war against heat more than a year too late for it to matter. Furthermore, released less than a year prior to the Core 2 microarchitecture, the Presler will be unceremoniously relegated to extreme budget applications starting with the early quarters of 2007.
As far as the generic improvements are concerned, the Core 2 line brings faster ALUs and FPUs, both of which increase the speed at which the Core 2-based CPU lines can crunch numbers. Overall, it was the goal of the engineers at Intel to make each standard component of an x86 CPU faster than any previous processor. At every turn, cache is faster, pipelines are faster, decoders are faster, SSE is faster, registers are faster, instructions are faster. Intel has succeeded in going for more, more, more without being a laughing-stock as they were with the Pentium 4 line.
Intel of Yesterday and Tomorrow
The Workstation and Enterprise Market
With today's Core 2 architecture bringing the first significant jump in x86 power since the K8 came to dethrone Netburst, Intel has a full-featured product range that spans notebooks, mainstream, enthusiast, workstation and server ranges. At last we're going to touch on what Intel has to offer, and take a look at what segments the products fill, and where Intel is headed with their products. First up at the top of Intel's range is the Xeon CPU, which has existed as a name since the days of the Pentium II. Often boasting larger cache sizes than its desktop brothers, the Xeon is positioned to fulfill the needs of SMP systems, be they rackmount, workstation, or clusters. The Xeon has reasonably shadowed the evolution of Intel's desktop line, and today there are four different flavours of Xeons floating around: Two from the Netburst era and two from the Core 2 era. On the horizon, there is a series of three separate classes of Xeons designed to fulfill Intel's stratification goals within the Xeon line, so let's take a look:
| Codename | Release Date | Market Name(s) | Cores / SMP | Process (nm) | Socket | Frequencies | Voltages | TDP |
| Dempsey (Netburst) | May 23, 2006 | Xeon 5030-5080 | 2 / Yes (2 CPU) | 65 | LGA 771 | 2667-3733MHz | -- | 95/130w |
| Woodcrest (Core 2) | June 26, 2006 | Xeon 5110-5160 | 2 / Yes (2 CPU) | 65 | LGA 771 | 1600-3000MHz | -- | 65/80w |
| Conroe (Core 2) | June 26, 2006 | Xeon 3040-3070 | 2 / No | 65 | LGA 775 | 1866-2667MHz | -- | 65w |
| Tulsa (Netburst) | August 27, 2006 | Xeon 7110N/M - 7140N/M | 2 / Yes (2-8 CPU) | 65 | Socket 604 | 2600-3500MHz | -- | 95/150w |
| Kentsfield | Est. 4Q06/1Q07 | Xeon X32xx | 4 / No | 65 | LGA 775 | -- | -- | 135w (Est.) |
| Clovertown | Est. 2H07 | Xeon E/X53xx | 4 / No | 65 | LGA 771 | -- | -- | 80w (Est.) |
| Tigertown | Est. 2H07 | Xeon ???? | 4 / Yes (2-8 CPU) | 65 | LGA 771 | -- | -- | 80-130w (Est.) |
| Harpertown | Est. 2008 | Xeon ??? | 8? / ??? | 45 | -- | -- | -- | -- |
It is important to know, as we mentioned above, that Intel stratifies their Xeon line into three separate segments: Workstation, two CPU and multi-CPU. The workstation line is frequently a clone of the highest-performing desktop CPU at the time, and we see this is the case with Intel's Xeon 3000 series. The 3000 series currently features Conroes with a Xeon name (30xx), and will eventually feature a rebadged version of Kentsfield (32xx), Intel's upcoming desktop quad-core chip of two Conroe CPUs in one processor package. The second line of chips that Intel offers is the Xeon series starting at 51xx, which unlike the Xeon 3000 line, features more cache and higher FSBs than desktop counterparts, but more importantly, 2P support. Lastly, comes Intel's grand offering of the Xeon MP 71xx line (Noted with the -N suffix if it has a 667MHz FSB, -M with an 800MHz FSB), which are Xeon CPUs capable of working in SMP configurations of more than two processors.
Intel has something of a problem with the Tulsa (Xeon 71xx) and Woodcrest (Xeon 51xx) being on the market at the same time, and the issue actually lies in what the Woodcrest can't do: It's only dual-processor capable. While the Woodcrest is profoundly faster than Tulsa, Intel only has the Tulsa to compete with the Opteron in the very important 4P+ processor segment. That means until Tigertown ships in 2007, Intel will have no answer to the dominance of the Opteron in the highly-lucrative four-way or eight-way CPU market. This situation is also compounded by socket disparity in the Tulsa and Woodcrest. Companies looking to make an initial investment in the Tulsa for a 4P or 8P system are stuck with an outdated Socket 604 platform while Intel hustles LGA771 to the Woodcrest-and-beyond crowd. It's not as though you can buy Tulsa now and jump to Tigertown in 2007, which is something you could do if you bought a Woodcrest-powered 2P system, but that wouldn't fit your CPU requirement. Intel can't get Tigertown out fast enough, and it knows it, which is why it's up and down the trade-shows with 4P/16C Tigertown boxes blazing away.
Beyond Tigertown, the situation is very hard to determine: Harpertown is mentioned as the server version of Yorkfield which we will discuss below, but the information available on Harpertown suggests that it is just Yorkfield with more cache and a better FSB as has been the case with virtually every Xeon in the last five years. We'll touch more on the disparity in the desktop section. Names also floated in the last twelve months include Dunnington as a successor to Tigertown and Gainestown which we were unable to find concrete information for.
In the desktop end of things, the situation is significantly more clear, as Intel is vastly less mum about what they intend to do with upcoming CPUs. The flagship force of Intel's desktop line comes in the form of Conroe, to be joined by the quad-core Kentsfield very soon. The Kentsfield is comprised of two Conroe chips wedged into one CPU package. From now until approximately 2009, the Conroe and its closely-related successors will be Intel's mainstay for most of us. In 2009, however, Intel is expected to produce the first post-Core 2 architecture in the form of Nehalem for their Centrino line, and the desktop segment will closely shadow the release of Nehalem with the Westmere, a desktop version.
Mainstream and Enthusiasts
In the desktop, the socket choice is much simpler: LGA 775 until 2009. Gone are the days of Intel shuffling sockets every time a new Pentium 4 revision hit the shelves. LGA 775 is a forward-thinking interface backed by the power of some of the best chipset engineers in the business, and until post-Core 2 chips are floating about, LGA 775 is a socket that's here to stay. So, with that said, let's take a look at what's being offered for the desktop:
| Codename | Release Date | Market Name(s) | Cores | Process (nm) | Socket | Frequencies | Voltages | TDP |
| Allendale | July 26, 2006 | Core 2 Duo E6300 / E6400 | 2 | 65 | LGA 775 | 1860 / 2133MHz | -- | 95/130w |
| Conroe | July 26, 2006 | Core 2 Duo E6600 - X6800 | 2 | 65 | LGA 775 | 2400-2930MHz | -- | 65w |
| Kentsfield | Est. 4Q06/1Q07 | Core 2 Quad Extreme QX6700/Q6600 | 4 | 65 | LGA 775 | 2667/2400MHz | -- | 135w (Est.) |
| Wolfdale | Est. 2H07 | Core 2 Duo ???? | 2 | 45 | LGA 775 | -- | -- | -- |
| Yorkfield | Est. 2H07 | -- | 4 | 45 | LGA 775 | -- | -- | -- |
| Westmere | Est. 2H08/2009 | -- | -- | 32 | -- | -- | -- | -- |
Now, we originally talked about the Yorkfield in the Xeon section, and we'd like to come back to it at last. The information on the Yorkfield is very contradictory, and the information we have placed in the table is what's considered the safest expectation of the core. Here is what we do know about Yorkfield: It's going to be the successor to Kentsfield, it will be a 45nm design, it will be LGA775 and it will have at least 4 cores. Where the conflict comes into play is with Yorkfield's core-count, which has been said to be a minimum of four, a possible eight, or a maximum of thirty-two. It is the desktop version of the Xeon's Harpertown chip, however we don't expect either the Harpertown or the Yorkfield to be more than four cores; we suspect these chips will be positioned as die-shrunk derivatives of their predecessors.
Perhaps a more interesting processor is the Westmere, the first desktop processor that will feature architecture not explicitly based on existing Core 2 silicon. For over two years, Intel has had the Nehalem on the roadmap for mid-2009 in the mobile segment. The Nehalem is a 45nm mobile part which we'll discuss later, but suffice it to say what was known for years as the Nehalem-C for the desktop version is now known as Westmere. Westmere is expected to be introduced as the first desktop chip at 32nm as a die-shrunk derivative of the Nehalem, possibly featuring eight cores in an unknown socket. The Westmere is also one of the first desktop chips expected to use Intel's CSI, or Common System Interface, a technology that will debut first in the Tukwila-class Itaniums in mid-2008 and trickle down from there. The CSI is Intel's replacement for the Front Side Bus, allowing cores to communicate with one another directly without moving communication out to the northbridge. Furthermore, CSI may also include the introduction of an on-die memory controller which will seal Intel's transition to a Hypertransport-like bus architecture.
Our prediction is that Yorkfield will either be a 45nm die shrink of the Kentsfield (More likely), or it'll rear its head as a "Native" quad-core component to battle with AMD's K8L platform to be released at roughly the same time. In the approximately eighteen months between Yorkfield and Westmere, we can envision an unroadmapped octal-core design featuring two Yorkfields in a single package similar to the Kentsfield harboring two Conroes today. This seems to follow Intel's track record of condensing dies in a package, such as the Tulsa being supplanted by the Woodcrest. What happens after Westmere is anyone's guess.
Continuing to float the mainstream in dual-core form throughout 2007, Intel will die-shrink Conroe and call it Wolfdale. Allendale, we presume, will be phased out in favour of the old 65nm Conroes, while Wolfdale will continue on as the flagship CPU, with Yorkfield succeeding Wolfdale by just a month or two for the more-money-than-sense enthusiast crowd. Topping out the range of mainstream CPUs, the Wolfdale-class dual core chips will be joined by Kentsfield CPUs not bearing the Extreme Edition flag, thus bringing the price down. Bringing up the rear, CPUs we have not included on the roadmap include Wolfdale-L and Conroe-L, with the "L" denoting low-cost. Intel plans to push single-core 65nm Conroes under the Pentium and Celeron names as budget options, and when the die is shrunk, the same will continue to occur with Wolfdale-L.
Out and About: The Mobile Market
Lastly, we have the mobile segment, which has played a curious role in the development of the Core 2 series. Unlike AMD which produces its mobile lines as an afterthought to the desktop and enterprise segments, Intel spends copious amounts of time specifically-engineering mobile CPUs that aren't cut from the same cloth as its bigger brothers. As we mentioned in the start of the article, the Banias mobile CPU was a complete 180 from the direction Intel was headed with their Pentium 4 line, but the little CPU that could became the inspiration for today's Core 2 components, and represented the idea that Intel wasn't just a stale CPU company clinging to doping the uneducated masses. We see further elements of mobile-creates-desktop in the Penryn, itself a die-shrunk 45nm mobile-specific version of the Conroe. Clearly illustrating that Intel has a special care for their mobile line, it will test 45nm on Penryn and then use that practice to make the Nehalem.
The Nehalem itself will be the first post-Core series architecture, drawing only the 45nm fabrication technique from its predecessors. Not much is actually known about the Nehalem, but it has been mentioned in association with Gainestown, Bloomfield, Gilo and Beckton. Furthermore, the Nehalem will first be seen on the Centrino platform and then be ported to the desktop as a 32nm product, the Westmere, in 2009. Lastly, to round out Intel's offerings, a brand new architecture out of Intel's 32nm fabs in 2010 will be known as the Gesher. Nothing is known about it other than its size.
Respecting Moore's Law: Intel's New Engineering Cycle
With the release of the Core 2 Duo and its counterparts, Intel has adopted what they call the "Tick-Tock" approach to CPU development. What this means is that every two years, Intel will have a die-shrink for an existing processor design within the first year, and a new generation of architecture by the end of the second. We can see this in the diagram below as the Intel Core microarchitecture in the form of Yonah, as a shrink from the 90nm Dothan. By 2006, at the end of the cycle starting in 2004, Intel debuted the new Core 2 architecture. By the end of 2007, Nehalem is born for laptops, and shrunk to 32nm for the desktop version a year later. In this way, we can begin to see that Intel ticks with the mobile platform, and tocks with a shrunk desktop version.
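The cadence itself is easy to sketch: each process node gets a die shrink of the existing design in year one and a new architecture in year two. The start year in the example is a hypothetical placeholder and the output is a generic schedule, not a prediction of actual product dates.

```python
def tick_tock(start_year, nodes_nm):
    """Return a list of (year, step, node_nm) tuples for a two-year cadence.

    Year one of each cycle is a die shrink of the existing design onto the
    new node; year two is a new architecture on that same node.
    """
    schedule = []
    year = start_year
    for node in nodes_nm:
        schedule.append((year, "die shrink", node))
        schedule.append((year + 1, "new architecture", node))
        year += 2
    return schedule


if __name__ == "__main__":
    # Hypothetical start year; the node sizes are the ones named in the article.
    for year, step, node in tick_tock(2005, [65, 45, 32]):
        print(year, f"{node}nm", step)
```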
Lastly, we have a very rough roadmap gleaned from the copious amounts of research conducted for this guide:
As you can see, there are obvious gaps in what is to succeed the Kentsfield and Conroe chips in the low end workstation space; the gap indicates that Yorkfield may be moved up, which is the most likely option. We also see what can either be seen as a dual-core gap after Wolfdale in the middle of 2008, or the final elimination of dual-core CPUs. Judging by the year, the latter of the two options may be more likely. The last sore spot in the roadmap comes in the form of Clovertown's exceptionally long life. We have a hard time believing that something won't come to usurp its role in the Xeon DP space after roughly the middle of 2008. We feel as though a server version of Westmere may be very likely.
All in all, Intel has done nothing short of a remarkable job with the Core 2 line, and we appreciate Intel's acknowledgement that the Netburst era was a dark and sour one for the chip giant. From this point forward, Intel demonstrates a steady course of die shrinks and innovation with their tick-tock engineering, and we can similarly appreciate the environment of cross-segment inspiration that Intel's brass has established. While Intel may be hard-pressed to get the enterprise situation back under control, particularly with the utter lack of SMP configurations under the Core 2 design to combat the Opteron, they can be lauded for their return to the spotlight for the desktop and mobile segments. We'll keep a close eye on the market for the next few years and update you on changes to the roadmap, including delays, fulfilled expectations, and surprise cores.






