This is the last article in a six-part guide book.
Step 4: Combining Memory and CPU Overclock: Looking for Long Term Stability
By this point, we have painted a pretty clear picture of how well our CPU and memory can overclock independently. Now we must determine how to extract the maximum amount of performance from both our CPU and memory subsystem simultaneously.
Generally speaking, the CPU clock has the highest performance weighting of everything we have done up to this point. For example, a 2.4GHz overclock with the memory running at 300MHz may very well be outperformed by a 2.5GHz system, with 200MHz memory. Ideally, we'd want to have the best of both worlds, maximizing our Memory and CPU clock. As you could see in the previous section, the IMC can often prohibit us from obtaining that golden balance between CPU and memory, but we'll work with what is possible to obtain the best performance.
To further complicate things, a weak IMC may begin exhibiting problems at higher CPU clocks, even though it appeared fine in the previous section. A 300MHz memory clock may be perfectly fine at 2.1GHz, but at 2.6GHz, the behaviour may be different altogether.
By this stage, it is more likely that you will see failures as a result of the increased memory subsystem stress. Most poor IMCs will begin exhibiting problems even without a high CPU clock, but if yours is 'on the fence' at all, combining high CPU and memory clocks can cause failure to occur at this stage of the game.
What we are going to do in this section:
- Select a good, safe maximum CPU clockspeed
- Analyze combinations of CPU multiplier, reference clock, and memory divider to determine the best combination for achieving the maximum CPU clock speed and the best memory performance.
- Combine both the high CPU clock and memory clock and begin long-term stability testing.
- If a failure occurs, reduce the memory clock speed, or loosen the timings and try not to sacrifice the CPU clock speed if possible.
- If you continue to see instability even after loosening timings or reducing the memory frequency, it could be a result of your CPU, in which case you should reduce your reference clock speed by small increments, or increase the vcore slightly (if it is not already increased significantly).
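The fallback order in the list above can be sketched as a small helper. This is only an illustration, not a tool used in this guide: the `Config` class, the `after_failure` function, and the 3MHz reference clock step are all hypothetical, and the choice to loosen timings before dropping a divider is an arbitrary ordering of two options the list treats as equal.

```python
from dataclasses import dataclass, replace

# Memory dividers used in this guide, fastest memory clock first.
DIVIDERS = ["1:1", "5/6", "2/3"]

@dataclass(frozen=True)
class Config:
    multiplier: int
    ref_clock_mhz: int
    divider: str = "1:1"
    loose_timings: bool = False

def after_failure(cfg, ref_step_mhz=3):
    """Suggest the next configuration to test after a stability failure."""
    # 1. Touch the memory first: loosen the timings, CPU clock unchanged.
    if not cfg.loose_timings:
        return replace(cfg, loose_timings=True)
    # 2. Then drop to the next slower memory divider, if one is left.
    idx = DIVIDERS.index(cfg.divider)
    if idx + 1 < len(DIVIDERS):
        return replace(cfg, divider=DIVIDERS[idx + 1])
    # 3. Only then give up CPU clock, a few MHz of reference clock at a time.
    return replace(cfg, ref_clock_mhz=cfg.ref_clock_mhz - ref_step_mhz)

cfg = Config(multiplier=10, ref_clock_mhz=261)
for _ in range(4):
    cfg = after_failure(cfg)
    print(cfg)
```

The vcore increase mentioned in the last bullet is deliberately left out of the sketch, since how far it is safe to go depends on your temperatures.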
3500+ Test System
Let's take a look at the 3500+ system first. We have quite a few values that we can work with to decide upon a 'Maximum CPU Clock'. To compile the chart below, I simply selected the highest 'stable' CPU clocks for each vcore increase that was made. I think the most logical choice to shoot for would be the 2618MHz overclock that we obtained at only 0.05V above default vcore. It is really not worth the increase in temperature/voltage to obtain an extra ~60MHz.
| CPU CLK | LOAD TEMP | VCORE |
| 2530MHz | 39'C | DEFAULT |
| 2618MHz | 44'C | 1.45V |
| 2651MHz | 53'C | 1.55V |
| 2684MHz | 56'C | 1.60V |
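To put numbers on that trade-off, here is a minimal sketch that works out what each vcore step bought in MHz and what it cost in load temperature. The values are copied from the table above; the function name `marginal_gains` is just an illustrative choice.

```python
# (CPU clock in MHz, load temperature in 'C), lowest vcore first,
# copied from the table above.
results = [(2530, 39), (2618, 44), (2651, 53), (2684, 56)]

def marginal_gains(rows):
    """Return (extra MHz, extra 'C) for each step up from the previous row."""
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(rows, rows[1:])]

for mhz, temp in marginal_gains(results):
    print(f"+{mhz}MHz for +{temp}'C")
```

The first step gives 88MHz for 5'C, while the next two give only 33MHz each for a further 12'C combined, which is why 2618MHz looks like the sensible stopping point.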
That being said, there are several different combinations of reference clocks and CPU multipliers that we can use to obtain ~2.6GHz. We can also use different memory dividers to obtain varying memory clock speeds.
| CPU Multi | Ref Clk | CPU Clk | Memory @ 1:1 | Memory @ 5/6 | Memory @ 2/3 |
| 11 | 238MHz | 2618MHz | 238MHz | 187MHz | 154MHz |
| 10 | 261MHz | 2610MHz | 261MHz | 218MHz | 174MHz |
| 9 | 290MHz | 2610MHz | 290MHz | 237MHz | 186MHz |
| 8 | 327MHz | 2616MHz | 327MHz | 261MHz | 218MHz |
Anything lower than 8x begins to get rather useless with this 3500+ and requires very high reference clock speeds to achieve 2.6GHz.
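The arithmetic behind the chart can be sketched in a few lines. The CPU clock is simply multiplier times reference clock. For the memory clock, the sketch assumes that the memory runs at the CPU clock divided by a whole number, namely the multiplier divided by the divider ratio, rounded up. This guide does not spell that rule out, so treat it as an assumption; it does reproduce the chart's values to within rounding. The function names are illustrative only.

```python
from fractions import Fraction
from math import ceil

# Divider ratios as exact fractions, named as in the chart.
RATIOS = {"1:1": Fraction(1), "5/6": Fraction(5, 6), "2/3": Fraction(2, 3)}

def cpu_clock_mhz(multiplier, ref_clock_mhz):
    """CPU clock = CPU multiplier x reference clock."""
    return multiplier * ref_clock_mhz

def memory_clock_mhz(multiplier, ref_clock_mhz, ratio):
    """Memory clock under the assumed whole-number divider rule."""
    # Assumption: divider = multiplier / ratio, rounded up to an integer.
    divider = ceil(Fraction(multiplier) / ratio)
    return cpu_clock_mhz(multiplier, ref_clock_mhz) / divider

# Multiplier / reference clock pairs taken from the chart above.
for multi, ref in [(11, 238), (10, 261), (9, 290), (8, 327)]:
    mems = ", ".join(
        f"{name} {memory_clock_mhz(multi, ref, ratio):.1f}MHz"
        for name, ratio in RATIOS.items()
    )
    print(f"{multi} x {ref} = {cpu_clock_mhz(multi, ref)}MHz | {mems}")
```

Under this rule the 8x row works out to 261.6MHz on the 5/6 divider where the chart lists 261MHz, so the chart's figures appear to be rounded.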
Looking at the above chart, I immediately recognized that 10x261 was perfect for the BH-5, as it had no issues pumping out 261MHz at 2-2-2-5 timings, and this is very close to its maximum clock speed at the voltages I tested with. If this proves to be stable, we are literally getting the best of both worlds.
In a perfect world where every chip has a perfect IMC, 9x290 would have been a great choice for my TCCD. Unfortunately, 290MHz is not going to happen with my Winchester and 1GB of RAM, so I will likely have to use 11x238MHz and take advantage of the tighter 2-3-3-6 timings it is capable of at that speed. I could also use 10x261MHz for the TCCD but it only proved stable with rather loose 2.5-4-4-8 timings. 11x238MHz at 2-3-3-6 timings is still a better choice. Hopefully this will prove to be stable with a 2.6GHz CPU clock.
There are also two results that could work using the 5/6 divider to achieve similar memory clocks. However, this is rather pointless, as it makes more sense to use a lower reference clock to achieve the same result.
I'm going to select the two below results for 'long term' stability testing.
| CPU Multi | Ref Clk | CPU Clk | Memory Type | Memory @ 1:1 | Memory @ 5/6 | Memory @ 2/3 |
| 11 | 238MHz | 2618MHz | 2x512MB TCCD | 238MHz | 187MHz | 154MHz |
| 10 | 261MHz | 2610MHz | 2x256MB BH-5 | 261MHz | 218MHz | 174MHz |
Sempron 2600+ Test System
What about the Sempron 2600+ system and the Kingston Value RAM?
Based on the three vcore increases we did, the below three values are what we have to work with. This chip scaled a little better with vcore than the 3500+, and 1.55V yielded slightly high but acceptable temperatures. Since this is a rather inexpensive chip, I think it's worth shooting for ~2.3GHz at 1.55V.
| CPU CLK | LOAD TEMP | VCORE |
| 2160MHz | 46'C | DEFAULT |
| 2280MHz | 51'C | 1.50V |
| 2336MHz | 53'C | 1.55V |
One thing that is different with this chip is the fact that it is completely multiplier 'locked'. We have no choice but to use the 8x CPU multiplier, so this really limits what we can do with the memory.
| CPU Multi | Ref Clk | CPU Clk | Memory Type | Memory @ 1:1 | Memory @ 5/6 | Memory @ 2/3 |
| 8 | 288MHz | 2304MHz | 1x512MB Value RAM | 288MHz | 230MHz | 192MHz |
With the high 288MHz reference clock required to achieve 2.3GHz, we will be unable to run this memory 1:1. Using the 5/6 divider however, 230MHz yields a decent clockspeed that should work well with the Kingston Value RAM. In the previous section, we determined that our value RAM could do 230MHz at 3-3-3-8 timings without issue.
Let's put an end to the stability question and see how these systems react to peak memory and CPU clocks.
Long Term Stability Testing
We have now arrived at the long, boring part of the process. You'll have to force yourself to resist launching your favourite game and running benchmarks at this point. Stay focused! You are almost finished!
Just how do you define stability? Some people consider 5 minutes of Prime95 sufficient, but not I. Twenty-four hours is the magic number, and if you have even more patience than I do, 48 hours or longer is ideal.
I have often received comments from people stating "Whenever my CPU is unstable, I get a failure within the first 2-3 hours, any longer than that and it is guaranteed to keep going". Below is a screenshot I took when pushing my Opteron 148 to its limits. It primed for over 24 hours before the first error popped up. You can imagine my disappointment.
Needless to say, the longer you test, the better. For the purposes of this guide, I'm going to recommend 24 hours, especially if you plan to use this configuration 24/7.
Just what type of testing should be done? Prime95 is my favourite, and it has yet to let me down. I recommend the 'Custom' test I outlined in the tools section. Be sure to specify a large chunk of your available physical memory to be tested. This will heavily stress your CPU and Memory. Your CPU will warm up substantially, and there will also be a heavy electrical load on your system during the testing. It is not necessary to rerun SuperPI or Memtest86+ tests at this point. Remember to keep an eye on your temperatures and voltages!
Once that is finished, I recommend playing a long run of your favourite 3D game. Fast action FPS games, such as anything based on the Valve 'Source' engine, are a great stability test. 3D testing puts a further burden on your electrical system, and if your system is sitting 'on the fence' at all, it can push it over the edge. Common failures during gaming include Windows 'exception' errors, and BSOD failures. If you are not into gaming but still have a 3D card, running loops of 3Dmark 2001SE is also a great test.
TIP: You can still surf the web, check your email, etc. while running a Prime95 test. Avoid using any applications that use a lot of physical memory, or you'll find Prime95 testing the page file on your hard drive rather than your physical memory.
Let's test our first configuration. I decided to use 237MHz rather than 238MHz, as it gives a nice even 2.6GHz.
| CPU Multi | Ref Clk | CPU Clk | Memory @ 1:1 | Memory @ 5/6 | Memory @ 2/3 |
| 11 | 237MHz | 2607MHz | 237MHz | N/A | N/A |
Success! It appears that this is a 24 hour Prime95 stable system configuration. I followed up with about 3 solid hours of 'Day of Defeat: Source' which ran flawlessly and smooth as butter. I was a little worried that the tight 2-3-3-6 timings would not play well with this weak IMC, but the lower 237MHz clock speed helped to offset that.
Let's get that BH-5 cooking now.
| CPU Multi | Ref Clk | CPU Clk | Memory Type | Memory @ 1:1 | Memory @ 5/6 | Memory @ 2/3 |
| 10 | 261MHz | 2610MHz | 2x256MB BH-5 | 261MHz | N/A | N/A |
No problem! This did not come as a big surprise, because the IMC on this chip had no issues with 2x256MB DIMMs. I was pleased to see that even with the CPU at 2.6GHz I was able to maintain stability.
So what about the 2600+ Sempron?
| CPU Multi | Ref Clk | CPU Clk | Memory Type | Memory @ 1:1 | Memory @ 5/6 | Memory @ 2/3 |
| 8 | 288MHz | 2304MHz | 1x512MB Value RAM | N/A | 230MHz | N/A |
Success! This chip also had a very weak IMC, even with a single 512MB DIMM. I was still able to get a respectable 230MHz overclock out of value RAM, not to mention a 700MHz CPU overclock for an almost 45% gain. Not too shabby for AMD's bottom of the line.
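For anyone who wants to check the percentage on their own chip, the calculation is a one-liner. The sketch below assumes a 1600MHz stock clock for the Sempron 2600+, which is what the roughly 700MHz gain quoted above implies; substitute your own stock and overclocked values. The function name is illustrative.

```python
def overclock_gain(stock_mhz, overclocked_mhz):
    """Return (extra MHz, percentage gain over the stock clock)."""
    extra = overclocked_mhz - stock_mhz
    return extra, 100 * extra / stock_mhz

# 1600MHz stock is an assumption; 2304MHz is the 8 x 288 result above.
extra_mhz, gain_pct = overclock_gain(1600, 2304)
print(f"+{extra_mhz}MHz, {gain_pct:.0f}% over stock")
```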
Well, after all of that hard work and anticipation, we finally have some concrete results. Most impressive is the Sempron 2600+, which was able to increase its clock speed by almost 45%. Let's take a look at some performance figures, and let the benchmarking begin!
Benchmarks
So what exactly does 45% of extra CPU clockspeed mean in the real world? Let's find out. Your system may not always 'feel' faster when overclocked, but I can almost guarantee that it is. One sure fire way to measure just how much faster is to run benchmarks. I'm going to be putting these systems to the test with both synthetic and 'real world' benchmarks.
Benchmark #1: SuperPI 1M
The SuperPI 1M test is a very CPU/memory intensive benchmark, and the amount of L2 cache available also plays a large role in the overall score.
As you can see, there was a five second decrease with the 3500+ system and a whopping fifteen second decrease with the Sempron system. Impressive improvements so far!
Benchmark #2: Sisoft Sandra 2005, CPU Arithmetic Benchmark
Sandra 2005 offers quite a range of benchmarks. The CPU Arithmetic benchmark is great for measuring raw CPU processing power. As can be seen below, the overclocked Sempron 2600+ actually outperformed the stock 3500+, even with one quarter the cache memory and higher latency memory. This really shows that CPU clockspeed has a much heavier performance weighting than other metrics in these number crunching benchmarks.
Benchmark #3: Sisoft Sandra 2005, Memory Bandwidth
The Sandra '05 Memory Bandwidth benchmark is a good 'overall' memory performance measurement. CPU clock speed does play a large role in the scores here, because the IMC runs at the same speed. Tighter memory timings also improve memory bandwidth. As you can see below, the single channel memory controller on the socket 754 Sempron produces roughly one half the bandwidth of the dual channel 939 3500+. This single vs. dual channel metric clearly plays a large role in this synthetic memory benchmark, but in 'real world' benchmarks that equates to a much smaller overall performance decrease.
No real surprises here. The 261MHz BH-5 clearly came out on top with almost 7400MB/s of memory bandwidth.
Benchmark #4: Everest Memory Latency
Unfortunately, Lavalys has discontinued support for 'Everest Home Edition'. The Everest latency benchmark is fairly unique. Much like bandwidth, the overall memory latency is impacted by CPU clock speed and memory timings. It is sometimes interesting to see memory performance from the perspective of latency rather than bandwidth. A64s all have relatively low memory latency thanks to the IMC. When it comes to memory latency, even a 2600+ Sempron outperforms a Pentium 4 'Extreme Edition' by a large margin.
Again, no real surprises here. The BH-5 experienced a large latency decrease thanks to its very tight timings and high frequency. 36.2ns is faster than anything Everest had in its comparison database, as can be seen below.
Benchmark #5: Folding @ Home Gromacs Performance
I thought I would deviate from the usual benchmarks a bit and see what kind of improvement can be seen from F@H. For this benchmark, I simply copied a 600-point 'p1150_RIBO_Semihelixfrom1141' Gromacs WU from one of my folding machines, and allowed each system configuration a chance to crunch through a few frames. I then took the 'Time Per Frame' from EMIII and compiled the below chart. If you don't know what folding is, check out www.joinfolding.com and consider putting your CPU cycles to good use.
As you can see, F@H Gromacs seems to really take advantage of the extra processing power, and significant gains can be seen. Memory performance does not appear to play too much of a role in Gromacs, but I'd venture a guess that L2 cache memory does. Even though the Sempron is clocked 100MHz higher, the stock 3500+ still performs about two minutes faster thanks to four times more L2 cache memory. If you look at the 261MHz BH-5 vs. the 237MHz TCCD with looser timings, there was only about a 30 second improvement.
Benchmark #6: 3Dmark 2001SE
3Dmark 2001SE is another classic that will likely never go away. Today's high-end 3D accelerators are not the bottleneck for this benchmark, so it is very heavily dependent on CPU and memory performance for high scores. Rather than looking at the overall score, I prefer to look at the average FPS for each individual test, as it gives a better breakdown of the performance gains.
Pretty much every game test, with the exception of 'Nature', gained a fairly significant boost from the extra clock speed and memory performance. Nature still puts a significant strain on the GPU and benefits little from extra CPU power.
Our Sempron system did not benefit whatsoever from a tremendous 45% CPU overclock. This simply reaffirms that if you are into gaming, you should buy the best video card you can afford. The Radeon 9250 entry-level card is clearly the bottleneck in this system.
Benchmark #7: Farcry
Farcry set the bar in 3D gaming a couple of years ago, and it is a generally demanding game. I'm going to focus only on the 3500+ system and X850 for these more GPU intensive benchmarks.
Again, a familiar pattern emerges. The more GPU intensive, the less the overclock seems to improve things.
Benchmark #8: Doom 3
Doom 3 is well known for being a demanding game as far as hardware is concerned. Let's see what kind of gains can be seen from our overclocking adventures.
At lower resolutions, nice gains are seen, but once antialiasing and other 'eye candy' are enabled, the gain is negated by the increased demand on the graphics card.
Conclusion
I think that the performance data in the previous section speaks for itself. Just about every A64 has some untapped headroom in it, and it really is worth trying to extract. The systems I used for this guide are really nothing special, and my chips are actually sub-par in many respects. Despite these limitations, I was still able to get some impressive results. As far as stability is concerned, I wrote this entire article on the 3500+ test system at 2.6GHz, running Folding@Home, without a single hint of instability. The Sempron system sits next to it and has been successfully running Folding@Home for weeks at 2.3GHz.
I hope that, by this point, you have also had some pleasant surprises from your system and that you learned a thing or two from the article.
I would encourage anyone with questions to start a thread in the Short-Media 'Overclocking' forum. There are quite a few veteran overclockers around that have a good deal of wisdom to share.





