Help me understand how Opterons+Hypertransport work
In the typical AMD or Intel system, there's the CPU, the FSB, and then the memory, all of which run at a certain speed (DDR166, 333, 400, etc.).
But the new AMD64 CPUs introduce something called HyperTransport, which runs at the same speed as the CPU. But how is it still using DDR333 and DDR400 memory? I don't get how HyperTransport (which is a completely new and unique architecture, isn't it?) would make AMD64 systems much superior to legacy AMD systems.
I'm trying to think of it this way:
Normal AMD
CPU with multiplier -> FSB (RAM speed) -> Motherboard chipset -> RAM
AMD64
CPU with multiplier -> FSB (same as CPU clock speed) -> HyperTransport bus -> ??? -> RAM
I'm having trouble seeing where the performance boost comes from if we're still using the same old DDR memory modules.
Excuse the ignorance of this post, but I'm just trying to get a plain-English gist of what makes AMD64+HyperTransport better.
Comments
From Lost Circuits:
The easy answer seems to be that HT separates the buses from one another, allowing each to reach its maximum potential independent of the others. In other words, it is like allowing your wide receiver to run all-out, even while your quarterback is pussy-footing behind your dog-slow offensive line. The old interdependency has been removed.
I'm like you - still trying to understand this. There will be better answers than mine, and I'm looking forward to them.
HyperTransport runs at 800MHz and is "double pumped," meaning data is transferred on both the rising and the falling edge of the clock, doubling the effective transfer rate to 1600 million transfers per second (usually quoted as "1600MHz").
HyperTransport can be different widths (Bit wise), depending on the needs of the bus -- it can be 2, 4, 8, 16 or 32 bits wide. And it is full duplex, so it can send and receive data at the same time. The transmit and receive parts of the bus can be different sizes, depending on needs, but on a main bus, they are likely to be symmetrical. Splitting the transmitting and receiving parts of the bus helps to simplify the design and makes it easier to run the bus at higher speeds.
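To put rough numbers on the clock and width figures above, here is a minimal sketch of the peak-bandwidth arithmetic. The 16-bit width is an assumption picked from the list of allowed widths, and the function name is made up for illustration; real links lose some of this to packet overhead.

```python
def ht_peak_mb_per_s(clock_mhz, width_bits, double_pumped=True):
    """Peak one-way bandwidth in MB/s (decimal), ignoring protocol overhead."""
    # Double pumping moves data on both clock edges: 2 transfers per cycle.
    mega_transfers = clock_mhz * (2 if double_pumped else 1)
    return mega_transfers * width_bits / 8  # bits -> bytes

# Assumed example: 800MHz clock, 16 bits wide in each direction.
one_way = ht_peak_mb_per_s(800, 16)
both_ways = 2 * one_way  # full duplex: send and receive at the same time

print(one_way, both_ways)
```

With those assumed figures the link peaks at 3200 MB/s each way, or 6400 MB/s counting both directions together.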
It's also important to note that not only is throughput high, but latency is extremely low. If you had two buses, one that's 256 bits and 100MHz, and one that's 32 bits and 800MHz, the latter would have lower latency and thus perform faster.
The nForce2 uses the HT bus too, but a narrower, cheaper version of it.
HT is actually independent of the way the FSB is determined.
The Opteron/Athlon 64 determines memory speed a bit differently than other processors. Memory speed is not determined by the FSB, but rather as a divisor of the clock speed.
What these settings do is determine the divisor for memory. In the situation above, setting the memory at 200MHz means a divisor of 10 (Opteron 146 for example is 2000/10 = 200MHz), while setting the memory at 166MHz means a divisor of 12.
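The divisor arithmetic is easy to check. This is only a sketch of the division described above, using the Opteron 146 figures from the post; the function name is hypothetical and this is not how any BIOS actually implements it.

```python
def memory_clock_mhz(cpu_clock_mhz, divisor):
    """Memory clock derived by dividing the CPU clock by a whole number."""
    return cpu_clock_mhz / divisor

# Opteron 146 at 2000MHz, as in the post above.
ddr400 = memory_clock_mhz(2000, 10)  # the "200MHz" setting
ddr333 = memory_clock_mhz(2000, 12)  # the "166MHz" setting

print(ddr400, round(ddr333, 2))
```

Note that the divisor of 12 gives 166.67MHz, which is what the "166MHz" setting stands for.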
While the mechanism is different, the end results are pretty much the same as the 5:4 and 3:2 ratios found in PIV 865/875 boards.
The Opteron can work VERY well asynchronously, putting the P4 to shame in this regard.
The Opteron/A64/FX are unique in the fact that the memory can run at 166MHz, or 200MHz, and the FSB on the IMC can be very high (240-300MHz stably). And the bandwidth will just shoot right through the roof.
240 FSB as seen there.
How did that person do it?
Well, actually.. I'm just going to cut through the crap and say I have no ****ing clue.. It just works.
Decreasing your memory speed setting (a bigger divisor) allows you to increase the FSB on the memory controller, thus allowing a significantly higher clock speed for both the CPU and the real internal FSB... Bandwidth goes up.
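A rough sketch of why that works, combining the divisor rule from earlier in the thread with an ordinary CPU multiplier. The 10x multiplier comes from the Opteron 146 example (2000 / 200); the 240MHz figure is the one mentioned above. The function name and the idea that the memory should stay near 200MHz are assumptions for illustration.

```python
def clocks_mhz(fsb_mhz, cpu_multiplier, mem_divisor):
    """Return (cpu_clock, memory_clock) in MHz for a given setup."""
    cpu = fsb_mhz * cpu_multiplier
    mem = cpu / mem_divisor  # memory is a divisor of the CPU clock
    return cpu, mem

stock = clocks_mhz(200, 10, 10)    # stock: memory at the "200MHz" setting
pushed = clocks_mhz(240, 10, 10)   # FSB raised, memory setting unchanged
relaxed = clocks_mhz(240, 10, 12)  # FSB raised, memory at the "166MHz" setting

print(stock, pushed, relaxed)
```

In the second case the memory is dragged up to 240MHz, well past what DDR400 is rated for. In the third case the CPU still runs at 2400MHz but the memory lands back at 200MHz, so the RAM is no longer what holds the overclock back.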
I don't get it.. Whatever.
Here's a picture:
I can't say any more. He said more than I know. I did know that the FSB is much easier to get to a higher clock without the other chip in its way.
I can't remember the last time I actually went to the homepage, but thanks
sigh.
lol
very long, very educational... thanks! I am still lost, but I now understand some of it.
But how does the clock speed get determined? How does an A64 run at 2GHz or whatever speed? How is it that it HAS an FSB but doesn't?
You can still overclock by setting the FSB higher than 200MHz, yet it has an 800/1600MHz bus? agh!?
FSB is a rate: so many cycles per second, like 200 million of them.
The 800/1600 is how many bits can travel how fast on the bus at that rate IN TOTAL; it is width times the base rate, which gives the bus (or FSB) BANDWIDTH.
Thrax was mostly talking about the bandwidth of a single-CPU HyperTransport bus with the A64 illustration. Let's take an Opteron board, two or four socket. Though it is not called that, essentially you have a negotiable-flow bandwidth bus there between the processors, so one pair can talk at high bandwidth at the same rate, while another pair might be busy and using less bandwidth at the same rate because they are loading the bus with fewer bits per second of data.
Essentially, HyperTransport does something similar: whatever part of the total flow is not used by some resources can be available for other things connected to that style of bus. If one thing is using half the total flow capacity, or bandwidth, then half is left for all the other things that use the bus. If the one thing uses 1/3 of the bandwidth, 2/3 of the bandwidth is available for use by other things. Thrax was talking about the total available when saying 800MHz of bit flow to 1600MHz of BIT FLOW available in total on the bus.
Instead of half-duplex, where only one end can talk in any one time cycle (unless you use multi-frequency overlays of signals, which 90% of computer buses are not too good at), we have a bus where, for each of the 200 million clock cycles per second (time cycles), EACH end can send once. So, say, an 8-bit bus at 200MHz. Assuming, just to illustrate the principle, that both kinds of bus have the same width and speed, half-duplex has 800 Mbit/s of bit flow available for each end if the bus is 100% efficient (typically it is NOT; more like 70% efficient in reality). HyperTransport has, at the same speed and width, 1600 Mbit/s of bit flow available to BOTH ends in the same second of time. BUT half-duplex buses are a lot slower in reality than 200MHz, and half as flow-efficient even at the same speed if saturated by one flow. HyperTransport takes a bus that can carry twice the flow and, on top of that, makes it a negotiated-rate flow balance between the devices sharing the bus, so devices are not starved as much for bandwidth to talk to other ICs or to the more major chips that are multi-area ICs and very complex (like bridges and CPUs). Typically, bridges and CPUs are given higher priority for pure bandwidth than more minor things that run at slower rates, but in HyperTransport's case the spec allows the ICs sharing the bus to negotiate rates and thus share bandwidth, so things function more smoothly overall.
John.
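For what it's worth, the 8-bit, 200MHz comparison in John's post can be sketched like this. It assumes a 100% efficient bus and, for the half-duplex case, that both ends want to talk equally; the function name is made up for illustration.

```python
def per_end_mbit_s(width_bits, clock_mhz, full_duplex):
    """Ideal bit flow available to each end of the bus, in Mbit/s."""
    raw = width_bits * clock_mhz  # one transfer of width_bits per cycle
    # Half-duplex: the two ends take turns, so each gets half the raw flow.
    # Full duplex: each end has its own path and can use the whole rate.
    return raw if full_duplex else raw / 2

print(per_end_mbit_s(8, 200, full_duplex=False))
print(per_end_mbit_s(8, 200, full_duplex=True))
```

That gives 800 Mbit/s per end for half-duplex and 1600 Mbit/s per end for full duplex, before any real-world inefficiency.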
How does the _CPU_ get its clock cycle? I know perfectly well how AXPs, P4s, and all of them work: they have the clock multiplier and the FSB... thus 10x200=2000MHz.
But since this memory controller is built in, and I keep hearing people say various different things about its bus being the speed of the processor and whatnot, I am trying to conceive how the flying hell an Athlon64 gets its MHz/GHz. Does it, or does it NOT, use a clock multiplier and FSB speed?
940pins