Help me understand how Opterons + HyperTransport work

panzerkw New York City
edited January 2004 in Hardware
In the typical AMD or Intel system, there's the CPU, the FSB, and then the memory, all of which run at a certain speed (DDR166, 333, 400, etc.).

But the new AMD64 CPUs introduce something called HyperTransport, which runs at the same speed as the CPU. So how are they still using DDR333 and DDR400 memory? I don't get how HyperTransport (which is a completely new and unique architecture, isn't it?) would make AMD64 systems so much superior to legacy AMD systems.

I'm trying to think of it this way:

Normal AMD

CPU with multiplier -> FSB (RAM speed) -> motherboard chipset -> RAM

AMD64

CPU with multiplier -> FSB (same as CPU clock speed) -> HyperTransport bus -> ??? -> RAM

I'm having trouble seeing where the performance boost comes from if we're still using the same old DDR memory modules.

Excuse the ignorance of this post, but I'm just trying to get a plain-English jive of what makes AMD64 + HyperTransport better.

Comments

  • profdlp The Holy City Of Westlake, Ohio
    edited January 2004
    From TechRepublic
    HyperTransport
    This new bus standard from AMD was pioneered to replace the EV6 bus on motherboards. Since then, it has been adopted by a number of companies for a variety of roles. At its core, HyperTransport is a scalable, variable-bandwidth bus using prioritized data packets. Most buses are designed with a time-slicing technique for sharing bandwidth. So, if you have a 100-Mbps Ethernet card and a 56K modem on the same bus, the Ethernet card will get the full bandwidth for more of the time, and any lag is covered by data caches. Other buses use master/slave configurations in which multiple groups of devices can communicate, but the master can supersede the slave within their pairing, and no one device can use the full bandwidth of the bus.

    HyperTransport contrasts this master/slave configuration with the ability for any device to use the full bandwidth available or multiple devices to use various fractions of the total bandwidth. This ability can be reassigned dynamically based on a priority system, ensuring key components receive the bandwidth they need.
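    To make "any device can use the full bandwidth, or several can share fractions of it, reassigned by priority" concrete, here is a minimal sketch of one way such sharing could work. It is a hypothetical weighted-share scheme, not the arbitration the HyperTransport spec actually defines; the device names, the priority weights, and the 1600 MB/s link figure are all made-up assumptions for illustration.

```python
def allocate(link_bw, demands, priorities):
    """Share one link's bandwidth among devices, weighted by priority.

    Hypothetical scheme (NOT the HyperTransport spec's arbitration): each
    round, the bandwidth still unassigned is split among devices that want
    more, in proportion to their priority weight (weights must be > 0).
    Nobody gets more than it asked for, so whatever an idle or light
    device leaves on the table flows to the busy ones in the next round.
    """
    grant = {dev: 0.0 for dev in demands}
    remaining = float(link_bw)
    active = {dev for dev, want in demands.items() if want > 0}
    while remaining > 1e-9 and active:
        total_weight = sum(priorities[dev] for dev in active)
        spent = 0.0
        for dev in sorted(active):
            share = remaining * priorities[dev] / total_weight
            take = min(share, demands[dev] - grant[dev])
            grant[dev] += take
            spent += take
            if demands[dev] - grant[dev] <= 1e-9:
                active.discard(dev)  # satisfied: drop out of later rounds
        remaining -= spent
    return grant


weights = {"cpu": 3, "nic": 1, "modem": 1}  # hypothetical priorities
# Busy CPU, NIC wants 100, modem idle: the NIC's unused share goes to the CPU.
print(allocate(1600, {"cpu": 1600, "nic": 100, "modem": 0}, weights))
# -> {'cpu': 1500.0, 'nic': 100.0, 'modem': 0.0}
# Only the CPU is busy: it can take the whole link.
print(allocate(1600, {"cpu": 5000, "nic": 0, "modem": 0}, weights))
# -> {'cpu': 1600.0, 'nic': 0.0, 'modem': 0.0}
```

    The second call is the case a fixed time-slice scheme can't handle: one device gets everything when nobody else needs the link.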

    HyperTransport does not have a specified bandwidth because the data width can be varied at manufacture. Thus, HyperTransport is a high-level bus that will typically connect other buses or systems. Motherboard manufacturers see great advantage in HyperTransport because it removes the PCI bus as the primary link for the IO system in a cost-effective manner. Expect to see HyperTransport appearing in a variety of multi-IO devices. This should help decrease the cost of PDAs, PCs, and laptops as the industry standardizes on HyperTransport.


    From Lost Circuits:
    The memory controller integrated into the Athlon64 family runs at CPU clock speed. This is in sharp contrast to memory controllers integrated on the chipset, mostly for two reasons: the chipset memory controllers operate at system bus speed, whereas the integrated memory controller on the processor runs (depending on the speed grade) 10 times as fast. Keep in mind that the actual processing speed inside the controller does not pose the primary obstacle; if that were the case, overclocking limitations would be a simple matter of clock speed.

    Rather, it may well be the I/O interface, which is scaled down to the memory frequency. If that were the case, lowering the memory frequency or the memory-to-CPU ratio should increase the overall overclocking capabilities.

    Why would the memory I/O interface be a problematic step if it has been mastered on most chipset-level memory controllers to work up to some 270 MHz? A chipset is a different beast, though, and there are a number of issues to take into account: the command and address signals for the memory interface require a certain amount of drive strength, and that translates into amperage, or simply power. Power, in turn, means heat, crosstalk, and a bunch of other nasty things that, especially at the CPU level, can become quite cumbersome.

    The caveat here is that it is not possible to determine on the basis of a single configuration whether it is the controller or the memory itself that poses the ceiling; however, the use of high-speed DIMMs may shed some light on that, too. Needless to say, most "high-speed" DIMMs are also specced at higher voltages and may require higher drive strength to begin with. In other words, what may appear to be a simple test could turn into a Sisyphean ordeal.

    In any case, enough of theory, it is the results that count.

    The easy answer seems to be that HT separates the buses from one another, allowing each to reach its maximum potential independent of the others. In other words, it is like allowing your wide receiver to run all-out, even while your quarterback is pussy-footing behind your dog-slow offensive line. The old interdependency has been removed.

    I'm like you - still trying to understand this. There will be better answers than mine, and I'm looking forward to them. :wink:
  • Thrax 🐌 Austin, TX Icrontian
    edited January 2004
    The current system bus is a half-duplex bus running at 133MHz and capable of handling 64 bits of data, 8 bytes at a time (this is the vlink bus that almost everyone uses). Half-duplex means that the processor can talk to memory, and the memory can send back data/results on the same bus, but data can't travel in both directions at the same time.

    HyperTransport runs at 800MHz and is "double pumped," meaning data is transferred on both the rising and the falling edge of the clock, doubling the effective bus speed to 1600MHz (strictly, 1600 million transfers per second).

    HyperTransport can be different widths (bit-wise), depending on the needs of the bus: it can be 2, 4, 8, 16, or 32 bits wide. And it is full duplex, so it can send and receive data at the same time. The transmit and receive parts of the bus can be different sizes, depending on needs, but on a main bus they are likely to be symmetrical. Splitting the transmitting and receiving parts of the bus helps to simplify the design and makes it easier to run the bus at higher speeds.
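    The figures in the last few paragraphs turn into bandwidth with one formula: clock x transfers per clock x width in bits / 8. This sketch takes the post's numbers at face value (the old bus is treated as 133MHz, single-pumped, shared by both directions) and ignores packet and protocol overhead; "MB" here means 10^6 bytes.

```python
def link_bandwidth_mb_s(clock_mhz, width_bits, transfers_per_clock=1):
    """Peak bandwidth of ONE direction of a bus, in MB/s (1 MB = 10**6 bytes)."""
    return clock_mhz * transfers_per_clock * width_bits / 8


# Old system bus as described above: 133MHz, 64 bits, one transfer per clock.
# Half duplex, so this single number is shared by both directions.
old_bus = link_bandwidth_mb_s(133, 64)

# HyperTransport at 800MHz, double-pumped (2 transfers per clock).
# Full duplex: each direction has its own lanes, so the aggregate doubles.
for width in (2, 4, 8, 16, 32):
    one_way = link_bandwidth_mb_s(800, width, transfers_per_clock=2)
    print(f"{width:>2}-bit HT link: {one_way:>6.0f} MB/s each way, "
          f"{2 * one_way:>6.0f} MB/s aggregate")

print(f"64-bit 133MHz half-duplex bus: {old_bus:.0f} MB/s total")
```

    A 16-bit link comes out at 3200 MB/s each way (6400 MB/s aggregate), against about 1064 MB/s total for the old shared bus, which is also why the spec can't quote a single bandwidth number: it depends on the width chosen.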

    It's also important to note that not only is throughput high, but latency is extremely low. If you had two buses with the same raw throughput, one 256 bits wide at 100MHz and one 32 bits wide at 800MHz, the latter would have lower latency and thus perform faster.
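    A quick way to see where the latency edge comes from: both buses move 25,600 Mbit/s, but a cycle lasts 10 ns on the 100MHz bus and 1.25 ns on the 800MHz one, and a transfer always occupies whole cycles. The model below is a simplification that ignores arbitration, packet headers, and command overhead; the two payload sizes (a 4-byte request and a 64-byte block such as a cache line) are just illustrative choices.

```python
import math


def transfer_time_ns(payload_bits, width_bits, clock_mhz):
    """Time to clock a payload across a bus, ignoring protocol overhead.

    A transfer occupies whole clock cycles, so a payload smaller than the
    bus width still costs one full cycle.
    """
    cycle_ns = 1000 / clock_mhz
    cycles = math.ceil(payload_bits / width_bits)
    return cycles * cycle_ns


wide_slow = (256, 100)   # 256 bits @ 100MHz
narrow_fast = (32, 800)  # 32 bits @ 800MHz

# Same peak throughput: 256 * 100 == 32 * 800 == 25,600 Mbit/s
assert wide_slow[0] * wide_slow[1] == narrow_fast[0] * narrow_fast[1]

for payload in (32, 512):  # a 4-byte request vs. a 64-byte block
    print(payload, "bits:",
          transfer_time_ns(payload, *wide_slow), "ns vs",
          transfer_time_ns(payload, *narrow_fast), "ns")
# -> 32 bits: 10.0 ns vs 1.25 ns
# -> 512 bits: 20.0 ns vs 20.0 ns
```

    Under this model the narrow, fast bus wins big on small transfers and ties on large ones, so the advantage shows up most in the many short request/response exchanges.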

    The nForce2 uses the HT bus, but a narrower, cheaper version of it.

    HT is actually independent of the way the FSB is determined.

    The Opteron/Athlon 64 determines memory speed a bit differently than other processors. Memory speed is not determined by the FSB, but rather by a divisor of the CPU clock speed.

    What the memory speed settings do is determine the divisor for memory. For example, setting the memory at 200MHz means a divisor of 10 (an Opteron 146 runs at 2000MHz, and 2000/10 = 200MHz), while setting the memory at 166MHz means a divisor of 12 (2000/12 ≈ 166MHz).
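    The divisor arithmetic above can be written out as a small sketch. One assumption here that the post doesn't state: when the CPU clock doesn't divide evenly, the divisor is rounded up so memory never runs above its setting (this is the commonly reported A64 behaviour, but treat it as an assumption). Fractions are used because DDR333's clock is really 200 x 5/6 = 166.67MHz, and floating point would blur the exact 2000/166.67 = 12 case.

```python
from fractions import Fraction
import math


def memory_divisor(cpu_mhz, mem_target_mhz):
    """Whole-number divisor that keeps memory at or below its target.

    Assumption: the divisor is rounded UP. Fraction keeps the division
    exact so 2000 / (500/3) lands on exactly 12, not 12.000000001.
    """
    return math.ceil(Fraction(cpu_mhz) / Fraction(mem_target_mhz))


def memory_clock(cpu_mhz, divisor):
    """Memory clock = CPU clock / divisor."""
    return Fraction(cpu_mhz) / divisor


DDR400, DDR333 = Fraction(200), Fraction(500, 3)  # 200MHz and 166.67MHz

cpu = 2000  # Opteron 146: 2000MHz
for name, target in (("DDR400", DDR400), ("DDR333", DDR333)):
    d = memory_divisor(cpu, target)
    print(f"{name}: divisor {d}, memory clock {float(memory_clock(cpu, d)):.2f}MHz")
# -> DDR400: divisor 10, memory clock 200.00MHz
# -> DDR333: divisor 12, memory clock 166.67MHz

# A hypothetical 2200MHz chip shows why rounding matters: 2200/166.67 = 13.2,
# which rounds up to 14, so DDR333 memory actually runs at about 157MHz.
d = memory_divisor(2200, DDR333)
print(f"2200MHz CPU, DDR333: divisor {d}, memory {float(memory_clock(2200, d)):.2f}MHz")
```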

    While the mechanism is different, the end results are pretty much the same as the 5:4 and 3:2 ratios found in PIV 865/875 boards.

    The Opteron can work VERY well asynchronously, putting the P4 to shame in this regard.

    The Opteron/A64/FX are unique in that the memory can run at 166MHz or 200MHz while the FSB on the IMC (integrated memory controller) can be very high (240-300MHz, stably). And the bandwidth just shoots right through the roof.

    [Attached screenshot]

    240 FSB as seen there.

    How did that person do it?

    Well, actually... I'm just going to cut through the crap and say I have no ****ing clue... It just works.

    Decreasing your memory speed allows you to increase the FSB on the memory controller, thus allowing you a significantly higher clock speed for both the CPU and the real internal FSB... Bandwidth goes up.

    I don't get it... Whatever.
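    Here's one way to see why lowering the memory setting frees up the FSB, as a sketch built only from the rules given earlier in this post (CPU = FSB x multiplier, memory = CPU / divisor). The assumption not stated in the thread: the divisor is fixed by the multiplier and the memory setting (multiplier x 200MHz over the setting, rounded up) and does not change when you raise the FSB. The multiplier of 10 is a hypothetical 2000MHz chip.

```python
from fractions import Fraction
import math


def a64_clocks(fsb_mhz, multiplier, mem_setting_mhz):
    """Return (cpu_mhz, memory_mhz) for a given reference ("FSB") clock.

    Assumption: the memory divisor depends only on the multiplier and the
    memory setting (multiplier x 200MHz over the setting, rounded up), so
    raising the FSB raises the CPU and memory clocks together.
    """
    cpu = fsb_mhz * multiplier
    divisor = math.ceil(Fraction(200 * multiplier) / Fraction(mem_setting_mhz))
    return cpu, Fraction(cpu) / divisor


DDR400, DDR333 = Fraction(200), Fraction(500, 3)  # 200MHz and 166.67MHz
for fsb in (200, 240):
    for name, setting in (("DDR400", DDR400), ("DDR333", DDR333)):
        cpu, mem = a64_clocks(fsb, 10, setting)
        print(f"FSB {fsb}MHz, {name} setting: CPU {cpu}MHz, memory {float(mem):.2f}MHz")
```

    At 240MHz FSB with the DDR333 setting, the CPU reaches 2400MHz while the memory lands back at exactly 200MHz; leaving the DDR400 setting in place would push the memory to 240MHz, which is where ordinary DDR400 sticks run out of headroom.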

    Here's a picture:
    [Attached diagram: htt.gif]
  • mmonnin Centreville, VA
    edited January 2004
    Nice pictures, Thrax. ;)

    I can't say any more. He said more than I know. I did know that the FSB is much easier to get to a higher clock without the other chip in its way.
  • panzerkw New York City
    edited January 2004
    Thanks for the explanation. I think I can return to the Intel vs. AMD fray on another forum with some heavier weapons.
  • MediaMan Powered by loose parts.
    edited January 2004
    Does no one check the front page archives anymore? :aol: HyperTransport explained.
  • panzerkw New York City
    edited January 2004
    MediaMan wrote:
    Does no one check the front page archives anymore? :aol: HyperTransport explained.

    I can't remember the last time I actually went to the homepage, but thanks :sawed:
  • MediaMan Powered by loose parts.
    edited January 2004
    panzerkw wrote:
    I can't remember the last time I actually went to the homepage

    sigh.



    lol
  • RWB Icrontian
    edited January 2004
    MediaMan wrote:
    Does no one check the front page archives anymore? :aol: HyperTransport explained.

    Very long, very educational... thanks! I am still lost, but I now understand some of it.


    But how does the clock speed get determined? How does an A64 run at 2GHz or whatever speed? How is it that it HAS an FSB but doesn't?

    You can still overclock by setting the FSB higher than 200MHz, yet it has an 800/1600MHz bus? Agh!?
  • Straight_Man Geeky, in my own way Naples, FL Icrontian
    edited January 2004
    RWB:

    FSB is a rate: so many cycles per second, like 200 million of them.

    The 800/1600 is how many bits can travel on the bus at that rate IN TOTAL; it is width times the base rate of the bus, or the FSB BANDWIDTH.

    Thrax was mostly talking about a single-CPU HyperTransport bus with the A64 illustration. Let's take an Opteron board, two or four socket. Though it is not called that, essentially you have a negotiable-flow bandwidth bus between the processors: one pair can talk at high bandwidth at the same rate, while another pair might be busy but using less bandwidth at that same rate, because they are loading the bus with fewer bits per second of data.

    Essentially, HyperTransport does something similar: whatever share of the total flow is not used by one resource can be made available to other things connected to that style of bus. If one thing is using half the total flow capacity, or bandwidth, then half is left for everything else that uses the bus. If that one thing uses 1/3 of the bandwidth, 2/3 of the bandwidth is available to other things. Thrax was talking about the total available when he went from 800MHz of bit flow to 1600MHz of BIT FLOW available in total on the bus.

    Instead of a half-duplex bus, where only one end can talk in any one time cycle (unless you use multi-frequency signal overlays, which 90% of computer buses are not too good at), we have a bus where, for every 200 million clock cycles per second, EACH end can send once. So, say an 8-bit bus at 200MHz. Just to illustrate the principle, assume both kinds of bus have the same width and speed: a half-duplex bus has 800MHz of bit flow available to each end if the bus is 100% effective (typically it is NOT; more like 70% efficient in reality). HyperTransport, at the same speed and width, has 1600MHz of bit flow available to BOTH ends in the same second.

    BUT half-duplex buses are a lot slower in reality than 200MHz, and only half as flow-efficient even at the same speed if saturated by one flow. HyperTransport takes a bus that is twice as flowable and, on top of that, makes it a negotiated-rate flow balance between the devices sharing the bus, so devices are not starved as much for bandwidth to talk to other ICs or to the more major chips that are multi-area ICs and very complex (like bridges and CPUs). Typically, bridges and CPUs are given higher priority for pure bandwidth than minor things that run at slower rates, but in HyperTransport's case the spec allows the sharing ICs to negotiate rates and thus share bandwidth more, so things function more smoothly overall.
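    Straight_Man's half- vs. full-duplex arithmetic can be sketched like this. His "MHz of bit flow" is read here as Mbit/s (width in bits x clock in MHz); the even 50/50 time split on the half-duplex bus and the flat 70% efficiency factor are assumptions taken from his idealized illustration, not measured behaviour of any real bus.

```python
def per_end_bandwidth_mbit(width_bits, clock_mhz, full_duplex, efficiency=1.0):
    """Mbit/s that each end of a two-device link can send.

    Half duplex: both ends share one set of wires, so if they split the
    time evenly each end gets half. Full duplex: each end has its own
    direction and can send at the full rate at the same time.
    """
    raw = width_bits * clock_mhz * efficiency
    return raw if full_duplex else raw / 2


print(per_end_bandwidth_mbit(8, 200, full_duplex=False))  # 800.0
print(per_end_bandwidth_mbit(8, 200, full_duplex=True))   # 1600.0
# Half duplex at the ~70% efficiency mentioned above:
print(per_end_bandwidth_mbit(8, 200, full_duplex=False, efficiency=0.7))
```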

    John.
  • RWB Icrontian
    edited January 2004
    OK, I did not make myself clear...

    How does the _CPU_ get its clock cycle? I know perfectly well how AXPs, P4s, and all them work: they have the clock multiplier and the FSB... thus 10x200=2000MHz.

    But since this memory controller is built in, and I hear people say various different things about its bus being the speed of the processor and whatnot, I am trying to conceive how the flying hell an Athlon64 gets its MHz/GHz. Does it, or does it NOT, use a clock multiplier and FSB speed?
  • citrixmeta Montreal, Quebec Icrontian
    edited January 2004
    And that's how you get 5Gb/sec at 9x200 ;)
  • Justin Atlanta
    edited January 2004
    So which 64 is better, opteron or athalon?
  • citrixmeta Montreal, Quebec Icrontian
    edited January 2004
    Opteron/FX pwn

    940pins
    So which 64 is better, opteron or athalon?
  • mmonnin Centreville, VA
    edited January 2004
    And it's Athlon.
  • Justin Atlanta
    edited January 2004
    Yeah, yeah. I realized that after I typed it and saw how funny it looked. So the Opteron beats out the Athlon (see, I can spell)... Citrix, has this CPU helped you get to #1, or are there other things in play? I took a look at your specs and they look solid. Just curious: where can I get a 64-bit copy of XP?