Gromacs sensitive to ram timing.

edited July 2003 in Folding@Home
I just discovered that Gromacs WUs are sensitive to your RAM timings, unlike Tinker WUs. I was messing around with my KHA+ system yesterday evening and reading Mack's RAM timing thread when I saw the benches that Gargoyle had posted from his KHA+ system. I ran Sandra on my KHA+ system and saw that my memory scores were lower than his, so I went into the BIOS and saw that I had the RAM timings set at normal. I then upped the ram timings to turbo and bumped the vdimm up to 2.8v, to give the Crucial PC2100 in that rig a chance of running it stable at a 141 fsb and Cas2 and rebooted and sure enough, my mem benches were then better than Garg's. That was late last night when I finished messing with it.

I went to check how that rig was running a little while ago, and it was running fine. For the hell of it, I checked my EMIII log to see how it was folding, and it surprised the hell out of me. The WU I was folding, a p349, was finishing 30 minutes faster than it had before I tightened the mem timings, with no other changes to this rig.

Attached is a screenie of the p349's this rig has done; the last 2 are from after tightening the mem timings:

Comments

  • primesuspect Beepin n' Boopin Detroit, MI Icrontian
    edited July 2003
    Wow Mudd! This is a major discovery. Excellent work! :thumbsup:
  • Linc Owner Detroit Icrontian
    edited July 2003
    Whoa, nice work Mudd :cool::fold:
  • Thrax 🐌 Austin, TX Icrontian
    edited July 2003
    I always figured RAM timings had to be related to calculations. Any number crunch the F@H instance calls for has to be run through the RAM. If the RAM can serve those requests more quickly than before, it'd benefit the entire operation.
  • profdlp The Holy City Of Westlake, Ohio
    edited July 2003
    I found one of my folding rigs with mem timings set to "normal". (The other ones were maxxed out already).

    Well, mondi old friend, you might have to wait another day or two to pass me.:cool:

    Nice research, doc!:respect::fold:


    Prof

    (Folding as "scthoburn")
  • SimGuy Ottawa, Canada
    edited July 2003
    Would RDRAM (which has a much higher latency than DDR SDRAM) actually compromise your folding times?

    ie, would 40ns PC800 RDRAM in an Intel system fold "slower" than a DDR266/333/400 system (single channel) because of the substantial delay in retrieving data from memory?

    Would having dual-channel DDR benefit your folding at all?
  • Thrax 🐌 Austin, TX Icrontian
    edited July 2003
    Theoretically speaking, having RDRAM would slow the operation down. The basic clock cycle for RDRAM was 100 and 133MHz, just as it was for PC1600 and PC2100. However, PC1600/2100 were approximately 5-8ns memories, and RDRAM was 40-58ns. As the basic clock cycle is the same, and memory bandwidth between PC2100 and PC800 is the same, I would assume that the latency on RDRAM, being about 5-12x higher than that of DDR, would negatively impact folding speed.

    Dual-channel, continuing on theory, would have no effect.
  • mmonnin Centreville, VA
    edited July 2003
    I am surprised, because I didn't think it would make much of a difference. I might have to do some experimenting of my own.
  • edcentric near Milwaukee, Wisconsin Icrontian
    edited July 2003
    The next question is, by how much?
    Is it worth backing off on the overclock by a few MHz in order to run faster mem settings?
  • TheLostSwede Trondheim, Norway Icrontian
    edited July 2003
    Take 2.5-4-4 as an example latency rating for a module. Latency is a measure of delay, so the 2.5 in 2.5-4-4 indicates a 2.5 clock cycle delay, and each 4 indicates a 4 clock cycle delay. These clock cycle delays determine how long it takes your CPU to write data to or read data from memory. So the lower these latencies are, the less time your CPU spends idle waiting for data, which results in higher performance.

    The position of the rating in 2.5-4-4 determines which latency the rating refers to. The ratings, in order, represent CAS, tRCD (RAS-to-CAS delay), and tRP (RAS Precharge). It would take a long time to explain what each of these latency ratings means, so to make a long story short: the lower the latency, the higher the performance of your CPU.

    Mac
  • TheLostSwede Trondheim, Norway Icrontian
    edited July 2003
    I tested project 361, a Gromacs WU, for a while and I couldn't see any difference at all between 11,3,2,2 and 5,5,5,2.5. However, I'm not sure whether it might change on another type of project or perhaps another chipset. Mud, can you try to re-create the scene and see if it was a fluke?

    Mac
  • Garg Purveyor of Lincoln Nightmares Icrontian
    edited July 2003
    muddocktor said
    I then upped the ram timings to turbo and bumped the vdimm up to 2.8v, to give the Crucial PC2100 in that rig a chance of running it stable at a 141 fsb and Cas2 and rebooted and sure enough, my mem benches were then better than Garg's. That was late last night when I finished messing with it.

    Was the system at 141 fsb before? If not, then how much faster did the CPU get when the fsb was raised?

    I'll have to mess with my timings a little and see if it helps. I can't raise my fsb right now, for fear of what would happen to controller cards :(. (I need an nforce2 real bad...)
  • edited July 2003
    This system has been running at a 141 fsb speed ever since I remapped the multi on the XP2100 in it to a 16 multi. The fsb speed and everything else has stayed the same, except for the ram timings.:)

    BTW, it just finished another p349 and it's still holding true to form; faster folding with faster ram timing.
  • Garg Purveyor of Lincoln Nightmares Icrontian
    edited July 2003
    Well, my results in FAH are inconclusive. They seem approximately the same, but I'm having a hard time finding times in the log when I was for sure not using the machine. There were parts where I was playing SWG, and that's REAL obvious in the logfile, but other parts were harder to tell.

    My SiSoft bench was also inconclusive. The float rating increased by 22, but the integer rating decreased by 12.

    Oh well, more tweaking later.
  • Garg Purveyor of Lincoln Nightmares Icrontian
    edited July 2003
    Grr. First IC11 loses its WU due to the thunderstorm last night, then my rig loses one because 1T command rate was apparently too much to ask. :banghead: I was like 80% done too...
  • mmonnin Centreville, VA
    edited July 2003
    OK, I was able to repeat mudd's findings. I went from a 13:03 average down to a 12:20 average. It was at 2.5-3-5-3 or something, and now they are all 2's: 2.0-2-5-2. Not sure if that is the right order, but I know I changed the CAS and two 3's to 2's.
  • TheLostSwede Trondheim, Norway Icrontian
    edited July 2003
    mmonnin, what project was that on? Same as Mud's?
  • mmonnin Centreville, VA
    edited July 2003
    Project: 341 (Run 1, Clone 85, Gen 25)
  • TheLostSwede Trondheim, Norway Icrontian
    edited July 2003
    A difference of almost a minute a pop is MASSIVE!

    I wish there could be something official about this from Panda and/or Stanford.
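
For anyone who wants to put rough numbers on the figures quoted in this thread, here is a minimal Python sketch. It converts timings given in clock cycles (like the 2.5-4-4 example above) into nanoseconds, and works out the relative gain between the 13:03 and 12:20 averages. The helper names are made up for illustration, the 133 MHz clock is an assumption (the nominal PC2100 clock; nobody in the thread states the actual memory clock used), and the 2-2-2 set is just a hypothetical "tight" comparison point.

```python
def cycles_to_ns(cycles, clock_mhz):
    """Convert a latency in memory-clock cycles to nanoseconds."""
    return cycles * 1000.0 / clock_mhz


def to_seconds(mmss):
    """Parse an 'MM:SS' string into a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_faster(before, after):
    """Percentage reduction in time going from `before` to `after` ('MM:SS')."""
    b, a = to_seconds(before), to_seconds(after)
    return (b - a) / b * 100.0


# ASSUMPTION: 133 MHz is the nominal PC2100 clock, not a value from the thread.
CLOCK_MHZ = 133.0

# Order is CAS, tRCD, tRP. The 2-2-2 set is a hypothetical tighter setting.
for label, timings in (("2.5-4-4", (2.5, 4, 4)), ("2-2-2", (2, 2, 2))):
    as_ns = ", ".join(f"{cycles_to_ns(t, CLOCK_MHZ):.1f} ns" for t in timings)
    print(f"{label} at {CLOCK_MHZ:.0f} MHz: {as_ns}")

# The two averages reported above.
print(f"13:03 -> 12:20 is {percent_faster('13:03', '12:20'):.1f}% less time")
```

The drop from 13:03 to 12:20 is 43 seconds, or about 5.5 percent less time. That figure comes from a single report on one project, so treat it as a data point and not a general rule.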