Dual SMP on Quad - Guide Revised (Linux ONLY)

Ultra-Nexus Buenos Aires, ARG
edited December 2007 in Folding@Home
This guy created a great guide for a dedicated Quad machine running Linux as the MAIN OS (no Windows, no Virtual Machines), inspired by my original Quad guide.

Check it out! :D

I will try it myself! Perhaps I can gain a couple of hundred PPD more from my setups.

Comments

  • Qeldroma Arid ZoneAh Member
    edited December 2007
    I'm glad to see this discussion, but I fail to follow the rationale. Why would you want a quad to run a single client and get one WU done in less time when you can run 2 clients on it and get TWO WUs done faster than the single client can? And so on? Maybe I'm not following it right. Sigh.

    EDIT ADDED: Also, I've not seen enough folding data on the Phenom (a "true", "native" or symmetric quad-core design) to be able to say that a single SMP client produces just as much on a symmetric quad as two clients do (or two similar dual-cores for that matter). Personally, I would think Phenom would have the opposite (slower) result than the Kentsfield running two SMP clients because it runs on a smaller, unified cache. But ... ?
  • Ultra-Nexus Buenos Aires, ARG
    edited December 2007
    That was always Stanford's preference, to have the WUs returned as fast as possible, usually not point-wise. That's similar to the discussion about having two uniprocessor clients running on an HT-enabled P4... for us, it's better PPD; for them, it's 2 WUs returned slowly rather than one done as fast as possible.

    Thankfully, as long as we get the WUs done within the deadline, it's fine. If they want to encourage everybody to do fast folding instead of efficient folding, then they'll have to review their point system.
  • Qeldroma Arid ZoneAh Member
    edited December 2007
    That was always Stanford's preference, to have the WUs returned as fast as possible, usually not point-wise....

    But my point is (and I know you know this): it really doesn't turn around WUs faster; it cumulatively turns them around much SLOWER, so it's not WU-wise either.
  • mmonnin Centreville, VA
    edited December 2007
    But they need your WU to send out the next WU. That's the point. FAH is still COMPLETELY SEQUENTIAL! They don't just send out each step of the folding to a bunch of people and hope they come back, because they would never get anywhere like that. They send out the same data to multiple people. Whoever gets it back to them first helps the cause the most. Say you and 2 other people start up at the same time. You get 2 WUs on your quad and they each get a duplicate of one of your 2 on their quads; they will return their WUs before yours. Potentially you didn't help Stanford at all because your WUs were turned in after theirs.

    Say they don't send out duplicates. Say they send out 10 WUs at once and need to have all the results combined before the next set of 10 WUs can be sent out. Say everyone else has turned theirs in while someone running 2 clients on a quad is holding up the entire batch from going out again.

    Multiple clients slow down the science. End of story.
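To put rough numbers on the disagreement above, here is a minimal Python sketch that separates throughput (WUs finished per day, roughly what PPD tracks) from latency (how long a single WU takes to come back). The 24-hour and 40-hour figures and both function names are assumptions made up for illustration; nobody in the thread reported them. The sketch also assumes, as mmonnin describes, that each WU of a trajectory can only be issued once the previous one is returned, and it ignores upload and server turnaround time.

```python
# Hypothetical timings for illustration only; they are not measurements
# from this thread or from Stanford.
HOURS_ONE_CLIENT = 24.0    # assumed: one SMP client using all four cores
HOURS_TWO_CLIENTS = 40.0   # assumed: each of two SMP clients sharing the quad


def wus_per_day(clients, hours_per_wu):
    """Work units one machine finishes per day (its throughput)."""
    return clients * 24.0 / hours_per_wu


def trajectory_days(generations, hours_per_wu):
    """Days to advance one trajectory by `generations` work units when each
    WU can only be sent out after the previous one has come back."""
    return generations * hours_per_wu / 24.0


if __name__ == "__main__":
    print("WUs/day, 1 client :", wus_per_day(1, HOURS_ONE_CLIENT))
    print("WUs/day, 2 clients:", wus_per_day(2, HOURS_TWO_CLIENTS))
    print("Days for 10 sequential WUs, 1 client :",
          trajectory_days(10, HOURS_ONE_CLIENT))
    print("Days for 10 sequential WUs, 2 clients:",
          trajectory_days(10, HOURS_TWO_CLIENTS))
```

With these made-up numbers the two-client setup finishes more WUs per day, while any single trajectory that passes through that machine advances more slowly. Both sides of the argument can be right at once; which effect matters more depends on timings the thread does not provide.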
  • Leonardo Wake up and smell the glaciers Eagle River, Alaska Icrontian
    edited December 2007
    Thank you! That is the FIRST TIME I have seen it logically explained.

    Still, though, I can run SMP with two clients per machine and still turn in both units, each with 73+% of the allotted time remaining. I just won't buy that that is slowing down the project. There are a couple hundred thousand machines out there turning in units much more slowly.

    In my mind, this mini discussion is not off topic. It applies to dual clients under Linux, Windows, or OSX.
  • mmonnin Centreville, VA
    edited December 2007
    Yes, our quads running at 4k PPD do help more than 1 P4 HT machine at 1k PPD, because we would be turning in the same WU before their SMP WU and would be doing another at the same time. The above comparison is for similar machines.
  • Qeldroma Arid ZoneAh Member
    edited December 2007
    Leonardo wrote:
    Thank you! That is the FIRST TIME I have seen it logically explained.
    Agreed, likewise, thanks :thumbsup:

    Let me see: so really, my son's rig, which is and has been running a single client on a 2.7GHz Q6600, is, in all probability, PWNING YOU ALL!

    I think I can live with that ;D .
  • mmonnin Centreville, VA
    edited December 2007
    It all just goes back to FAH being sequential. It's the same reason why Beowulf clusters would never work: 1 frame needs to be done before the next frame can be started. A molecule goes from point A to point B to point C. Point C cannot be calculated until you have point B.

    Granted, a Q6600 can still help out more with 2 clients than with 1 client alone, especially OC'd nicely, as it is just 2x E6600 CPUs in 1 package (my Q66 actually folds faster than my faster-clocked E66). Most likely you won't be the last person to send in the needed WU before the next batch can go out, especially with P-Ds/Athlon Duals out there running SMP.
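The "you won't be the last person" point can be sketched the same way. The Python snippet below assumes the batch model described earlier in the thread, where the next batch cannot go out until every WU in the current one is back, so the batch is gated by its slowest return. The function name and all return times are hypothetical values chosen for illustration, not data from the thread.

```python
def batch_done_after(return_hours):
    """Hours until every WU in a batch is back, assuming the next batch
    cannot go out until the slowest result has been returned."""
    return max(return_hours)


# Hypothetical return times in hours; none of these are measured values.
SLOW_FIELD = [30.0, 55.0, 70.0]   # assumed: other donors on older dual cores
FAST_FIELD = [20.0, 22.0, 26.0]   # assumed: other donors on similar quads
QUAD_ONE_CLIENT = 24.0            # assumed: your quad running one client
QUAD_TWO_CLIENTS = 40.0           # assumed: your quad running two clients

if __name__ == "__main__":
    for name, field in (("slow field", SLOW_FIELD), ("fast field", FAST_FIELD)):
        one = batch_done_after(field + [QUAD_ONE_CLIENT])
        two = batch_done_after(field + [QUAD_TWO_CLIENTS])
        print(name, "- 1 client:", one, "h, 2 clients:", two, "h")
```

Under these assumptions, running two clients changes nothing when slower machines are already gating the batch, and only delays it when every other donor is on comparable or faster hardware.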
  • sgstair Reverse Engineer Redmond, WA Icrontian
    edited December 2007
    There is another side to it, and it's the reason FAH works on such a massively distributed scale: for each project they are running a huge number of simulations in parallel. So yes, each simulation does need to be sequential, but they plan on having at least some of the simulations take sub-optimal paths in the process of being done, and they really need most or all of them done to draw useful conclusions.
    That, and the quads, even running dual client, are among the faster machines at Folding's disposal, as has been said. So I think it's reasonable to leave mitigating the risk of losing time in simulations to Stanford and just focus on increasing PPD. And of course, if they want to change how we focus our efforts (say, if they want us to get WUs done as soon as possible), then the path is quite clear: if they change the scoring system to emphasize getting things done quickly, we'll reorganize to meet them.
    (And I do agree with Leonardo; specifically, I think that when you can run 2 simulations in parallel in less time than it would take to run 2 sequentially, it's more beneficial to have the 2. Of course, unless Stanford comes out and says so, I guess we don't know what they prefer.)
  • Qeldroma Arid ZoneAh Member
    edited December 2007
    If I thought the math at Stanford was so linear, I'd save the money and send my kids to community college.

    I'd also guess that whoever gets a WU done first is not necessarily "the winner" either. If they've sent multiple copies of the same WU out, they likely use their results to error-check each other and have voted results.

    FAH is also a simulation, so there are probably lots of other analyses going on in the background, like Monte Carlos, etc. Despite the fact they now have over a petaflop at their disposal, it's still pretty meager for what they want to do, so you can bet they spend a considerable amount of time determining what will give them the most result.

    If I read it right, DCPs like FAH are horribly inefficient, but it still gives Stanford way more computing power than they can afford on their own. You have to give them credit: there is almost as much genius in marketing FAH as there is in doing the math.
  • mmonnin Centreville, VA
    edited December 2007
    Remember, the entire reason we use 2 clients on an Intel quad is because it is not a true quad core. With true quad cores I don't think we will see the need for 2 clients.

    And yes, FAH is simulating more than 1 protein, more than 1 part of the simulation per protein, and probably multiple types of simulations on the same protein.

    I don't know how much of the multiple-copy thing they do, but I would if I was Stanford and I wanted my top priority done. Remember, once they send it out, you can get another WU on the same client and Stanford won't know the difference. It just won't come back to them in the allowed time, and it will get resent after the deadline rolls around.
  • Qeldroma Arid ZoneAh Member
    edited December 2007
    mmonnin wrote:
    Remember the entire reason we use 2 clients on an Intel quad is because it is not a true quad core.

    I'm not so sure; I've seen discussion about SMP not scaling very well. But I also haven't really seen any numbers on a Barcelona along these lines. I don't think I would trust the results anyhow. Barcelona's cache is, quite frankly, anemic compared to a Kentsfield/Yorkfield, and two clients on that chip would likely be doomed by not having enough resources. A better comparison might be against equivalently clocked duals.

    ?
  • mmonnin Centreville, VA
    edited December 2007
    The SMP client is designed for quads, thus 4 fahcores running. We use 2 clients because current quads are not true quad cores.

    Cache is really beneficial for SMP. The E4XXX series is crippled compared to the E6XXX series at similar clock speeds. The same could apply to AMD's quad, although there is a possibility of shared vs separate cache changing performance. I wouldn't bet on a comparison between AMD's true quad and Intel's 2x dual cores on a chip as far as true vs 2x dual goes. We will never really know, as Penryn will have improved FPU performance, something FAH really likes!