Dual SMP on Quad - Guide Revised (Linux ONLY)
Ultra-Nexus
Buenos Aires, ARG
This guy created a great guide for a dedicated Quad machine running Linux as the MAIN OS (no Windows, no Virtual Machines), inspired by my original Quad guide.
Check it out!
I will try it myself! Perhaps I can gain a couple of hundred PPD more from my setups.
EDIT ADDED: Also, I've not seen enough folding data on the Phenom (a "true", "native", or symmetric quad-core design) to be able to say whether a single SMP client produces just as much on a symmetric quad as two clients do (or on two similar dual-cores, for that matter). Personally, I would think the Phenom would show the opposite (slower) result from the Kentsfield when running two SMP clients, because it runs on a smaller, unified cache. But ... ?
Thankfully, as long as we get the WUs done within the deadline, it's fine. If they want to encourage everybody to do fast folding instead of efficient folding, then they'll have to review their point system.
But my point is (and I know you know this): it really doesn't turn WUs around faster- cumulatively it turns each one around much SLOWER, so it's not a win WU-wise either.
Say they don't send out duplicates. Say they send out 10 WUs at once and need all the results combined before the next set of 10 can go out. Say everyone else has turned theirs in while someone running 2 clients on a quad is holding up the entire batch (see the sketch below).
Multiple clients slow down the science. End of story.
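For what it's worth, that argument is easy to model. Here's a toy sketch in Python; the batch-of-10 setup and all the numbers are made up, since none of us knows how Stanford's server actually batches generations:

```python
# Toy model of the "generation barrier" argument above.
# Assumption (not verified): the server must collect every WU in a
# generation before it can issue the next one, so generation latency
# is the max of individual turnaround times, not the average.

# Hypothetical turnaround times (hours) for a 10-WU generation;
# the last box is a quad running two SMP clients, so each of its
# WUs takes roughly twice as long.
turnaround_hours = [10, 11, 9, 12, 10, 11, 10, 9, 12, 22]

all_single = turnaround_hours[:-1] + [11]   # same box running one client
print(f"batch latency, all single-client: {max(all_single)} h")       # 12 h
print(f"batch latency, one dual-client:   {max(turnaround_hours)} h") # 22 h
# One slow box sets the pace for the whole generation.
```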
Still, though. I can run SMP with two clients per machine and still turn in both units, each with 73+% of the allotted time remaining. I just won't buy that that is slowing down the project. There are a couple hundred thousand machines out there turning in units much more slowly.
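To put numbers on it (all hypothetical, but roughly the shape of what I see):

```python
# Back-of-envelope: latency vs. throughput on a quad.
# All numbers are hypothetical, just to show the shape of the trade-off.
deadline_h = 96.0   # assumed WU deadline

t_single = 12.0     # hours per WU, one SMP client on all 4 cores
t_dual = 22.0       # hours per WU, two SMP clients sharing 4 cores
                    # (a bit under 2x t_single: less sync overhead per client)

print(f"throughput, 1 client:  {1 / t_single:.3f} WU/h")  # 0.083
print(f"throughput, 2 clients: {2 / t_dual:.3f} WU/h")    # 0.091
print(f"deadline margin, 2 clients: {1 - t_dual / deadline_h:.0%}")
# ~77% of the allotted time remaining (close to the 73+% reported above),
# so both WUs beat the deadline easily -- but each one still comes back
# later than it would from a single client: better PPD, worse latency.
```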
In my mind, this mini discussion is not off topic. It applies to dual clients under Linux, Windows, or OSX.
Let me see- so really, my son's rig, which is and has been running a single client on a 2.7GHz Q6600, is, in all probability, PWNING YOU ALL!
I think I can live with that.
Granted, a Q6600 can still help out more with 2 clients than with 1 client alone, especially OC'd nicely, since it is just 2x E6600 CPUs in 1 package (my Q66 actually folds faster than my higher-clocked E66). Most likely you won't be the last person to send in the needed WU before the next batch can go out, especially with P-Ds and Athlon duals out there running SMP.
That, and the quads, even running dual clients, are among the faster machines at Folding's disposal, as has been said. So I think it's reasonable to leave mitigating the risk of losing simulation time to Stanford and just focus on increasing PPD. And of course, if they wanted to change how we focus our efforts (say, if they want us to get WUs done as soon as possible), the path is quite clear: change the scoring system to emphasize getting things done quickly, and we'll reorganize to meet them.
(And I do agree with Leonardo; specifically, I think that when you can run 2 simulations in parallel in less time than it would take to run them sequentially, it's more beneficial to run the 2. Of course, unless Stanford comes out and says so, I guess we don't know what they prefer.)
I'd also guess that whoever gets a WU done first is not necessarily "the winner" either. If they've sent multiple copies of the same WU out, they likely use the results to error-check each other and vote on the answer.
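That kind of result voting is standard in other DCPs (BOINC calls it a quorum), though whether FAH does exactly this is a guess on my part. A minimal sketch of the idea:

```python
# Sketch of redundant-WU validation by majority vote, as speculated above.
# Purely illustrative -- not FAH's actual server code.
from collections import Counter

def voted_result(results, quorum=2):
    """Return the result agreed on by at least `quorum` copies, else None."""
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None

# Three copies of the same WU came back; one is corrupt.
copies = ["a3f9c2", "a3f9c2", "b71e00"]   # hypothetical result checksums
print(voted_result(copies))               # -> "a3f9c2"
```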
FAH is also a simulation- so there are probably lots of other analyses going on in the background, like Monte Carlo runs, etc. Despite the fact that they now have over a petaflop at their disposal, it's still pretty meager for what they want to do, so you can bet they spend a considerable amount of time determining what will give them the most results.
If I read it right, DCPs like FAH are horribly inefficient, but they still afford Stanford way more computing power than they could ever buy on their own. You have to give them credit- there is almost as much genius in the marketing of FAH as there is in doing the math.
And yes, FAH is simulating more than 1 protein, more than 1 part of the simulation per protein, and probably multiple types of simulations on the same protein.
I don't know how much of the multiple-copy thing they do, but I would if I were Stanford and wanted my top priority done. Remember: once they send a WU out, you can get another WU on the same client and Stanford won't know the difference. The first one just won't come back to them in the allowed time, and it will get resent after the deadline rolls around.
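That resend-after-deadline behavior is at least how it looks from the client side; a toy version of the server-side bookkeeping might be (all names hypothetical):

```python
# Toy sketch of deadline-based reissue, as described above.
# Hypothetical server-side bookkeeping -- not FAH's actual code.

assignments = {}  # wu_id -> hour the WU was handed out

def assign(wu_id, now_h):
    assignments[wu_id] = now_h

def overdue(deadline_h, now_h):
    """WUs whose deadline has passed with no result -- candidates for resend."""
    return [wu for wu, sent in assignments.items() if now_h - sent > deadline_h]

assign("p2653_r0_c1_g4", now_h=0)          # made-up WU identifier
print(overdue(deadline_h=96, now_h=100))   # -> ['p2653_r0_c1_g4']
# The server can't tell an abandoned WU from a slow one; it just
# reissues anything that misses the deadline.
```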
I'm not so sure- I've seen discussion about SMP not scaling very well, but I also haven't really seen any numbers on a Barcelona along these lines, and I don't think I would trust the results anyhow. Barcelona's cache is, quite frankly, anemic compared to a Kentsfield/Yorkfield, and two clients on that chip would likely be doomed by not having enough resources. A better guess might be against equivalently clocked duals.
Cache is really beneficial for SMP. The E4XXX series is crippled compared to the E6XXX series at similar clock speeds. The same could apply to AMD's quad, although there's a possibility of shared vs. separate cache changing performance. I wouldn't bet on a comparison between AMD's true quad and Intel's 2x dual cores on a chip as far as true quad vs. 2x dual goes. We will never really know, as Penryn will have improved FPU performance, something FAH really likes!