ArsTechnica offers in-depth look at IBM's CELL processor
primesuspect
Beepin n' Boopin | Detroit, MI | Icrontian
ArsTechnica's Jon "Hannibal" Stokes is at the International Solid-State Circuits Conference in San Francisco today, and he has written the first of a two-part article that cuts through the media hype and divulges some interesting details about the Cell architecture. Jon has a reputation for delivering articles that aren't "dumbed down", and it looks like his coverage of the Cell is going to meet that standard. Check it out.
Source: ArsTechnica
Comments
Thing is, with this new Cell processor all the burden of making the processor efficient is put on the programmer. The memory that replaces the cache, and some other things, are managed by the programmer. Sure, this is fine for the PlayStation 3, where only a few companies will be writing code for this new processor, but this isn't going to work for a desktop CPU. I don't think it's going to use the same assembly language Intel-based CPUs use, so it's not going to be very efficient on the desktop. It would require a lot of training for programmers to fully take advantage of this new Cell processor. The code will only be as efficient as the compiler. And from some of the ways I have seen compilers take C code and translate it into assembly, there will be waste if it comes to the desktop.
Overall there are some good things, and they are trying to move parts of the CPU in the right direction, but it's going to be different from our current desktop CPUs. I will definitely be reading Part II.
Even if the article didn't mention that and I just somehow thought it did, I still think it would be possible. Stick an Athlon in there as an arbitrator and that's one screaming F@H machine. All the Athlon (modified, of course so it wouldn't be a true Athlon, but it would understand x86/x86-64) would need to do is send the instructions to the cell processor in the format the cell understands and then take what the cell gives it and do whatever needs to be done with the data. I know I'm dramatically over-simplifying this, but I know it's possible. But, anyway, if that's done then programmers won't need to learn anything and neither would compilers need to be optimized. Sure, it's an added layer and therefore will slow things down, but oh well.
SIMD - Single Instruction, Multiple Data:
Simple example, you want to do the following:
x1 + y1 = z1
x2 + y2 = z2
SIMD allows you to do both using one instruction, instead of two. This is likely implemented for +/-/*, possibly divide.
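To make the two-additions example concrete, here is a rough sketch in plain Python. It only models the idea of one instruction working on several "lanes" at once; `LANES`, `scalar_add`, and `simd_add` are made-up names for illustration and have nothing to do with the Cell's real instruction set.

```python
# Toy model of SIMD: one "instruction" operates on every lane of a vector
# register. LANES and the function names are illustrative assumptions,
# not the Cell's actual ISA.

LANES = 2  # two lanes, matching the x1/x2 example above


def scalar_add(x, y):
    """Models one scalar instruction: adds a single pair of values."""
    return x + y


def simd_add(xs, ys):
    """Models one SIMD instruction: adds all lanes of two vectors at once."""
    assert len(xs) == LANES and len(ys) == LANES
    return [x + y for x, y in zip(xs, ys)]


x = [1.0, 2.0]    # x1, x2
y = [10.0, 20.0]  # y1, y2

# Scalar path: two separate add instructions are issued.
z_scalar = [scalar_add(x[0], y[0]), scalar_add(x[1], y[1])]
scalar_instructions = 2

# SIMD path: a single add instruction covers both lanes.
z_simd = simd_add(x, y)
simd_instructions = 1

print(z_scalar, scalar_instructions)
print(z_simd, simd_instructions)
```

Both paths give the same z1 and z2; the only difference in this model is how many instructions had to be issued to get there.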
RISC - Reduced Instruction Set Computer:
The instruction set does only basic operations. CISC (Complex Instruction Set Computer) designs usually involve lots of instructions, some of which do multiple simple steps at once.
ISA - Instruction Set Architecture:
The set of instructions which a processor can decode and then execute.
These chips are designed for many, many applications (depending on the number of cells you have), including workstations, clusters, PDAs, and cell phones (that one struck me as odd and made no sense whatsoever). The only real problem is that vector-based computers are dying out quite quickly (check out the Top500 of years past and notice the trend).
They have a PPC front end which is simply an arbiter telling each cell what to do, so a2jfreak, start bugging Stanford to get their hands on a PS3 dev kit to start cranking away on a F@H version.
All I have to say is that this is quite the remarkable chip and doesn't really follow many of the traditional constructs that most modern CPUs go by (then again, neither did the marchitecture of the P4).
RISC and CISC are ISA-specific, not a description of the microarchitecture. ISA != implementation (you learn this when a large professor from Puerto Rico repeats it at least 3 times daily). The P4 and Athlons are all RISC-y inside (just ask the CTO of AMD, he preaches it, for God's sake). When you hear 'trace cache' when talking about the P4, that's just a cache of the RISC instructions.
Side note: in the local newspaper (quite the rag on their best of days) there was an article in the business section saying that the >4GHz that the Cells operate at will mean that it would be a 'little' faster than the P4s, which top out at 3.8GHz. God, I hate stupid people.
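A rough way to picture "CISC on the outside, RISC-y on the inside" is a toy decoder like the one below. The instruction tuples, the micro-op names, and the `trace_cache` dictionary are all invented for this sketch; a real x86 decoder and the P4's trace cache are far more involved than this.

```python
# Toy decoder: splits a CISC-style instruction into RISC-like micro-ops.
# The tuple format (op, destination, source) and the micro-op names are
# hypothetical, chosen only to illustrate ISA vs. implementation.


def decode(instr):
    """Return the list of simple micro-ops for one ISA-level instruction."""
    op, dst, src = instr
    if dst.startswith("["):  # memory destination, e.g. "[0x10]"
        addr = dst.strip("[]")
        return [
            ("load", "tmp", addr),   # read memory into a temporary
            (op, "tmp", src),        # do the arithmetic on registers only
            ("store", addr, "tmp"),  # write the result back to memory
        ]
    return [(op, dst, src)]  # register-only form is already one simple op


# Very simplified stand-in for a trace cache: remember decoded micro-ops
# by instruction address so the decode step is skipped next time.
trace_cache = {}


def fetch(address, instr):
    if address not in trace_cache:
        trace_cache[address] = decode(instr)
    return trace_cache[address]


micro_ops = fetch(0x400, ("add", "[0x10]", "eax"))
for uop in micro_ops:
    print(uop)
```

The program only ever sees the single `add` to memory (the ISA); the three simple steps are an implementation detail, which is the whole ISA != implementation point.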
That's what I thought I read. If that's the case, then (theoretically) there is no need to modify code since a PPC version of F@H exists.
It contains one main processor which is based on the IBM POWER architecture, and 8 vector processors which are in essence stand-alone processors. The vector processors have their own local memory as well as access to the main memory, and the main processor has a "cache" which is accessible by all.
As it stands, this chip can currently run standard PowerPC apps without using the APUs very efficiently (obviously any "smart" compiler will vectorise the components of the code that it can), as not all the code will be farmed out to them. But programmers will have to learn to adapt their current programming techniques to use the APUs properly.
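For a feel of what "adapting" might mean when each vector processor has its own local memory instead of a transparent cache, here is a minimal sketch. Everything in it is an assumption for illustration: `LOCAL_STORE_SIZE`, `copy_in`, `process`, and `copy_out` are hypothetical names, and the real chip moves data with hardware transfers rather than Python list slices.

```python
# Toy model of programmer-managed local memory: data must be copied into a
# small local store, processed there, and copied back out explicitly.
# All names and sizes here are hypothetical.

LOCAL_STORE_SIZE = 4  # elements that fit in the local store (made-up figure)


def copy_in(main_memory, start, count):
    """Explicitly bring a chunk of main memory into the local store."""
    assert count <= LOCAL_STORE_SIZE
    return main_memory[start:start + count]


def process(local_store):
    """The actual work, done only on data already in the local store."""
    return [value * 2 for value in local_store]


def copy_out(main_memory, start, local_store):
    """Explicitly write the results back to main memory."""
    main_memory[start:start + len(local_store)] = local_store


def run(main_memory):
    """Walk main memory one local-store-sized chunk at a time."""
    for start in range(0, len(main_memory), LOCAL_STORE_SIZE):
        count = min(LOCAL_STORE_SIZE, len(main_memory) - start)
        local_store = copy_in(main_memory, start, count)
        local_store = process(local_store)
        copy_out(main_memory, start, local_store)


data = list(range(10))
run(data)
print(data)
```

On a cached CPU the loop in `run` would just be `process(data)`; here the chunking and the copies in and out are the programmer's job, which is the extra burden being discussed in this thread.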
All in all it is a very exciting innovation and I can see a vast improvement in what can be done on a computer when this baby finally hits the desktops ...