I suppose my question is why would I choose to build something like this over using a GPU?
https://news.ycombinator.com/item?id=4702456
In terms of raw GFLOPS, you shouldn't.
Parallella offers something halfway between a multi-CPU cluster (differing in terms of memory access) and a GPU (differing in that each core is a real core and can independently make calls, branch, etc.).
Another point to note is that every GPU architecture is different and that some support a different degree of control flow parallelism vs. data parallelism.
Ultimately it depends on the kernel you're working with: if you're at a point where you've got strictly linear SIMD and you depend entirely on floating-point math throughput or memory bandwidth, it wouldn't make sense not to use a GPU.
Between ARM, OpenPower and Intel's Phi successors this is becoming a hyper-competitive space. Interesting times ahead!
Compared with your GPU, there's also another important factor: these are entirely independent cores. For stuff that your GPU does well (that is, stuff where you're doing the exact same manipulation of a large number of values in parallel), it's likely to keep beating the Epiphany chips. For stuff it doesn't do well (that is, stuff where you need independent threads of execution for each data stream), the Epiphany chips may become a better fit.
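To make the lockstep vs. independent-core point concrete, here is a toy cost model in Python. It is only a sketch: the path names and cycle counts are made up, and real GPUs differ a lot in how they handle divergent branches. The assumption is that a lockstep group has to run every distinct branch path taken by any of its lanes one after another, while independent cores each run only their own path.

```python
def lockstep_cycles(lane_paths, path_cost):
    """Toy model of a lockstep (SIMD-style) group: every distinct path
    taken by any lane is executed serially, with the other lanes idle."""
    return sum(path_cost[p] for p in set(lane_paths))


def independent_cycles(lane_paths, path_cost):
    """Toy model of independent cores: each core runs only its own path,
    so the group finishes when the slowest core finishes."""
    return max(path_cost[p] for p in lane_paths)


# Hypothetical per-path costs in cycles (illustrative numbers only).
path_cost = {"a": 10, "b": 12, "c": 30}

uniform = ["a"] * 8                                    # all lanes take the same branch
divergent = ["a", "b", "c", "a", "b", "c", "a", "b"]   # lanes disagree

for name, lanes in (("uniform", uniform), ("divergent", divergent)):
    print(name,
          "lockstep:", lockstep_cycles(lanes, path_cost),
          "independent:", independent_cycles(lanes, path_cost))
```

In this model the two approaches tie on uniform work, and the lockstep group only falls behind once the lanes diverge. It says nothing about raw throughput, where the GPU still wins by a wide margin.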
It has 8 FPGAs on there too, useful for high-speed IO and other tasks. There are also 8 HDMI outputs on there, which would make this a pretty interesting video wall machine.
If it had the 64 core chips instead of the 16 core chips, this thing would be ~ 800 GFLOPS I guess.
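For what it's worth, the ~800 GFLOPS guess can be checked with back-of-the-envelope arithmetic. The sketch below assumes 8 boards (inferred from the 8 FPGAs and 8 HDMI outputs mentioned above), 2 floating-point operations per core per cycle (one fused multiply-add), and the chips' rated clocks of 800 MHz for the 64-core part and 1 GHz for the 16-core part. None of those figures come from the build itself, and boards may well run at lower clocks, so treat the result as a theoretical peak.

```python
def peak_gflops(boards, cores_per_chip, clock_ghz, flops_per_cycle=2):
    """Theoretical peak = boards x cores x clock (GHz) x FLOPs per cycle.
    All inputs here are assumptions, not measurements of the actual build."""
    return boards * cores_per_chip * clock_ghz * flops_per_cycle


# Assumed: 8 boards, 64-core chips at 800 MHz.
with_64_core = peak_gflops(boards=8, cores_per_chip=64, clock_ghz=0.8)

# Assumed: 8 boards, 16-core chips at their rated 1 GHz.
with_16_core = peak_gflops(boards=8, cores_per_chip=16, clock_ghz=1.0)

print(f"64-core chips: {with_64_core:.1f} GFLOPS peak")
print(f"16-core chips: {with_16_core:.1f} GFLOPS peak")
```

Under those assumptions the 64-core figure lands a little above 800 GFLOPS, which is consistent with the guess.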
Yes, these are awesome, full-featured models, but the differences between this and a supercomputer, which costs tens of millions of dollars, requires high-density power and cooling, features multi-dimensional, low-diameter networks, and contains hundreds of thousands to millions of compute cores are... quite vast.
The cylinder design he's using is inspired by the early Cray models. Cray 1 had a performance of 80 MFLOPS. Cray X-MP had a performance of 800 MFLOPS. The Cray 2 (which looked substantially different) reached 1.9 GFLOPS in 1985.
From 1993 to 1996, the Numerical Wind Tunnel, a 140-CPU vector computer, was at the top most of the time. It reached its all-time peak at around 235.8 GFLOP/s.
Even ASCI Red, which held the top spot until the end of the 20th century, only reached 1.3 TFLOPS.
So unlike a model of a Ferrari you built at home, this thing actually substantially outperforms the fastest supercomputers up until the mid-90s.
I would be quite interested to see the temperatures this gets up to.
Units have been known to overheat with just passive cooling, and they even advise that you install a fan with the official case (sold in their store and provided to backers), even though there is nowhere to screw a fan in, etc.
Cases and Cooling: http://www.parallella.org/2014/04/30/cases-and-cooling/ (parallella.org) April 30, 2014
The point that a cluster of parallellas is relatively easy to cool is definitely aided by the form factor.
http://webcache.googleusercontent.com/search?q=cache:http://...
The very notion of a 'general purpose processing supercomputer' essentially conjures one of those modern data center visions: a large array of identical consumer-grade hardware, with its high price/performance ratio, that is accessioned, wired, tested, commissioned, allocated workloads, managed over time and finally decommissioned by a combination of carefully developed human procedures and highly automated processes.
For instance, nobody in their right mind would install an OS on every such node by hand: it has to be PXE or similar (can you boot root-on-iSCSI direct from BIOS these days?).
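As a rough illustration of what "PXE or similar" looks like in practice, here is a small Python sketch that generates per-node PXELINUX config files. It assumes nodes that can actually network-boot with PXELINUX (not a claim about any particular board). PXELINUX looks in `pxelinux.cfg/` for a file named `01-` followed by the node's MAC address in lowercase with dashes. The kernel and initrd names and the MAC addresses below are hypothetical placeholders.

```python
def pxelinux_config_name(mac):
    """File name PXELINUX looks up under pxelinux.cfg/ for a given MAC:
    '01-' (the ARP hardware type for Ethernet) plus the lowercase MAC with dashes."""
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return "01-" + "-".join(octets)


def render_config(kernel="vmlinuz", initrd="initrd.img", extra_args=""):
    """Minimal PXELINUX config body. The kernel and initrd file names are
    hypothetical; a real installer also needs its own boot arguments."""
    lines = [
        "DEFAULT install",
        "LABEL install",
        f"  KERNEL {kernel}",
        f"  APPEND initrd={initrd} {extra_args}".rstrip(),
    ]
    return "\n".join(lines) + "\n"


def build_configs(macs):
    """Map each node's config file name to its contents."""
    return {pxelinux_config_name(mac): render_config() for mac in macs}


# Hypothetical node inventory.
configs = build_configs(["88:99:AA:BB:CC:DD", "88-99-aa-bb-cc-de"])
for name in sorted(configs):
    print(name)
```

Writing these files into the TFTP root and pointing DHCP at the boot loader is left out here, since that part depends entirely on the site's DHCP/TFTP setup.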
I'm curious how many fellow HN'ers out there are doing this.