I Build Supercomputers in My Spare Time

(parallella.org)

131 points · stevesalevan · 12y ago · 32 comments


25 comments · 9 top-level

supermatt · 12y ago · 7 in thread

Having read through the documents, I see that it is capable of ~200 GFLOPS. Obviously, this outshines my MacBook's ~50 GFLOPS, but it is substantially less than my workstation's GPU, an Nvidia GTX 770, capable of ~3000 GFLOPS.

I suppose my question is why would I choose to build something like this over using a GPU?

bri3d · 12y ago

Here's some previous discussion:

https://news.ycombinator.com/item?id=4702456

In terms of raw GFLOPS, you shouldn't.

Parallella offers something halfway between a multi-CPU cluster (from which it differs in memory access) and a GPU (from which it differs in that each core is a real core that can independently make calls, branch, etc.).

Another point to note is that every GPU architecture is different and that some support a different degree of control flow parallelism vs. data parallelism.

Ultimately it depends on the kernel you're working with: if you've reached the point where your code is strictly linear SIMD and depends entirely on floating-point throughput or memory bandwidth, it would make sense to use a GPU instead.

moconnor · 12y ago

Although if you like multicore architectures, or have code suited to them, you would get better performance from a Xeon Phi or the forthcoming Knights Landing from Intel, both of which run x86_64 Linux.

Between ARM, OpenPower and Intel's Phi successors this is becoming a hyper-competitive space. Interesting times ahead!

vidarh · 12y ago

The Parallella is a testbed for their chips, to let people experiment with the platform. The real value is for embedded use (of the chips, not the whole current design), and as they scale up the chips - their roadmap is for chips with thousands of cores.

Compared with your GPU, there's also another important factor: these are entirely independent cores. For stuff your GPU does well, that is, applying the exact same manipulation to a large number of values in parallel, it's likely to keep beating the Epiphany chips. For stuff it doesn't do well, that is, stuff where you need independent threads of execution for each data stream, the Epiphany chips may become a better fit.
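A toy cost model makes the lockstep-vs-independent distinction concrete. This is an illustrative sketch only, not Epiphany or CUDA code, and it assumes a deliberately simplified model: each loop iteration costs one time unit, and in a lockstep group every lane stays busy until the slowest lane finishes. Real GPU scheduling is more nuanced. The Collatz loop and the group size of 16 are arbitrary stand-ins for any per-stream, data-dependent workload.

```python
"""Toy cost model: lockstep (SIMD-style) lanes vs. independent cores.

Simplifying assumptions: one time unit per loop iteration, and in lockstep
execution every lane in a group is occupied until the slowest lane finishes.
"""


def collatz_steps(n: int) -> int:
    """Data-dependent loop: the iteration count varies a lot between inputs."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def lockstep_cost(work: list[int], group_size: int) -> int:
    """Lane-time used when the lanes in each group must run in lockstep."""
    total = 0
    for start in range(0, len(work), group_size):
        group = work[start:start + group_size]
        # Every lane in the group is busy until the slowest one is done.
        total += max(group) * len(group)
    return total


def independent_cost(work: list[int]) -> int:
    """Lane-time used when each core stops as soon as its own work is done."""
    return sum(work)


if __name__ == "__main__":
    work = [collatz_steps(n) for n in range(1, 65)]
    useful = independent_cost(work)
    spent = lockstep_cost(work, group_size=16)
    print(f"independent cores:     {useful} lane-steps")
    print(f"lockstep groups of 16: {spent} lane-steps")
    print(f"lockstep utilization:  {useful / spent:.0%}")
```

With uniform work per stream the two costs are identical; the gap only opens when iteration counts differ between streams, which is the case where independent cores can pull ahead.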

illumen · 12y ago

Mainly for the experience of building it, I guess. The author also mentions machine learning and image processing, i.e. doing multiple different tasks at once, which I think is possible with the latest CUDA and Nvidia gear, though.

It has 8 FPGAs on there too, which are useful for high-speed I/O and other tasks. There are also 8 HDMI outputs, which would make this a pretty interesting video-wall machine.

If it had the 64-core chips instead of the 16-core chips, this thing would be ~800 GFLOPS, I guess.
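The ~800 GFLOPS guess appears to be linear scaling of the ~200 GFLOPS quoted at the top of the thread. A quick sketch of that arithmetic, assuming 8 boards (one per FPGA/HDMI output mentioned above) and peak throughput proportional to core count. Both are assumptions, and peak numbers say nothing about sustained performance.

```python
"""Back-of-the-envelope peak scaling for the 8-board cluster.

Assumptions: ~200 GFLOPS total peak for 8 boards with 16-core chips (the
figure quoted upthread), and peak throughput scaling linearly with cores.
"""

BOARDS = 8                    # inferred from the 8 FPGAs / 8 HDMI outputs
CLUSTER_PEAK_16_CORE = 200.0  # GFLOPS, as quoted upthread
GTX_770_PEAK = 3000.0         # GFLOPS, as quoted upthread


def scaled_peak(cores_per_board: int) -> float:
    """Estimated cluster peak in GFLOPS for a given chip size."""
    per_core = CLUSTER_PEAK_16_CORE / (BOARDS * 16)
    return per_core * BOARDS * cores_per_board


if __name__ == "__main__":
    for cores in (16, 64):
        peak = scaled_peak(cores)
        ratio = GTX_770_PEAK / peak
        print(f"{cores}-core chips: ~{peak:.0f} GFLOPS; "
              f"a GTX 770 is ~{ratio:.1f}x that at peak")
```

Even with the 64-core chips, this estimate leaves the cluster several times below the GTX 770's quoted peak, which is consistent with the "in terms of raw GFLOPS, you shouldn't" answer above.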

toddan · 12y ago

I guess it is easier to program. I am only a simple web developer, and I would be more productive programming on this than learning to use CUDA.

rasz_pl · 12y ago

It's like gardening. No real point, other than as a hobby.

angersock · 12y ago

But when I'm gardening, I prefer to see bugs. It means everything is healthy and happy. :(

onalark · 12y ago · 4 in thread

This is a misleading title. Brian builds Supercomputer Models, not Supercomputers. It would be like if I had a model of a Ferrari I put together at home and wrote an article about how I built sports cars in my spare time.

Yes, these are awesome, full-featured models, but the differences between this and a supercomputer, which costs tens of millions of dollars, requires high-density power and cooling, features multi-dimensional, low-diameter networks, and contains hundreds of thousands to millions of compute cores is... quite vast.

vidarh · 12y ago

I get what you are saying, but what is a supercomputer today is a pedestrian little home computer tomorrow.

The cylinder design he's using is inspired by the early Cray models. Cray 1 had a performance of 80 MFLOPS. Cray X-MP had a performance of 800 MFLOPS. The Cray 2 (which looked substantially different) reached 1.9 GFLOPS in 1985.

From 1993 to 1996, the Numerical Wind Tunnel, a 140-CPU vector computer, was at the top most of the time. It reached its all-time peak at around 235.8 GFLOP/s.

Even ASCI Red, which held the top spot until the end of the 20th century, only reached 1.3 TFLOPS.

So unlike a model Ferrari built at home, this thing actually substantially outperforms the fastest supercomputers up until the mid-90s.
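A rough side-by-side of the figures in this comment against the ~200 GFLOPS peak quoted at the top of the thread. This sketch takes every number at face value and mixes peak and sustained figures, so treat the ratios as order-of-magnitude only.

```python
"""Compare the cluster's quoted ~200 GFLOPS with the historical figures above.

All values are the ones quoted in this thread, converted to GFLOPS; they are
not a consistent set of Linpack (Rmax) or peak (Rpeak) results.
"""

CLUSTER_GFLOPS = 200.0

HISTORICAL_GFLOPS = {
    "Cray-1": 0.08,
    "Cray X-MP": 0.8,
    "Cray-2 (1985)": 1.9,
    "Numerical Wind Tunnel (1993-1996)": 235.8,
    "ASCI Red (late 1990s)": 1300.0,
}


def speedup_over(machine_gflops: float) -> float:
    """How many times the cluster's quoted peak exceeds a machine's figure."""
    return CLUSTER_GFLOPS / machine_gflops


if __name__ == "__main__":
    for name, gflops in HISTORICAL_GFLOPS.items():
        print(f"{name:35s} {speedup_over(gflops):10.2f}x")
```

With these inputs the ratio comes out above 100x for all three Cray machines and below 1x for the Numerical Wind Tunnel and ASCI Red.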

onalark · 12y ago

This is a bit of a straw man. As Jack Dongarra will happily tell you, an iPad can outperform some of the supercomputers from the beginning of the Top500 benchmark. You don't get to be a supercomputer today by beating supercomputers from two decades ago; that's not how technology works. I'm taking umbrage at the linkbait title because I'm a grumpy old man, not because I don't think this project is cool (and admirable!).


poulson · 12y ago

Or, "Brian builds distributed-memory computers". Despite the misleading title, I think such a project would be a great tool for a parallel computing class. Such a machine would be roughly the same cost as a textbook and would be much more rewarding than running MPI on a multicore laptop/desktop.

onalark · 12y ago

Agreed. It's a really cool project, and maybe you should build one for your next parallel computing class :)

hunt · 12y ago · 4 in thread

I can see one fan at the top of the cylinder. Is only one fan really sufficient? It looks like this would generate a large amount of heat.

I would be quite interested to see the temperatures this gets up to.

vidarh · 12y ago

A single 16 core Parallella can run with just passive cooling. It takes next to no cooling to bring both the ARM and Epiphany chips down to near room temperature.

BaryonBundle · 12y ago

While the 16-core Parallella can run with just passive cooling, you still need to ensure proper airflow over the unit.

Units have been known to overheat with just passive cooling, and they even advise installing a fan with the official case (which they sell in their store and provided to backers), even though there is nowhere to screw a fan in, etc.

Cases and Cooling: http://www.parallella.org/2014/04/30/cases-and-cooling/ (parallella.org) April 30, 2014

That a cluster of Parallellas is relatively easy to cool is definitely aided by the form factor.

JTon · 12y ago

It's a pretty big fan. Also, the cylindrical design certainly helps airflow.

hunt · 12y ago

The design certainly helps. I imagine the switch will get warm on its own, though, and attaching boards to it must surely produce a lot of heat.


deutronium · 12y ago · 1 in thread

Wouldn't it be quite hard to benchmark something like the Parallella, since it contains an FPGA, an ARM CPU and their own multicore chip?

rjsw · 12y ago

Not really, you have to make it explicit where any program would run.

coreymgilmore · 12y ago

For anyone running into the DB connection error and the page not loading:

http://webcache.googleusercontent.com/search?q=cache:http://...

contingencies · 12y ago

Isn't the architecture of real-world supercomputers essentially dependent on their expected load?

The very notion of a 'general-purpose processing supercomputer' essentially conjures one of those modern data-center visions: a large array of identical consumer-grade hardware, with its favorable price/performance ratio, accessioned, wired, tested, commissioned, allocated workloads, managed over time and finally decommissioned by a combination of carefully developed human procedures and highly automated processes?

For instance, nobody in their right mind would install an OS on every such node by hand: it has to be PXE or similar (can you boot root-on-iSCSI direct from BIOS these days?).

I'm curious how many fellow HN'ers out there are doing this.

tomberek · 12y ago

http://webcache.googleusercontent.com/search?q=cache:nBNJrRz...

lemcoe9 · 12y ago

Your LEDs use 20W of power? I would immediately scrap those - they serve no real purpose and use an insane amount of power, compared to their computing counterparts.

manuw · 12y ago

"Error establishing a database connection" *scnr
