They're priced like Teslas, with no equivalent desktop gamer graphics card, and that means you can't just dip your toe into the water; you've got to buy the canoe up front.
Programming on a normal x86 doesn't really count, because there's no way to get a feel for what is fast and slow when you're using a monster of a core capable of running your poor code more quickly than it deserves.
There is one wonderful opportunity, though, that deserves to be better known. Intel has sponsored Colfax Research to offer free online introductory courses, which include two weeks of remote access. The next session begins August 29th: http://colfaxresearch.com/how-16-08/
(I'm unaffiliated, but enjoyed the course a few months ago.)
Did you happen to use the Knights Corner variant or the new Knights Landing? I'd be quite interested to know how KNL stacks up; naively, from the specs, it seems like it should be a lot more tolerant of sloppy code (though not of poor memory access patterns).
This reverses the last 20 years, during which Intel made inroads into the datacenter while there were few fundamental differences between Xeons and their desktop brethren (the i5/i7, etc.). Intel will have vastly different ISAs on server and client this coming generation (desktop is not getting AVX-512). I suspect the storage layer will get bifurcated as well, since it's unclear whether clients will see much benefit from things like XPoint. In short, on the client side, the only tangible questions of late are: will the hardware change improve battery life, will it enable thinner form factors, and will it make a browser run measurably faster? I'll watch with great interest how Intel pushes adoption of hardware features on the client going forward.
PS. For pricing, take into account that the new-generation Xeon Phis are bootable: you don't need a host CPU to babysit them, as you do in the Tesla's case.
It's 12-core, so performance in an all-core situation would be about the same as this one's, but on non-parallelized code it would be ~5x faster.
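The parent's arithmetic, as a quick sketch; the core counts and the ~5x per-core factor are round numbers for illustration, not measurements:

```python
# Back-of-the-envelope throughput comparison; all numbers are assumed
# round figures (a 12-core Xeon with ~5x faster cores vs. a ~60-core Phi).
xeon_cores, xeon_core_speed = 12, 5.0   # relative per-core speed
phi_cores, phi_core_speed = 60, 1.0

# Fully parallel workload: aggregate throughput = cores * per-core speed.
xeon_parallel = xeon_cores * xeon_core_speed   # 60.0
phi_parallel = phi_cores * phi_core_speed      # 60.0

# Serial (non-parallelized) workload: only single-core speed matters.
serial_advantage = xeon_core_speed / phi_core_speed  # 5.0

print(xeon_parallel, phi_parallel, serial_advantage)
```

With those assumptions the two chips tie on embarrassingly parallel work, and the big-core Xeon wins 5x on serial code.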
http://www.thingiverse.com/thing:997213
http://ssrb.github.io/hpc/2015/04/17/cooling-down-the-xeon-p...
As you mention, you'd still need a motherboard with 64-bit Base Address Register support, but at least you could keep it from burning up (or more likely, shutting down when it overheats).
Man, the human brain is such a narcissist.
> However, ‘smaller’ is a relative term as current visualizations can occur on a machine that contains less than a terabyte of RAM. Traditional raster-based rendering would have greatly increased the memory consumption as the convoluted shape of each neuron would require a mesh containing approximately 100,000 triangles per neuron.
That sounds like a poor approach to this problem. You could write a shader that renders thick lines for the dendrites, and the rest of the geometry can be conventional meshes. The same shader could have a pass specially designed for lines and depth-of-field rendering. That's the one unusual shader; it's hard, but not super hard, to write. [0]
Besides, unless you need this to run in real time (which the Xeon Phi doesn't achieve anyway), you could just raster-render and page the mesh data in from wherever. So what if it's slow?
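For what it's worth, the article's memory claim roughly sanity-checks. In the napkin math below, everything except the 100,000-triangles-per-neuron figure is an assumed round number:

```python
# Memory for raster meshes of neurons. Only TRIS_PER_NEURON comes from the
# article; vertex count, layout and neuron count are assumed for illustration.
TRIS_PER_NEURON = 100_000
VERTS_PER_NEURON = 50_000   # assumed: ~0.5 vertices per triangle in a closed mesh
BYTES_PER_VERT = 24         # float3 position + float3 normal
BYTES_PER_TRI = 12          # 3 x 32-bit indices

per_neuron = VERTS_PER_NEURON * BYTES_PER_VERT + TRIS_PER_NEURON * BYTES_PER_TRI
neurons = 100_000           # assumed dataset size
total_gb = per_neuron * neurons / 1e9
print(per_neuron / 1e6, total_gb)   # 2.4 MB per neuron, ~240 GB total
```

So a few megabytes per neuron times a large enough dataset does eat a serious fraction of a sub-terabyte machine, which is presumably why they'd rather ray-trace analytic tubes than store meshes.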
I think a highly technical platform decision like Xeon Phi versus NVIDIA CUDA really comes down to the details. You have to educate the reader both on the differences that matter and on why they should choose one over the other. The comment in the article, "no GPU dependencies," is a very PR-esque, don't-mention-your-competitor dance around what they're actually trying to say: the CUDA ecosystem can be a pain, since you can't easily buy a MacBook Pro with the GTX 750M, installing all its drivers is error-prone, SIP gets in the way of everything, Xcode and CUDA updates tend to break each other, etc.
I sound like I know what I'm talking about, right? Intel's just not getting it. Show a detailed application where the Xeon Phi really excels. NVIDIA's accelerated-science examples go back a decade, and some, like the grid-solver Navier-Stokes fluid examples, are still state of the art.
The competition in rendering is intense. Production-ready renderers like Arion, Octane and mental ray (specifically iray, NVIDIA's GPU-accelerated renderer) perform best on, or are exclusive to, the CUDA platform. Conversely, you probably get the most flexibility from a platform like V-Ray or RenderMan, whose support for GPU acceleration is limited. Intel Embree has a strong presence today in baked lighting for game engines, but I think NVIDIA's OptiX is a lot faster.
You'd be surprised how bad medical research and visualization are compared to their gaming counterparts. Most medical researchers use the 5-10-year-old technological approaches they learned in their PhD programs.
On a side note, I have yet to see a Phi-vs-CUDA comparison. Intel is comparing Phi to Pentiums, which is utterly ridiculous.
They hold their own against GPGPU, but are probably the inferior choice if your code already runs on a GPU (OpenCL/CUDA).
The real advantage of the Phi is of course combining this nearly-as-good-as-GPGPU parallelism with the x86_64 toolchain and infrastructure. x86 supports more languages with more libraries, and is easier to develop for.
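To make the portability point concrete: because a bootable Knights Landing presents itself as an ordinary x86_64 host, garden-variety CPU-parallel code, like this Python multiprocessing sketch (or its OpenMP/C equivalent), runs on it unchanged, whereas moving the same kernel to a GPU means rewriting it in CUDA or OpenCL:

```python
# Plain CPU parallelism: nothing here is accelerator-specific, so the same
# script runs on a laptop, a big Xeon, or a self-booting Xeon Phi.
from multiprocessing import Pool

def partial_sum(bounds):
    """Sum of squares over the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    n = 1_000_000
    # Split the range into chunks and farm them out to worker processes.
    chunks = [(i, min(i + 250_000, n)) for i in range(0, n, 250_000)]
    with Pool(4) as pool:   # scale the worker count to the core count
        total = sum(pool.map(partial_sum, chunks))
    assert total == sum(i * i for i in range(n))
    print(total)
```

The flip side, of course, is that code this naive won't come close to saturating a Phi's vector units; the portability is to the toolchain, not to peak performance.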
This is not true for anything beyond running compute shaders on large 1D, 2D, or 3D buffers. Just because something is 'graphics' doesn't mean that a GPU is automatically faster.
> Production-ready renderers like Arnold, Octane and mental ray (NVIDIA's renderer) perform best or are exclusive to the CUDA platform.
Arnold is a CPU renderer, Octane is FAR from what I would consider 'production ready', and mental ray is also a software renderer. RenderMan does not use any GPU acceleration.
> I sound like I know what I'm talking about, right?
Not even slightly
> This is not true for anything beyond running compute shaders on large 1D, 2D, or 3D buffers.
Yes, but rendering is a shader over a bunch of those buffers, right? That's what I wrote. I'm not 100% confident that you can efficiently render what they showed in that frame with conventional shaders, but I think you can. You could at least cull and tessellate tubes on the GPU, if you really don't want to write a shader.
1. With the mess of overlapping lines they've got, you'd suffer from severe overdraw (which is where raytracing really shines in terms of efficiency), since you can't efficiently cull lines (without clipping them).
2. You wouldn't get the ambient-occlusion look, where lines close to each other occlude and darken one another.
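To put a rough number on point 1 (the depth-complexity figure is purely an assumption about how much the dendrites overlap, and this ignores early-z, which is hard to exploit for unsorted transparent lines):

```python
# Shading work for rasterization vs. raytracing under heavy overdraw.
pixels = 1920 * 1080
avg_depth_complexity = 30   # assumed: dense tangles of lines overlap a lot

raster_fragments = pixels * avg_depth_complexity  # every covered fragment shaded
raytrace_hits = pixels                            # only the closest hit per pixel
print(raster_fragments // raytrace_hits)          # 30x more shading work
```

A ray tracer's cost per pixel is mostly independent of how many primitives stack up behind the first hit, which is exactly the property this scene rewards.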
As someone who's previously compared Embree and OptiX (and we were given free hardware and support from Nvidia), Embree stacks up really well, and a dual Xeon can match a single top-of-the-line GPU fairly easily for pure ray-intersection performance.
Once you start putting complex shaders and layered materials on top, GPUs start to really suffer: there's a reason a lot of the GPU renderers are mostly being used for clean renders like archviz, product design and car shots; those are simple to render. As soon as you stick dirt layers on top, their efficiency really starts to plummet.
I guess it really depends on what the objective is. I'm not speaking speculatively; concretely, it seems like a reasonable way to achieve the few images they show in the press release. They show two relatively flatly rendered lots-of-tubes images. I know SSAO isn't the same, and I get that there's overdraw, but their particular objective leaves a lot of wiggle room. In one shot, they show a lot of emissive tubes with depth of field, which is harder to achieve. I suppose if they're happy, they're happy.
> interactive performance for all datasets on a regular Intel Xeon processor, which can render images at 20-25 frames per second (FPS)
There's a big difference between interactive performance and a production-quality render. Something tells me it's not producing 25 noise-free frames per second. There isn't enough information here.
> and a dual Xeon can match a single top-of-the-line GPU
At what, like 3-5x the price? At how many watts? And at what IT complexity? A GTX 1080, with better performance than a Titan X, is a phenomenally good deal, especially considering I can drop it into an existing workstation with all of my existing software installed, and that I can rent computation time on Amazon by the hour.
I guess what I'm reacting to is how forced of an example it seems.
I certainly hope the Phi has more advantages than the write-once-run-anywhere / portability angle they kept pushing.
Has anyone chosen Phi for a real project that was in no way funded or subsidized by Intel?