They're priced like Teslas, with no equivalent desktop gamer graphics card, and that means you can't just dip your toe into the water; you've got to buy the canoe up front.
Programming on a normal x86 doesn't really count, because there's no way to get a feel for what is fast and slow when you're using a monster of a core capable of running your poor code more quickly than it deserves.
There is one wonderful opportunity, though, that deserves to be better known. Intel has sponsored Colfax Research to offer free online introductory courses, which include two weeks of remote access. The next session begins August 29th: http://colfaxresearch.com/how-16-08/
(I'm unaffiliated, but enjoyed the course a few months ago.)
Did you happen to use the Knights Corner variant or the new Knights Landing? I'd be quite interested to know how KNL stacks up; naively, from the specs, it seems like it should be a lot more tolerant of sloppy code (though not of poor memory access patterns).
This reverses the last 20 years, during which Intel made inroads into the datacenter while there were few fundamental differences between Xeons and their desktop brethren (the i5/i7, etc.). Intel will have vastly different ISAs on server and client this coming generation (desktop is not getting AVX-512). I suspect the storage layer will get bifurcated as well, since it's unclear whether clients will see much benefit from things like XPoint. In short, on the client side, the only tangible questions of late are: will the hardware change improve battery life, will it enable thinner form factors, and will it make a browser run measurably faster? I'll watch with great interest how Intel pushes adoption of hardware features on the client going forward.
PS. For pricing, take into account that the new-generation Xeon Phis are bootable: you don't need a host CPU to babysit them, as you do in the Tesla's case.
It's 12-core, so performance in an all-core situation would be about the same as this one's, but on non-parallelized code it would be ~5x faster.
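The parent's arithmetic, as a quick sketch; the core counts and the ~5x per-core factor are round numbers for illustration, not measurements:

```python
# Back-of-the-envelope throughput comparison; all numbers are assumed
# round figures (a 12-core Xeon with ~5x faster cores vs. a ~60-core Phi).
xeon_cores, xeon_core_speed = 12, 5.0   # relative per-core speed
phi_cores, phi_core_speed = 60, 1.0

# Fully parallel workload: aggregate throughput = cores * per-core speed.
xeon_parallel = xeon_cores * xeon_core_speed   # 60.0
phi_parallel = phi_cores * phi_core_speed      # 60.0

# Serial (non-parallelized) workload: only single-core speed matters.
serial_advantage = xeon_core_speed / phi_core_speed  # 5.0

print(xeon_parallel, phi_parallel, serial_advantage)
```

With those assumptions the two chips tie on embarrassingly parallel work, and the big-core Xeon wins 5x on serial code.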
http://www.thingiverse.com/thing:997213
http://ssrb.github.io/hpc/2015/04/17/cooling-down-the-xeon-p...
As you mention, you'd still need a motherboard with 64-bit Base Address Register support, but at least you could keep it from burning up (or more likely, shutting down when it overheats).
Man, the human brain is such a narcissist.
> However, ‘smaller’ is a relative term as current visualizations can occur on a machine that contains less than a terabyte of RAM. Traditional raster-based rendering would have greatly increased the memory consumption as the convoluted shape of each neuron would require a mesh containing approximately 100,000 triangles per neuron.
That sounds like a poor approach to this problem. You could write a shader that renders thick lines for the dendrites, and the rest of the geometry can be conventional meshes. The same shader could have a pass specially designed for lines and depth-of-field rendering. That's the one unusual shader; it's hard, but not super hard, to write. [0]
Besides, unless you need this to run in real time (which the Xeon Phi doesn't achieve anyway), you could just raster-render and page the mesh data in from wherever. So what if it's slow?
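For what it's worth, the article's memory claim roughly sanity-checks. In the napkin math below, everything except the 100,000-triangles-per-neuron figure is an assumed round number:

```python
# Memory for raster meshes of neurons. Only TRIS_PER_NEURON comes from the
# article; vertex count, layout and neuron count are assumed for illustration.
TRIS_PER_NEURON = 100_000
VERTS_PER_NEURON = 50_000   # assumed: ~0.5 vertices per triangle in a closed mesh
BYTES_PER_VERT = 24         # float3 position + float3 normal
BYTES_PER_TRI = 12          # 3 x 32-bit indices

per_neuron = VERTS_PER_NEURON * BYTES_PER_VERT + TRIS_PER_NEURON * BYTES_PER_TRI
neurons = 100_000           # assumed dataset size
total_gb = per_neuron * neurons / 1e9
print(per_neuron / 1e6, total_gb)   # 2.4 MB per neuron, ~240 GB total
```

So a few megabytes per neuron times a large enough dataset does eat a serious fraction of a sub-terabyte machine, which is presumably why they'd rather ray-trace analytic tubes than store meshes.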
I think a highly technical platform decision like Xeon Phi versus NVIDIA CUDA really comes down to the details. You have to educate the reader both on the differences that matter and on why they should choose one over the other. The comment in the article, "no GPU dependencies," is a very PR-esque, don't-mention-your-competitor dance around what they're actually trying to say: the CUDA ecosystem can be a pain, since you can't easily buy a MacBook Pro with the GTX 750M, installing all its drivers is error-prone, SIP gets in the way of everything, Xcode and CUDA updates tend to break each other, etc.
I sound like I know what I'm talking about, right? Intel's just not getting it. Show a detailed application where the Xeon Phi really excels. NVIDIA's accelerated-science examples go back a decade, and some, like the grid-solver Navier-Stokes fluid examples, are still state of the art.
The competition in rendering is intense. Production-ready renderers like Arion, Octane and mental ray (specifically iray, NVIDIA's GPU-accelerated renderer) perform best on, or are exclusive to, the CUDA platform. Conversely, you probably get the most flexibility from a platform like V-Ray or RenderMan, whose support for GPU acceleration is limited. Intel Embree has a strong presence today in baked lighting for game engines, but I think NVIDIA's OptiX is a lot faster.
You'd be surprised how bad medical research and visualization are compared to their gaming counterparts. Most medical researchers use the 5-10-year-old technological approaches they learned in their PhD programs.
On a side note, I have yet to see a Phi-vs-CUDA comparison. Intel is comparing Phi to Pentiums, which is utterly ridiculous.
They hold their own against GPGPU, but are probably the inferior choice if your code already runs on a GPU (OpenCL/CUDA).
The real advantage of the Phi is of course combining this nearly-as-good-as-GPGPU parallelism with the x86_64 toolchain and infrastructure. x86 supports more languages with more libraries, and is easier to develop for.
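To make the portability point concrete: because a bootable Knights Landing presents itself as an ordinary x86_64 host, garden-variety CPU-parallel code, like this Python multiprocessing sketch (or its OpenMP/C equivalent), runs on it unchanged, whereas moving the same kernel to a GPU means rewriting it in CUDA or OpenCL:

```python
# Plain CPU parallelism: nothing here is accelerator-specific, so the same
# script runs on a laptop, a big Xeon, or a self-booting Xeon Phi.
from multiprocessing import Pool

def partial_sum(bounds):
    """Sum of squares over the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    n = 1_000_000
    # Split the range into chunks and farm them out to worker processes.
    chunks = [(i, min(i + 250_000, n)) for i in range(0, n, 250_000)]
    with Pool(4) as pool:   # scale the worker count to the core count
        total = sum(pool.map(partial_sum, chunks))
    assert total == sum(i * i for i in range(n))
    print(total)
```

The flip side, of course, is that code this naive won't come close to saturating a Phi's vector units; the portability is to the toolchain, not to peak performance.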
This is not true for anything beyond running compute shaders on large 1D, 2D, or 3D buffers. Just because something is 'graphics' doesn't mean that a GPU is automatically faster.
> Production-ready renderers like Arnold, Octane and mental ray (NVIDIA's renderer) perform best or are exclusive to the CUDA platform.
Arnold is a CPU renderer, Octane is FAR from what I would consider 'production ready', and mental ray is also a software renderer. RenderMan does not use any GPU acceleration.
> I sound like I know what I'm talking about, right?
Not even slightly
> This is not true for anything beyond running compute shaders on large 1D, 2D, or 3D buffers.
Yes, but rendering is a shader over a bunch of those buffers, right? That's what I wrote. I'm not 100% confident that you can efficiently render what they showed in that frame with conventional shaders, but I think you can. You could at least cull and tessellate tubes on the GPU, if you really don't want to write a shader.
1. With the mess of overlapping lines they've got, you'd suffer from severe overdraw (which is where raytracing really shines in terms of efficiency), since you can't efficiently cull lines (without clipping them).
2. You wouldn't get the ambient-occlusion look, where lines close to each other occlude and darken one another.
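To put a rough number on point 1 (the depth-complexity figure is purely an assumption about how much the dendrites overlap, and this ignores early-z, which is hard to exploit for unsorted transparent lines):

```python
# Shading work for rasterization vs. raytracing under heavy overdraw.
pixels = 1920 * 1080
avg_depth_complexity = 30   # assumed: dense tangles of lines overlap a lot

raster_fragments = pixels * avg_depth_complexity  # every covered fragment shaded
raytrace_hits = pixels                            # only the closest hit per pixel
print(raster_fragments // raytrace_hits)          # 30x more shading work
```

A ray tracer's cost per pixel is mostly independent of how many primitives stack up behind the first hit, which is exactly the property this scene rewards.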
As someone who's previously compared Embree and OptiX (and we were given free hardware and support from Nvidia), Embree stacks up really well, and a dual Xeon can match a single top-of-the-line GPU fairly easily for pure ray-intersection performance.
Once you start putting complex shaders and layered materials on top, GPUs start to really suffer: there's a reason a lot of the GPU renderers are mostly being used for clean renders like archviz, product design and car shots; those are simple to render. As soon as you stick dirt layers on top, their efficiency really starts to plummet.
I guess it really depends on what the objective is. I'm not speaking speculatively; concretely, it seems like a reasonable way to achieve the few images they show in the press release. They show two relatively flatly rendered lots-of-tubes images. I know SSAO isn't the same, and I get that there's overdraw, but their particular objective leaves a lot of wiggle room. In one shot, they show a lot of emissive tubes with depth of field, which is harder to achieve. I suppose if they're happy, they're happy.
> interactive performance for all datasets on a regular Intel Xeon processor, which can render images at 20-25 frames per second (FPS)
There's a big difference between interactive performance and a production-quality render. Something tells me it's not producing 25 noise-free frames per second. There isn't enough information here.
> and a dual Xeon can match a single top-of-the-line GPU
At what, like 3-5x the price? At how many watts? And at what IT complexity? A GTX 1080, with better performance than a Titan X, is a phenomenally good deal, especially considering I can drop it into an existing workstation with all of my existing software installed, and that I can rent computation time on Amazon by the hour.
I guess what I'm reacting to is how forced of an example it seems.
I certainly hope the Phi has more advantages than the write-once-run-anywhere / portability angle they kept pushing.
Has anyone chosen Phi for a real project that was in no way funded or subsidized by Intel?