Code introspection is one of the big strengths of the .NET ecosystem.
You can even do it at multiple levels of abstraction: on the bytecode (using e.g. Mono.Cecil), with the built-in reflection API, with the built-in expressions API since .NET 3.5, and apparently this project uses an even higher-level approach, Roslyn.
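For anyone who hasn't tried it, here's a minimal Roslyn sketch (assumes the Microsoft.CodeAnalysis.CSharp NuGet package; the Square method is just made-up input), showing how you get a full syntax tree out of plain source text:

    using System;
    using System.Linq;
    using Microsoft.CodeAnalysis.CSharp;
    using Microsoft.CodeAnalysis.CSharp.Syntax;

    class RoslynDemo
    {
        static void Main()
        {
            // Parse C# source text into a syntax tree, roughly the level this project works at.
            var tree = CSharpSyntaxTree.ParseText(
                "class C { float Square( float x ) { return x * x; } }" );

            // Walk the tree and report every method declaration found.
            foreach( var m in tree.GetRoot().DescendantNodes()
                                  .OfType<MethodDeclarationSyntax>() )
                Console.WriteLine( "{0}: {1} parameter(s)",
                    m.Identifier, m.ParameterList.Parameters.Count );
        }
    }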
I have never transformed C# code into shaders; when I'm working on something similar I usually prefer traditional GPU languages like HLSL or CUDA, because of the tooling (e.g. Visual Studio can debug both).
I have transformed expressions into NoSQL database queries, and it worked well. Here are the main parts: https://github.com/Const-me/EsentSerialize/blob/master/Core/...
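The core of it is just walking the expression tree. Here's a stripped-down toy version of the idea, not the actual EsentSerialize code, and the query syntax is made up:

    using System;
    using System.Linq.Expressions;

    static class QueryTranslator
    {
        // Usage: QueryTranslator.Translate<Person>( p => p.Age > 21 ) returns "Age > 21".
        // (Person is a hypothetical record type with an Age property.)
        public static string Translate<T>( Expression<Func<T, bool>> predicate )
        {
            return Visit( predicate.Body );
        }

        static string Visit( Expression e )
        {
            switch( e )
            {
                case BinaryExpression b when b.NodeType == ExpressionType.GreaterThan:
                    return Visit( b.Left ) + " > " + Visit( b.Right );
                case BinaryExpression b when b.NodeType == ExpressionType.AndAlso:
                    return "( " + Visit( b.Left ) + " ) AND ( " + Visit( b.Right ) + " )";
                case MemberExpression m:
                    return m.Member.Name;   // property name becomes the column name
                case ConstantExpression c:
                    return c.Value?.ToString() ?? "NULL";
                default:
                    throw new NotSupportedException( e.NodeType.ToString() );
            }
        }
    }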
It is much harder in the new UWP, however, where dynamic features (reflection, introspection, native JIT via the DLR) are no longer guaranteed to be present.
> One of my goals was “use as much of the same code as possible on the CPU and GPU”.
Great idea! It's hard to debug GPU code, and it's super convenient to have something you can run & debug on the CPU, then just flip a switch and get GPU speedups!
> Another obvious limitation is that GPU code cannot recurse.
Yep. I bet this changes relatively soon, since you can sometimes use recursion in OpenCL & CUDA.
It's a fun exercise in ShaderToy to write a recursive ray tracer. Here's mine: https://www.shadertoy.com/view/XllBRf
And here's a much better one: https://www.shadertoy.com/view/4scfz4
Thanks for reading! Since Aras is using a GTX 1080 Ti on Windows and an Intel Iris Pro on Mac, I think the numbers make sense. In my case, my GTX 770 is pretty old and my 2017 MacBook is pretty new, so the numbers line up better.
> It's a fun exercise in ShaderToy to write a recursive ray tracer. Here's mine: https://www.shadertoy.com/view/XllBRf
> And here's a much better one: https://www.shadertoy.com/view/4scfz4
Nice! I should add accumulation to mine -- it'd look much better for barely any effort.
In my case it's the difference in hardware. The DX11 results are on a GTX 1080 Ti, whereas the Mac results are on the Iris Pro of a 2013 MacBook Pro. A more-than-10x performance difference between these GPUs is entirely expected.
Am I correct in assuming you can only recurse when the recursion can be "unrolled" into a simple loop? In other words, nothing that needs an actual call stack.
I think there are still cases where it won't work right due to other libraries that might be involved; I've had recursion fail even on a new GPU with CUDA 9.
Even if you can use recursion, it’s not usually a good idea since you’ll run into thread divergence problems.
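The usual workaround is to flatten the recursion into a bounded loop, carrying accumulated attenuation instead of a call stack. A rough C# sketch, with placeholder Scene/Ray/Hit types I made up (not from the article):

    using System.Numerics;

    // Placeholder types, invented for the sketch:
    struct Ray { public Vector3 Origin, Direction; }
    struct Hit { public Vector3 Albedo; public Ray Bounce; }

    interface IScene
    {
        bool Hit( Ray ray, out Hit hit );
        Vector3 SkyColor( Ray ray );
    }

    static class Tracer
    {
        // "Recursive" ray tracing flattened into a bounded loop: instead of
        // recursing per bounce, carry the accumulated attenuation forward.
        public static Vector3 Trace( Ray ray, IScene scene, int maxBounces )
        {
            Vector3 color = Vector3.Zero;
            Vector3 attenuation = Vector3.One;
            for( int bounce = 0; bounce < maxBounces; bounce++ )
            {
                if( !scene.Hit( ray, out Hit hit ) )
                {
                    color += attenuation * scene.SkyColor( ray ); // ray escaped the scene
                    break;
                }
                attenuation *= hit.Albedo; // surface absorbs part of the light
                ray = hit.Bounce;          // follow the scattered ray instead of recursing
            }
            return color;
        }
    }

Same math as the recursive version, but it compiles to a plain loop that any GPU is happy with.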
So it's device specific. To implement the same thing everywhere, I believe GPUImage uses a stub for this, applying the same shaders you have in mind.
Unrelated note: there's something really funky going on with the comparison between the desktop and the laptop. The desktop is pretty high-end and should just obliterate that MacBook. It's seriously weird that they're so close. I'm not sure I trust the results.
Apart from the fact that DirectX 12 predates Vulkan by a substantial margin (and includes APIs for more than graphics), this is a pretty major reason. The API itself is only one concern, and I can't tell whether it's easier or harder to use than OpenGL (assuming expertise in both) - it's just different. The tooling surrounding it is a completely different story.
Even during DirectX 9 there was a tool that would show you the state of all your backbuffers after draw calls[1]. It was extremely spartan, but with such a simple tool debugging shaders became enormously easier. I'm not sure if anything exists for OpenGL today. They productized this and more in VS2017[2] - including being able to step through shaders in a debugger. The big draw to the Windows platform for developers has always been the tooling and Microsoft knows this.
VS and DirectX aren't going away until people stop regarding them as "just a text editor" and "just an API" respectively. It would help if people earnestly competed with what Microsoft is doing, instead of dismissing their products for no other reason than dogma.
> Microsoft Embrace, Extend, Extinguish
That ended with the exit of Ballmer.
[1]: https://tomtech999.wordpress.com/2011/09/07/debugging-direct...
A few years ago Crytek open sourced this: https://renderdoc.org/
I’ve mostly used the tool with DX11, but they also support OpenGL, GLES and Vulkan.
Microsoft is not alone here; people keep forgetting that OpenGL never mattered on game consoles.
Even though Vulkan is supported on the Switch, the main 3D API is NVN.
Some of the benefits of DirectX: a proper 3D API, with a toolkit that supports math, fonts, textures, and sprites instead of playing "go fish" for libraries; an OOP API; an OOP shading language; and nice debugging integrated into Visual Studio.
Same applies to the other 3D APIs.
Game developers care about their IP; how it gets onto each platform is secondary to that.
The OP has a relatively old GPU in the desktop: the GeForce GTX 770 is from 2013 and delivers up to 3.2 TFlops single precision.
The laptop is much newer, it's from 2016, and its Intel GPU delivers 0.7-0.8 TFlops. The raw computational power is different, but not 10 times different, just about 4 times.
The reason the OP measured a 2x difference instead of 4x is probably data access costs. The Intel GPU is integrated, so it has very fast access to system RAM. The Nvidia GPU in the desktop needs to copy data from system RAM to VRAM over PCIe, which is slower in both bandwidth (CPU to RAM: 34.1 GB/s; PCIe 3.0: 985 MB/s per lane, i.e. 15.76 GB/s for 16x) and especially latency.
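A quick back-of-the-envelope on what a copy actually costs (the 1280x720 float framebuffer is my assumption; the bandwidth figures are the ones above):

    using System;

    class BandwidthEstimate
    {
        static void Main()
        {
            // One 1280 x 720 RGBA framebuffer, 4 floats per pixel, ~14.7 MB
            double bytes = 1280.0 * 720.0 * 4.0 * 4.0;
            double pcieGBs = 0.985 * 16.0; // PCIe 3.0 x16, ~15.76 GB/s
            double ramGBs = 34.1;          // CPU to system RAM

            Console.WriteLine( "PCIe copy: {0:F2} ms", bytes / ( pcieGBs * 1e9 ) * 1e3 );
            Console.WriteLine( "RAM copy:  {0:F2} ms", bytes / ( ramGBs * 1e9 ) * 1e3 );
        }
    }

That's roughly 0.9 ms per copy over PCIe versus roughly 0.4 ms from RAM, and the latency penalty comes on top of that.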
I was also a little surprised at how well the MacBook performed, but it's quite reproducible for me. To be fair, this desktop is actually about 5 years old, and it wasn't really "top of the line" when I built it.