
corysama · 11y ago

OpenGL and D3D are both very good APIs that have worked well for 20+ years. However, they have a few ideas about the hardware fundamentally baked into them that are not aging well.

The main issue is that they are based on a model of continuously modifying a very large, monolithic body of state representing fine details about what the next draw should do. At any moment a draw call may be issued to enact the current state and produce a result.

In the past, that state was represented in hardware mostly using a large collection of physical registers. Nothing else could possibly be fast enough. The API model of "set BlendStateSourceOp, set BlendStateDestOp, etc." mapped very well to the hardware. You literally were continuously mutating a large block of registers.
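As a rough illustration of that classic model, here is a toy sketch in Python. It is not real OpenGL or D3D; `Context`, `set_state`, and the state names are all made up for the example. The point is only that state is one big mutable block, and a draw enacts whatever it happens to contain at that moment.

```python
# Toy model of an OpenGL/D3D-style state machine.
# Every name here is hypothetical; none of these are real API calls.

class Context:
    def __init__(self):
        # One big, mutable block of "registers" describing the next draw.
        self.state = {
            "blend_src": "ONE",
            "blend_dst": "ZERO",
            "depth_test": False,
            "cull_mode": "BACK",
        }
        self.submitted = []

    def set_state(self, name, value):
        if name not in self.state:
            raise KeyError(f"unknown state: {name}")
        self.state[name] = value  # mutate one "register" in place

    def draw(self, mesh):
        # A draw enacts whatever the state happens to be right now.
        self.submitted.append((mesh, dict(self.state)))


ctx = Context()
ctx.draw("opaque_rock")
ctx.set_state("blend_src", "SRC_ALPHA")
ctx.set_state("blend_dst", "ONE_MINUS_SRC_ALPHA")
ctx.draw("glass_window")
```

Each `set_state` call touches one field, which is cheap when that field is a physical register and much less so when it is not.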

In the present, programmable hardware has become capable of largely taking over for fixed-function hardware. Modern GPUs have been increasingly cutting out special-purpose silicon to make room for more multi-purpose ALUs. With these general-purpose ALUs, how to draw is represented using fairly large, allocated structures instead of single-purpose registers. These structures are not trivial to construct and modifying them continuously is not advised. However, switching between them is as trivial as moving a pointer from one to the other.

Fortunately, most games don't actually use a continuum of states when drawing. In practice, they switch repeatedly between a small number of states with very little variation between frames. Therefore, modern drivers do a lot of work to implicitly infer which state setups are heavily repeated within each run of each application. These states are baked into structures under the hood on the fly. Odd variants are expensive in this mode. But they are also rare, so they are lower priority.
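A minimal sketch of that kind of implicit baking, assuming the driver simply caches one baked structure per distinct state combination. Real drivers are far more involved; `Driver`, `_bake`, and the state names are hypothetical.

```python
# Toy model of a driver baking state combinations on first use.
# Hypothetical names; this is a plain cache, not real driver logic.

class Driver:
    def __init__(self):
        self.baked = {}      # state combination -> baked structure
        self.bake_count = 0  # how many times the expensive path ran

    def _bake(self, key):
        self.bake_count += 1  # stands in for the expensive work
        return {"compiled_from": key}

    def draw(self, state):
        key = tuple(sorted(state.items()))  # hashable snapshot of the state
        if key not in self.baked:
            self.baked[key] = self._bake(key)
        return self.baked[key]


opaque = {"blend": "OFF", "depth_write": True}
glass = {"blend": "ALPHA", "depth_write": False}

driver = Driver()
for frame in range(100):
    for state in (opaque, glass, opaque, glass):
        driver.draw(state)
```

Four hundred draws hit the expensive path only twice here, but the driver still has to build the lookup key on every single draw.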

Mantle, Metal and DX12 all seek to reboot the idea of graphics APIs from scratch based on how hardware actually works today. You set up an explicit set of draw state structures at init time. You switch between them explicitly and trivially at run time.

A second issue baked into OGL/D3D is that, in the past, the monolithic draw state was stratified into quite nicely orthogonal chunks dealing with separate issues such as: how to load a vertex from memory vs. how to operate on a vertex vs. how to pass data from the vertex shader to the fragment shader vs. how to operate on a fragment (sample) vs. how to blend the fragment into the framebuffer. This model made the APIs quite nice to learn and to use.

Unfortunately, it is simply not representative of how the hardware actually operates today. Today, most of those operations are actually handled by general purpose ALUs. These ALUs are running the vertex and fragment programs you wrote. But, they are also running more code to handle what used to be done in fixed-function silicon. Actually, it's worse than that. What used to be a register flip that was completely orthogonal to your vertex/fragment programs is now actually implemented by modifying code interleaved into the guts of the programs you compiled back at init time. These changes are done under the hood and on the fly.

Modifying the code under the hood is expensive. Worse, the draw state is so large and complicated that it is easy to accidentally request an invalid state. Validating each given state is expensive. Because the classic model lets you make draw state changes at any time preceding a draw and the state changes are no longer stratified, the state validation can no longer be done incrementally. Instead, every time you draw, a significant amount of work is done just to make sure the request makes sense.

Again, by declaring draw states up front, compilation and validation can be done once. Switching between pre-compiled, pre-validated states is trivial.
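The up-front model can be sketched like this. It is a toy Python version, not the Mantle, Metal, or DX12 API; `PipelineState`, `create_pipeline`, and the blend names are invented for the example, and a counter stands in for the cost of compiling and validating.

```python
from dataclasses import dataclass

VALID_BLEND = {"OFF", "ALPHA", "ADDITIVE"}
stats = {"validations": 0}  # counts how often the expensive check runs


@dataclass(frozen=True)
class PipelineState:
    """Immutable draw state: built once, never mutated afterwards."""
    shader: str
    blend: str
    depth_write: bool


def create_pipeline(shader, blend, depth_write):
    # Validate (and, in a real API, compile) once, at init time.
    stats["validations"] += 1
    if blend not in VALID_BLEND:
        raise ValueError(f"invalid blend mode: {blend}")
    return PipelineState(shader, blend, depth_write)


# Init time: pay the cost once per state.
opaque = create_pipeline("lit", "OFF", True)
glass = create_pipeline("lit", "ALPHA", False)

# Run time: binding is just swapping a reference.
bound_log = []
for frame in range(1000):
    for pipeline in (opaque, glass):
        current = pipeline
        bound_log.append(current.blend)
```

Because the objects are frozen, there is no way to slip in an unvalidated change between creation and draw, which is what makes skipping the per-draw check safe.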

A third issue is that OGL/D3D have the genuinely great goal of preventing and/or detecting synchronization errors in the usage of the API. In other words, you really shouldn't try to have the CPU modify a given block of memory while the GPU is simultaneously reading that same memory in an uncoordinated fashion. OGL and D3D have an interface and implementation designed to prevent/detect/allow-at-a-huge-cost these usage errors as much as possible. In practice, serious programs cannot ship with these errors. That means that all serious, shipping programs do not have these errors to any significant degree, but the driver is still always doing a large amount of work checking for them all of the time.

The new-style APIs seem more inclined to declare this category of usage errors to be undefined behavior rather than pay the cost to handle them. "Here's how to avoid them. So... avoid them."
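To make "avoid them" concrete, here is one common pattern, sketched as a toy: the application keeps a small ring of buffers and a fence value per slot, and only overwrites a slot once the GPU has passed that fence. Everything here is hypothetical (`FakeGpu` is just a counter), and the real APIs expose fences with their own semantics.

```python
# Toy model of application-managed CPU/GPU synchronization.
# All names are hypothetical; the "GPU" is just a counter.

FRAMES_IN_FLIGHT = 2


class FakeGpu:
    def __init__(self):
        self.completed = 0  # highest fence value the GPU has finished

    def finish_through(self, fence_value):
        self.completed = max(self.completed, fence_value)


gpu = FakeGpu()
slot_fence = [0] * FRAMES_IN_FLIGHT  # fence guarding each buffer slot
waits = 0

for frame in range(1, 7):
    slot = frame % FRAMES_IN_FLIGHT
    # The app, not the driver, checks that the GPU is done with this slot.
    if gpu.completed < slot_fence[slot]:
        waits += 1
        gpu.finish_through(slot_fence[slot])  # stands in for blocking on the fence
    # Safe to overwrite the slot's memory here, then submit the frame.
    slot_fence[slot] = frame
```

If the application skips the check, nothing in this model stops it from scribbling over memory the GPU is still reading. That is the trade: no per-call checking, and no safety net.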

A fourth issue is that multi-core computing is much more common and important than it was in the past. OpenGL has never had an interface to issue draw commands from multiple threads of a single process. D3D11 had an interface to record commands on multiple threads and dispatch them on a primary thread, but the consensus is that D3D11's implementation did not work as well as was expected in practice.

Mantle, Metal and DX12 all have new, multi-threaded interfaces that they are quite confident will work well in practice.
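The general shape of "record on many threads, submit on one" looks roughly like the sketch below. It is a structural illustration only: the command tuples and `record_commands` are made up, and Python threads will not actually give you parallel speedup here.

```python
from concurrent.futures import ThreadPoolExecutor


def record_commands(chunk):
    """Each worker fills its own command list; nothing is shared."""
    commands = []
    for obj in chunk:
        commands.append(("bind_pipeline", obj["pipeline"]))
        commands.append(("draw", obj["name"]))
    return commands


# Hypothetical scene, split into chunks of two objects each.
scene = [{"name": f"obj{i}", "pipeline": "opaque" if i % 2 else "glass"}
         for i in range(8)]
chunks = [scene[i:i + 2] for i in range(0, len(scene), 2)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() yields results in chunk order, whichever thread finishes first.
    command_lists = list(pool.map(record_commands, chunks))

# One thread submits the finished lists in a fixed order.
queue = [cmd for cmds in command_lists for cmd in cmds]
```

Because each list is private to the thread recording it, no locking is needed until the final, cheap submission step.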

Much of what I'm describing here is covered in this presentation from Microsoft, "DirectX 12 API Preview": https://www.youtube.com/watch?v=m0QkjKGZQzI

An alternative approach has been proposed by a multi-vendor group of OpenGL driver developers. It was presented in the "Approaching Zero Driver Overhead" (AZDO) talk at GDC 2014. http://gdcvault.com/play/1020791/ and https://www.khronos.org/assets/uploads/developers/library/20...

In the AZDO approach, instead of tossing out the legacy state machine of OpenGL, they demonstrate how some current (fairly cutting edge) features that have recently been added allow a draw state to be set up that is so expressive and so extensive that it can pretty effectively represent a whole, fairly complicated scene of a modern game in a single draw state. Once you set this up, you can pretty much issue a single request to draw much-if-not-all of the current frame as an atomic operation. Further, common frame-to-frame modifications (such as moving objects around) are very cheap in this setup.
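One ingredient of this style is indirect multi-draw, where the per-draw parameters live in a buffer and a single call such as `glMultiDrawElementsIndirect` consumes many of them. The sketch below only packs such a buffer on the CPU side in Python. The five-field record layout is OpenGL's `DrawElementsIndirectCommand`; the mesh numbers, the little-endian packing, and the use of `baseInstance` as a per-draw id are assumptions for the example.

```python
import struct

# Field layout of OpenGL's DrawElementsIndirectCommand:
# count, instanceCount, firstIndex (uint32), baseVertex (int32), baseInstance (uint32)
COMMAND = struct.Struct("<IIIiI")  # little-endian is an assumption here

# Hypothetical scene: (index_count, first_index, base_vertex) per mesh.
meshes = [(36, 0, 0), (240, 36, 24), (1200, 276, 184)]

indirect_buffer = bytearray()
for draw_id, (index_count, first_index, base_vertex) in enumerate(meshes):
    # Assumption: baseInstance carries a draw id so a shader could look up
    # per-object data (transforms, materials) from another buffer.
    indirect_buffer += COMMAND.pack(index_count, 1, first_index, base_vertex, draw_id)

draw_count = len(indirect_buffer) // COMMAND.size
```

Moving an object then means rewriting a few bytes in a data buffer rather than issuing new state-setting calls.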

AZDO is an interesting and perfectly workable approach. I am less of a fan of that approach than I am of the DX12 approach.

I should make this into a blog post... I should start a blog...


3 comments · 2 top-level

przemo_li · 11y ago · 1 in thread

AZDO is not about "single draw per frame" nor "single draw per scene".

It's "single draw per timeframe needed for switching state".

The difference being that modern GPUs can "hide" a state change behind a big enough workload.

Also, OpenGL as it is right now allows for explicit GPU/CPU synchronization and multi-threaded content creation (without needing any explicit API for it).

What it lacks:

* Requirement for caching shaders. (Quality of implementation)
* Requirement for offline shader compilation. (Quality of implementation)
* Intermediate representation, so that mundane tasks like elimination of dead code happen ahead of time. (Specification)
* Having all that good stuff in core. (Specification)
* App devs moving to Core Profile. (That's us...)

So OpenGL as it is now is quite close to solving all your problems. And it does so in a way that is in some respects less and in some respects more explicit than DX12/Mantle (as those focus on exposing CPU/GPU-intensive operations, while OGL "AZDO" goes ahead and proposes a solution to GPU/CPU bottlenecks).

przemo_li · 11y ago

In that sense, you should add a disclaimer at the beginning of this blog post ;) that by "OpenGL" you mean both OpenGL without AZDO extensions and OpenGL ES.

pavlov · 11y ago

"I should make this into a blog post... I should start a blog..."

You definitely should! This comment was great reading. You managed to condense a very complex state of affairs into an understandable explanation.
