I implemented a cheap subset of it used in Super Mario 64 DS for my online model viewer ( https://noclip.website/#sm64ds/44;-517.89,899.85,1300.08,0.3... ), but implementing all of the quirks and the weird feature sets might be nearly impossible in a modern graphics API. 2D rasterizers don't have to be slow (as SwiftShader and ryg show), and you can get the bizarre conditions exactly correct. I'm not sure what a GPU-based implementation would even add.
EDIT: The math to handle the bilinear quad interpolation on a GPU was worked out by reedbeta last year: http://reedbeta.com/blog/quadrilateral-interpolation-part-2/ . That's a big roadblock gone, but there are still a lot of other questionable things.
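The core of the quad-interpolation problem is inverse bilinear interpolation: given a point inside the quad, recover the (u, v) that the forward bilerp would have mapped there. A minimal CPU sketch of the standard approach (reduce to a quadratic in v via 2D wedge products) — the corner values in the comments are made up for illustration, and this is my reading of the math, not reedbeta's actual code:

```python
import math

def wedge(a, b):
    """2D cross product (a.x*b.y - a.y*b.x)."""
    return a[0]*b[1] - a[1]*b[0]

def bilerp(p00, p10, p01, p11, u, v):
    """Forward bilinear interpolation over the quad's four corners."""
    return tuple((1-u)*(1-v)*p00[i] + u*(1-v)*p10[i]
                 + (1-u)*v*p01[i] + u*v*p11[i] for i in range(2))

def inverse_bilerp(p00, p10, p01, p11, p):
    """Recover (u, v) with bilerp(p00, p10, p01, p11, u, v) == p.
    The condition reduces to A*v^2 + B*v + C = 0; pick the root in [0, 1]."""
    q  = (p[0] - p00[0],  p[1] - p00[1])
    b1 = (p10[0] - p00[0], p10[1] - p00[1])
    b2 = (p01[0] - p00[0], p01[1] - p00[1])
    b3 = (p00[0] - p10[0] - p01[0] + p11[0],
          p00[1] - p10[1] - p01[1] + p11[1])

    A = wedge(b3, b2)
    B = wedge(q, b3) - wedge(b2, b1)
    C = wedge(q, b1)

    if abs(A) < 1e-12:
        v = -C / B          # parallelogram case: equation degenerates to linear
    else:
        disc = math.sqrt(B*B - 4*A*C)
        v = (-B + disc) / (2*A)
        if not 0.0 <= v <= 1.0:
            v = (-B - disc) / (2*A)

    # With v fixed, q = u*(b1 + v*b3) + v*b2; solve for u on the larger axis.
    dx, dy = b1[0] + v*b3[0], b1[1] + v*b3[1]
    if abs(dx) >= abs(dy):
        u = (q[0] - v*b2[0]) / dx
    else:
        u = (q[1] - v*b2[1]) / dy
    return u, v
```

A fragment shader would do the same arithmetic per-pixel, with the quad corners fed in as varyings or a uniform.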
This sounds so ridiculously obvious when reading it that I would be surprised if nobody thought of working this out before. Or is the GPU code really difficult to work out compared to how simple the conceptual approach is?
> By the way, the fact that bilinear interpolation creates quadratic splines along diagonals can be exploited to evaluate splines in a GPU texture unit.
That also sounds very interesting!
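The claim is easy to check numerically: sampling a bilinearly-filtered 2x2 texel patch along the diagonal (t, t) gives exactly a quadratic Bézier whose middle control point is the average of the two off-diagonal texels. A quick sketch (the texel values are made up):

```python
def bilerp(c00, c10, c01, c11, u, v):
    """Bilinear filtering of a 2x2 texel patch, as a texture unit would do it."""
    return (1-u)*(1-v)*c00 + u*(1-v)*c10 + (1-u)*v*c01 + u*v*c11

def quad_bezier(p0, p1, p2, t):
    """Quadratic Bézier in Bernstein form."""
    return (1-t)**2 * p0 + 2*(1-t)*t * p1 + t**2 * p2

# Hypothetical texel values; along the diagonal u = v = t the bilerp
# collapses to a quadratic with middle control point (c10 + c01) / 2.
c00, c10, c01, c11 = 1.0, 4.0, 2.0, 9.0
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    assert abs(bilerp(c00, c10, c01, c11, t, t)
               - quad_bezier(c00, (c10 + c01) / 2, c11, t)) < 1e-12
```

So one filtered texture fetch evaluates a quadratic spline segment for free, which is the trick being alluded to.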
One possibility might be to do two passes: one to build up linked lists of per-fragment data (polygon ID, color, depth, etc.) and a second pass to sort all the linked lists into the proper order and determine a final color. This is the standard order-independent transparency trick.
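A CPU mock of the two passes makes the idea concrete: pass 1 appends fragments per pixel in whatever order they arrive (on a GPU this would be an atomic-counter-backed linked list via image load/store), pass 2 sorts each pixel's list by depth and composites far-to-near. The fragment layout, the depth convention (larger = farther), and the blend operator here are all assumptions for illustration:

```python
def pass1_append(buckets, x, y, depth, rgba):
    """Pass 1: record a fragment unordered, keyed by pixel."""
    buckets.setdefault((x, y), []).append((depth, rgba))

def over(dst, src):
    """Standard 'over' blend of an (r, g, b, a) fragment onto an (r, g, b) dest."""
    r, g, b, a = src
    dr, dg, db = dst
    return (r*a + dr*(1-a), g*a + dg*(1-a), b*a + db*(1-a))

def pass2_resolve(buckets, background=(0.0, 0.0, 0.0)):
    """Pass 2: sort each pixel's fragment list and resolve a final color."""
    out = {}
    for pixel, frags in buckets.items():
        color = background
        # far-to-near, so each fragment composites on top of what's behind it
        for depth, rgba in sorted(frags, key=lambda f: -f[0]):
            color = over(color, rgba)
        out[pixel] = color
    return out
```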
You could build up tables as well--for instance, you could emulate the "one span per scanline/polygon" behavior by allocating a table of scanlines for each polygon that you fill with the lowest X coordinate for that scanline and discard fragments that don't belong to the triangle contributing the lowest such X coordinate.
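A rough CPU sketch of that span-table idea, under one possible reading of the rule: pass 1 records, per (polygon, scanline), the leftmost X seen; pass 2 keeps only the fragments of the polygon owning the lowest span start on each scanline. The fragment tuple layout and the winner rule are assumptions, not the DS's actual behavior:

```python
def build_span_table(fragments):
    """Pass 1. fragments: iterable of (poly_id, x, y).
    Returns {(poly_id, y): leftmost x seen for that polygon on that scanline}."""
    table = {}
    for poly_id, x, y in fragments:
        key = (poly_id, y)
        if key not in table or x < table[key]:
            table[key] = x
    return table

def filter_fragments(fragments, table):
    """Pass 2: per scanline, keep only fragments of the polygon whose
    recorded span start is lowest; everything else is discarded."""
    winner = {}
    for (poly_id, y), min_x in table.items():
        if y not in winner or min_x < winner[y][1]:
            winner[y] = (poly_id, min_x)
    return [f for f in fragments if winner[f[2]][0] == f[0]]
```

On a GPU, the table would live in an image or SSBO written with atomicMin in the first pass, with the second pass reading it back to decide which fragments survive.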
I have no idea if this will actually work--if I had to guess I'd put a 50% probability on it not working out at all. The fallback would be a SIMD scanline renderer. The Image Load/Store GPU implementation would be really fun though :)
I truly enjoyed messing around with the homebrew dev kits and loved putting my homemade demos on my flash cart in high school, though I mostly stuck to 2D demos.
From what I understand, the GPU is more akin to the GBA's and is a scanline-based renderer. Does that mean its 3D architecture is more similar to, for instance, the Sega Saturn's? (Incidentally, my other favourite system to write homebrew for.)
The main difference, of course, being its ability to display native triangles in addition to quads? (The Saturn did not have native 3D hardware as we understand it; it drew thousands of scaled and transformed sprites as quads instead of the triangles we are used to today.)
Edit: Reading other articles on the site, it seems like they started with a software renderer. I wonder why they decided to try OpenGL?
Kind of an interesting aesthetic. It looks less terrible than I expected. I think it would be interesting to apply some of the recent machine-learning-based upsampling techniques to DS games too.