However, the reality is far more nuanced than he was letting on. The likes of Kaze Emanuar set the record straight, and actually did a proper deep dive into optimization for the game (and not just by passing a -O2 compiler flag I might add). I'd say take MattKC's channel and content with a grain of salt.
FIXING the ENTIRE SM64 Source Code (INSANE N64 performance)
I say this even as someone who generally isn't a fan of Nintendo's corporate policies. I don't think Nintendo did anything wrong here, especially given how new the system, and the brave new world of 3D gaming in the home, was for them at the time.
The technical content is really engaging (the "joy"/"frustration" of digging in some ancient technology for no easily justifiable reason--though in this case...) and there's been a heap of work put into merging the technical deep dive with a cinematic narrative that helps communicate some of the experience of what it feels like when you're three days in trying to find a piece of a puzzle someone probably hasn't cared about in 20+ years. :)
At first I thought no way I'm going to spend almost an hour watching this, but I'm happy I did it haha.
There are many funny gems, like that whiteboard analysis at https://youtu.be/CTUMNtKQLl8?t=2343
  // Reimplemented
  LONG WINAPI CORKEL32_InterlockedCompareExchange(LONG *dest, LONG xchg, LONG compare)
  {
      LONG temp = *dest;
      Trace(TRACE_FORCE_DONT_PRINT, "InterlockedCompareExchange");
      if (compare == *dest) {
          *dest = xchg;
      }
      return temp;
  }
Not very interlocked at all :)
A proper implementation would use some kind of locking in the worst case, but would usually rely on hardware features to provide this atomicity. For example, on x86 this is done by the CMPXCHG family of instructions (with a LOCK prefix on multiprocessor systems).
The mock implementation is not atomic. Another thread can change the value between the comparison and the swap, and the swap will still happen. As for the fix, there might be an intrinsic for it in the W95 SDK, or worst case an inline assembly implementation using CMPXCHG would do the job.
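For comparison, here is a rough sketch of what an atomic version could look like in portable C11, assuming a modern compiler with <stdatomic.h> is available (which is not the toolchain from the video). The name atomic_cas_long is made up for illustration, and plain long stands in for the Win32 LONG type. On x86 a compiler would typically lower this to LOCK CMPXCHG.

```c
#include <stdatomic.h>

/* Hypothetical helper, not the project's code: atomically sets *dest to xchg
 * if *dest == compare, and returns the value *dest held before the call. */
long atomic_cas_long(_Atomic long *dest, long xchg, long compare)
{
    long expected = compare;
    /* Compare and swap happen as one indivisible step. On failure,
     * expected is overwritten with the current value of *dest. */
    atomic_compare_exchange_strong(dest, &expected, xchg);
    /* On success expected still equals compare, which was the old value. */
    return expected;
}
```

Unlike the reimplementation quoted above, there is no window between the comparison and the store in which another thread could slip in a write.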
At one point, I was shouting "gacutil" at the screen, until he got there eventually :-)