The Emotion Engine (CPU) to GS (GPU) link was what made the PS2 so impressive for its time, but it also made the machine somewhat hard to code for and immensely hard to emulate. If I recall correctly, the N64 had something like 4x the (shared) memory bandwidth of the PS1, and the PS2 had roughly 6x (3GB/s) the system bandwidth of the N64. But the GS's embedded RAM clocked in at 48GB/s, more than the external memory bandwidth of the Cell (~25GB/s), which is why early PS3s emulated PS2 games with actual embedded PS2 hardware rather than in software.
It was a bonkers machine. I don't think workstation GPU bandwidth reached 50GB/s for another 5-6 years. That said, it was an ultra-simple pipeline with 4MB of RAM and insane DMA requirements, which actually got crazier with the Cell in the PS3. I was at Sony (in another division) in that era. It was a wild time for hardware tinkering and low-level software.
That's kinda overselling it, honestly. When you're talking about the GIF, only VU1's vertex pipeline (PATH1) could achieve that speed directly. PATH2/PATH3 went over the commodity RDRAM bus (unless you used MFIFO to mirror a small portion of it into the buffer, which was much more difficult and rarely used, since it was likely to stall the other paths); the exact same bus Pentium 4s would use a few months after the PS2's initial launch (3.2-6.4GB/s). The GS memory is more akin to a (very large) 4MB on-chip cache than to proper RAM/VRAM.
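For the curious, data sent down any of those GIF paths starts with a 128-bit GIFtag describing the primitives that follow. A rough sketch of packing one, with field positions taken from the psdevwiki GS documentation (PRE/PRIM left zero for brevity; treat this as illustrative, not authoritative):

```python
def make_giftag(nloop: int, eop: bool, flg: int = 0, regs: tuple = ()) -> int:
    """Pack a 128-bit GIFtag as a Python int.

    Field positions (per psdevwiki's GS docs):
      NLOOP: bits 0-14   (number of data loops)
      EOP:   bit 15      (end of packet)
      FLG:   bits 58-59  (0=PACKED, 1=REGLIST, 2=IMAGE)
      NREG:  bits 60-63  (register descriptor count)
      REGS:  bits 64-127 (4-bit register descriptors)
    """
    low = ((nloop & 0x7FFF)
           | (int(eop) << 15)
           | ((flg & 0x3) << 58)
           | ((len(regs) & 0xF) << 60))
    high = 0
    for i, reg in enumerate(regs):
        high |= (reg & 0xF) << (4 * i)
    return (high << 64) | low

# One PACKED-mode loop writing through the A+D (0xE) register descriptor:
tag = make_giftag(nloop=1, eop=True, regs=(0xE,))
```

On real hardware you'd build these tags into a DMA chain in (uncached) main memory and kick the appropriate DMA channel at them; this just shows the bit layout.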
As to the PS3's bandwidth being half that, that's more a design decision of the PS3. They built the machine around a universal bus (XDR) rather than bespoke interconnects. If you look at the Xbox 360, it used a chip hierarchy similar to the PS2's architecture, with 10MB of eDRAM (at 64GB/s) for GPU-specific operations.
As to those speeds being unique: that bandwidth was made possible by eDRAM (on-chip memory). Other bespoke designs used eDRAM too, and the POWER4 (released around the same time) had 1.5MB of per-chip L2 cache running at over double that bandwidth (100GB/s). It could also communicate chip-to-chip (up to 4x4 SMP) at 40GB/s and with its L3 at 44GB/s (both off-chip buses). So other hardware was definitely hitting similar and greater bandwidths; it just wasn't happening in home PCs.
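Those headline numbers mostly fall out of bus width times transfer rate. A quick back-of-envelope check, using the commonly cited public figures (a 2560-bit internal eDRAM bus at 150MHz for the GS, and 16-bit PC800 RDRAM channels for the contemporary Pentium 4 boards; both figures are from spec sheets, not this thread):

```python
def bandwidth_gb_s(bus_bits: int, transfers_per_sec: float) -> float:
    """Peak bandwidth in GB/s = (bus width in bytes) x (transfer rate)."""
    return bus_bits / 8 * transfers_per_sec / 1e9

# GS eDRAM: 2560-bit internal bus @ 150MHz -> the famous 48 GB/s
gs_edram = bandwidth_gb_s(2560, 150e6)

# Dual-channel PC800 RDRAM: 2 x 16-bit @ 800MT/s -> 3.2 GB/s
rdram_2ch = 2 * bandwidth_gb_s(16, 800e6)
```

The trick, of course, is that a 2560-bit bus is only cheap when it never leaves the die, which is exactly the eDRAM point above.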
Edit: if memory serves, SPE DMA list bandwidth was just north of 200GB/s. Good times.
As I recall, the partnership between Intel and Rambus was pilloried as an attempt to re-proprietarize the PC RAM interface, in a similar vein to IBM's Micro Channel bus.
Back then, the appeal of consoles to me was that, beyond the convenience factor, they were very specialised hardware built for one task: running games.
I remember playing FF12 (IZJS) on a laptop in 2012 and it ran very stably. Granted, that was six years post-release, but had the emulation issues been fully solved by then?
Re: wild times for low-level programming, I remember hearing that the Crash Bandicoot devs had to drop down into MIPS assembly to eke out every extra bit of performance on the PS1.
I very much enjoyed this video that Ars Technica did with Andy Gavin on the development of Crash Bandicoot: https://www.youtube.com/watch?v=izxXGuVL21o
The most “famous” thing about Crash's programming is probably that much of it is written in a custom Lisp dialect (GOOL), with inline assembly.
At my last job, we had ASICs that allowed for single-sample audio latency with basic mixing/accumulation functions for pulling channels of audio off of a bus. It would have been tragically expensive to reproduce that in software, and the required hardware to support a pure software version of that would have been ridiculous.
We ended up launching a new platform with a different underlying architecture that made very different choices.
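The work that ASIC did per sample period is conceptually tiny; the hard part is doing it with a deterministic single-sample deadline. A hypothetical sketch of the mix/accumulate step (function name, gain model, and saturation behavior are my own assumptions, not the actual hardware):

```python
def mix_sample(bus_samples, gains):
    """Accumulate gain-weighted channels pulled off the bus into one
    output sample, then saturate to [-1.0, 1.0] -- one sample period's
    worth of work.

    In hardware this is a handful of multiply-accumulate units running
    every sample; a software version would have to hit this deadline
    every ~20us at 48kHz, with no scheduling jitter, which is why the
    hardware needed to support it purely in software gets ridiculous.
    """
    acc = 0.0
    for sample, gain in zip(bus_samples, gains):
        acc += sample * gain
    return max(-1.0, min(1.0, acc))

# Two bus channels, one boosted 2x: 0.5*1.0 + 0.25*2.0 = 1.0
out = mix_sample([0.5, 0.25], [1.0, 2.0])
```

The saturating clamp stands in for whatever the real accumulator did at full scale; the point is the per-sample loop, not the exact arithmetic.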
But for more reading: https://www.psdevwiki.com/ps2/Graphics_Synthesizer
The author mentions a bunch of other, significantly more esoteric things in their post that they don't expand upon either, so I think it's very much geared toward a particular audience.
A few link outs would have helped for sure.