For anyone interested: I have a detailed write-up on the topic, in the context of 1990s renderers, here: https://github.com/sylefeb/tinygpus/tree/main?tab=readme-ov-... ; a video discussing texture mapping with a hardware twist: https://youtu.be/2ZAIIDXoBis?si=MvQXH2ltqWmvFMdt&t=1072 ; and a shadertoy that compares perspective-correct texture mapping on/off: https://www.shadertoy.com/view/ftKSzR
Seriously, this keeps getting overhyped as some gigantic insight when it was really just a consequence of the Pentium's release in 1993. With the Pentium you got both reliable FPU availability (none of the 486SX pain) and a much cheaper FDIV: the cycle count dropped by almost 50% (73->39 IIRC).
Everybody doing 3d gfx knew you needed a perspective divide and was looking at ways to do that cheaply. Interpolation + a long-latency instruction that doesn't block the main pipelines is a fairly straightforward answer.
The proper solution is this: https://gamedev.stackexchange.com/a/63203
3D graphics is a rich and old field with lots of tricks like this. It's cool to see faster algorithms and alternative explanations in the comments!