> with the smaller CPUs you can lookup exactly how long each instruction will take and thus get the timing right
Reminds me of the good old demo scene hack to get 18-bit color[1] back in the VGA days (and probably earlier) by changing the color palette per pixel through careful timing of the instructions.
IIRC you ignore the pixel values entirely: you fill the screen with a single "color", i.e. one palette index, and then rewrite that single palette entry once per pixel, timed so the new value is in place just as the output DAC is clocked to the next pixel.
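A toy sketch of the idea (not real VGA register code, just a simulation of the timing trick — the function name and the 6-bits-per-channel ramp are my own illustration): every pixel in the framebuffer points at palette entry 0, and the "CPU" rewrites that one entry right before the DAC latches each pixel.

```python
# Toy model of racing the beam with a single palette entry.
# Real code would hammer the VGA DAC registers (ports 0x3C8/0x3C9)
# with cycle-counted instructions; here we just model the effect.

def scanline_colors(desired_colors):
    """Simulate one scanline: all pixels use palette index 0, and the
    palette entry is rewritten just in time for each pixel."""
    palette = [(0, 0, 0)] * 256              # VGA DAC: 256 entries
    framebuffer = [0] * len(desired_colors)  # every pixel -> index 0
    out = []
    for x, idx in enumerate(framebuffer):
        palette[0] = desired_colors[x]  # timed write lands before the clock
        out.append(palette[idx])        # DAC looks up the freshly set entry
    return out

# Arbitrary 6-bit-per-channel ramp, since the VGA DAC is 6 bits/channel.
line = [(c, 63 - c, 0) for c in range(16)]
assert scanline_colors(line) == line  # each pixel shows its own color
```

The payoff is that the per-pixel color resolution becomes the DAC's full 18 bits instead of whatever the active mode's pixel format allows — at the cost of burning the entire CPU on exactly-timed writes.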
edit: I forgot which demos I've seen this effect in, but searching for it led me down a marvelous rabbit hole where I stumbled over someone[2] who managed[3] to get 1k colors out of CGA[4], apparently discussed here[5] a couple of years ago.
[1]: https://en.wikipedia.org/wiki/Video_Graphics_Array#Color_pal...
[2]: https://int10h.org/blog/2015/04/cga-in-1024-colors-new-mode-...
[3]: https://www.reenigne.org/blog/1k-colours-on-cga-how-its-done...
[4]: https://en.wikipedia.org/wiki/Color_Graphics_Adapter
[5]: https://news.ycombinator.com/item?id=21992870