Four-voice digi player, using SWAR in its second version: https://www.youtube.com/watch?v=1GrdvcghDXE
Reminds me of a weekend hobby project I did back in 2014 or so. I had an itch to play with analog video signal generation from an atmega328p. Rather than use one of the existing libraries, though, I started from scratch with the goal of achieving the highest possible resolution. I used the SPI peripheral to clock out 8 pixels at a time at 8 MHz without any gaps, giving me something like 12 instructions to prepare the next byte. There wasn't enough RAM for a frame buffer at that resolution, so I used character tiles instead; that ate up the whole instruction budget. I forget what the resolution was, but it was significantly higher than the existing library was capable of. There was some jitter, which I tracked down to the variability in interrupt latency caused by AVR instructions taking different numbers of cycles. I was using a timer interrupt to schedule and carry out the transmission of each scanline, so that the main program could focus purely on application logic. I wrote an inline assembly routine at the start of the interrupt handler to insert a variable number of NOP instructions depending on the relative phase of the hardware timer, and the output became rock solid.
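A rough sketch of the character-tile idea, in Python rather than AVR assembly, to show why it saves RAM: the display memory holds one tile number per cell, and each outgoing byte is looked up from a font table at scanline time. The 8x8 tile size, the names FONT, scanline_bytes and ram_bytes, and the two glyphs are assumptions made up for illustration; the comment above does not give the real dimensions.

```python
TILE_W = 8  # pixels per tile row, i.e. one SPI byte
TILE_H = 8  # assumed tile height in scanlines

# Hypothetical two-glyph font: TILE_H bytes per glyph, one byte per pixel row.
# On a real AVR this table would live in flash, not RAM.
FONT = {
    0: [0x00] * TILE_H,                                   # blank
    1: [0x18, 0x3C, 0x66, 0x66, 0x7E, 0x66, 0x66, 0x00],  # a rough "A"
}


def scanline_bytes(tile_map, y):
    """Return the bytes to shift out for pixel row y of the screen."""
    tile_row, row_in_tile = divmod(y, TILE_H)
    return [FONT[tile][row_in_tile] for tile in tile_map[tile_row]]


def ram_bytes(cols, rows):
    """RAM needed as (tile map, full bitmap) for a cols x rows character grid."""
    return cols * rows, cols * rows * TILE_H
```

For an assumed 40x25 grid, ram_bytes(40, 25) gives 1000 bytes for the tile map against 8000 for a bitmap, and the atmega328p has 2 KB of SRAM. The price is the extra lookup per byte, which has to fit in the handful of instructions between SPI writes.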
That of course reminds me of a project in 2007 where I needed to go the other direction and decode an analog video signal on an 8-bit PIC microcontroller. The signal was from a camera on an actuator, meant to detect the relative position of the sun for the purpose of aiming a parabolic solar concentrator. I was able to filter out all visible light with some overdeveloped film negative so that the video signal was simply a white dot on a black background, and then wire it up through some voltage dividers to the PIC's two voltage comparators. One comparator detected sync pulses, and the other detected black-to-white transitions. The firmware would simply track the timing of sync pulses to know the current scanline and the position within that scanline. Good times!
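The sync-tracking logic can be sketched roughly as below, in Python instead of PIC firmware. It assumes the comparator edges have already been turned into timestamped events and that vertical sync has already been told apart from horizontal sync (in composite video the vertical sync pulses are longer). The event names and the locate_dot function are hypothetical.

```python
def locate_dot(events):
    """Find the bright dot in one field.

    events: iterable of (time, kind) pairs in time order, where kind is
    "vsync", "hsync" or "white" (a black-to-white transition).
    Returns (scanline, time since that line's hsync) for the first white
    transition seen after a vsync, or None if there is none.
    """
    line = None        # unknown until the first vsync
    line_start = None  # time of the most recent hsync
    for t, kind in events:
        if kind == "vsync":
            line = 0
            line_start = None
        elif kind == "hsync" and line is not None:
            line += 1
            line_start = t
        elif kind == "white" and line_start is not None:
            return line, t - line_start
    return None
```

The scanline number gives the vertical position and the time offset within the line gives the horizontal one, so no frame ever has to be stored.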
Some more info on the digi player on the ST. It used a timer interrupt to service the PCM samples, but relying on the interrupt alone produced significant noise because the interrupt latency varied so much. To tighten the timing, the interrupt was changed to fire on every video line just before the actual hsync, and the routine then polled hsync to get very precise timing.
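A toy model of why "interrupt early, then poll hsync" removes the jitter, sketched in Python. The cycle counts, the latency values, and the function names are assumptions for illustration, and the model ignores the granularity of the polling loop itself.

```python
LINE_CYCLES = 512   # assumed CPU cycles per video line
EARLY_MARGIN = 40   # assumed: interrupt aimed this many cycles before hsync


def sample_output_cycle(line, latency, poll_hsync):
    """Cycle at which the sample gets written on a given video line.

    latency is the variable delay between the interrupt being raised and
    the handler actually running.
    """
    hsync = (line + 1) * LINE_CYCLES
    entered = hsync - EARLY_MARGIN + latency
    if not poll_hsync:
        return entered          # output as soon as the handler runs
    return max(entered, hsync)  # spin until hsync, then output


def jitter(latencies, poll_hsync):
    """Spread (max - min) of the interval between consecutive sample outputs."""
    cycles = [sample_output_cycle(i, lat, poll_hsync)
              for i, lat in enumerate(latencies)]
    intervals = [b - a for a, b in zip(cycles, cycles[1:])]
    return max(intervals) - min(intervals)
```

With latencies of 4, 20, 12 and 30 cycles, the unpolled version shows a 26-cycle spread between sample intervals, while the polled version shows none as long as the worst latency stays below the early margin.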
The PCM output was just a linearization, obtained by combining the logarithmic volume levels of the three PSG voices.
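One way such a linearization table could be built is sketched below in Python. It assumes each of the three voices has a 4-bit volume whose output grows by a fixed ratio per step (here a factor of sqrt(2), roughly 3 dB), that level 0 is silent, and that the three outputs simply add. Those are modelling assumptions, as are the names STEP_RATIO, channel_level and build_table; a real table would be fitted to the measured chip output.

```python
import bisect
from itertools import combinations_with_replacement

STEP_RATIO = 2 ** 0.5  # assumed output ratio between adjacent volume steps


def channel_level(volume):
    """Modelled linear output of one voice for a 4-bit volume (0..15)."""
    return 0.0 if volume == 0 else STEP_RATIO ** (volume - 15)


def build_table(size=256):
    """For each linear sample value, pick the volume triple whose summed
    modelled output is closest to the ideal linear level."""
    combos = sorted(
        (sum(channel_level(v) for v in vols), vols)
        for vols in combinations_with_replacement(range(16), 3)
    )
    sums = [s for s, _ in combos]
    peak = sums[-1]
    table = []
    for sample in range(size):
        target = peak * sample / (size - 1)
        i = bisect.bisect_left(sums, target)
        # The nearest sum is either just below or just above the target.
        candidates = combos[max(i - 1, 0):i + 1]
        table.append(min(candidates, key=lambda c: abs(c[0] - target))[1])
    return table
```

Here build_table()[s] is the triple of volume values to write for 8-bit sample s. Under the assumed curve the achievable sums are dense at low levels and sparse near full scale, so the top of the range is only coarsely approximated.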
During the title sequence, a special version of the code was running where several 68000 registers were reserved globally for the digi player. So those did not need to be saved / restored in the interrupt routine!
SWAR came in when advancing the four wrapping 8-bit indices into the four voices' 256-entry sample tables. This was part of a monumental effort to make the interrupt routine as quick as possible.
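A minimal sketch of the SWAR idea in Python: four 8-bit indices are packed into one 32-bit word and all four are advanced with a single add, with each lane wrapping at 256 and no carry leaking into its neighbour. This uses the common mask-the-top-bit formulation; the comment above does not say which exact trick the original 68000 code used, and the helper names are made up.

```python
LANE_HIGH = 0x80808080  # top bit of each 8-bit lane
LANE_LOW = 0x7F7F7F7F   # low seven bits of each lane


def pack(lanes):
    """Pack four 0..255 values into one 32-bit word, lane 0 lowest."""
    a, b, c, d = lanes
    return a | (b << 8) | (c << 16) | (d << 24)


def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]


def advance(indices, steps):
    """Lane-wise (indices + steps) mod 256 on packed 32-bit words."""
    # Add the low 7 bits of every lane; the sum is at most 0xFE per lane,
    # so it cannot carry into the next lane.
    low = (indices & LANE_LOW) + (steps & LANE_LOW)
    # Restore each lane's top bit: bit7 = a7 xor b7 xor carry-in, and the
    # carry-in is already sitting in bit 7 of `low`.
    return low ^ ((indices ^ steps) & LANE_HIGH)
```

For example, unpack(advance(pack([250, 0, 128, 255]), pack([10, 1, 128, 1]))) yields [4, 1, 0, 0]. Because the tables have exactly 256 entries, the 8-bit wraparound doubles as the table wrap, so no per-voice bounds check is needed.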
Your video decoding reminds me of when I worked at a video card company in the nineties. We had a competitive advantage from using a commodity part in an unusual way. This video decoder chip was commonly used to take composite video and decode it. We supported that, but we also offered a bunch of advanced features through a seldom-used raw mode, in which the chip took the composite signal and stored the output of the analog-to-digital conversion in memory. We had high-speed assembly code that could decode the video better than the hardware and supported some cool additional features. Anyway... memory's a bit hazy. It's been a while, but I remember it being very cool.