My downsampling method was complicated. Because the TAS was generated automatically, and because I also occasionally needed to stream in graphics data, I had to know exactly which byte to read from the .wav file at any given moment. To do that, I ran the generated inputs through a custom NES emulator that counted CPU cycles. I converted that cycle count into seconds and used the result to find the right position in the .wav file.
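A minimal sketch of that cycle-to-sample conversion in Python follows. It is not the original emulator's code. It assumes an NTSC console, whose CPU clock is the 236.25/11 MHz master clock divided by 12 (about 1.789773 MHz). It also uses Python's standard `wave` module to locate the sample data, so the data chunk doesn't have to start at byte 44. `cycles_to_frame` and `read_frame_at_cycle` are hypothetical names.

```python
import wave

# Assumption: NTSC NES. CPU clock = (236.25 MHz / 11) / 12 ~= 1.789773 MHz,
# kept as an exact integer ratio instead of a rounded float.
CPU_HZ_NUMERATOR = 236_250_000
CPU_HZ_DENOMINATOR = 11 * 12  # master-clock divisor (11) times CPU divider (12)


def cycles_to_frame(cycles: int, sample_rate: int) -> int:
    """Index of the audio frame that is playing after `cycles` CPU cycles.

    seconds = cycles / cpu_hz, frame = floor(seconds * sample_rate),
    computed entirely in integers so nothing is rounded until the floor.
    """
    return cycles * sample_rate * CPU_HZ_DENOMINATOR // CPU_HZ_NUMERATOR


def read_frame_at_cycle(wav: wave.Wave_read, cycles: int) -> bytes:
    """Return the raw PCM bytes of the frame due at `cycles`."""
    index = cycles_to_frame(cycles, wav.getframerate())
    wav.setpos(index)  # `wave` handles the header and data-chunk offset
    return wav.readframes(1)
```

Keeping the exact clock ratio matters on a long run: rounding the clock to 1.789773 MHz, for instance, would drift by a few samples over ten minutes of 44.1 kHz audio.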
To be completely honest, this project was my first time reading the raw contents of a .wav file directly, and I had no prior experience writing code for audio conversion or playback. If I were to do this project again, I'd look into dithering and noise shaping, as well as filtering methods. Near the very end of the TAS there are definitely some weird audio artifacts that I couldn't figure out how to fix at the time.
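The dithering and noise-shaping idea can be sketched in Python as below. This is not what the TAS used. The sketch assumes 16-bit signed input samples and a target of the NES DMC's 7-bit direct-load output ($4011, values 0 to 127); neither assumption comes from the text above. `quantize_7bit` is a hypothetical name. It combines triangular (TPDF) dither with first-order error feedback.

```python
import random


def quantize_7bit(samples, seed=0, shape=True):
    """Requantize signed 16-bit PCM to unsigned 7-bit (0..127) using TPDF
    dither and, optionally, first-order error-feedback noise shaping."""
    rng = random.Random(seed)  # seeded so results are reproducible
    out = []
    err = 0.0  # quantization error from the previous sample, in 7-bit steps
    for s in samples:
        # Map -32768..32767 onto 0..~128 (one unit = one 7-bit step).
        x = (s + 32768) / 512.0
        if shape:
            # Subtract the previous error: output = x + e[n] - e[n-1],
            # which pushes the noise spectrum toward high frequencies.
            x -= err
        # Triangular (TPDF) dither: difference of two uniforms, within +/-1 step.
        d = rng.random() - rng.random()
        q = round(x + d)
        # Measure the error before clamping so the feedback loop stays bounded.
        err = q - x
        out.append(min(127, max(0, q)))
    return out
```

Dither decorrelates the quantization error from the signal, and the feedback term doesn't remove that noise; it moves it toward higher frequencies, where it is usually less audible.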