The real fun is optimising the maths: remove all divisions, build LUTs and approximations, use CPU-specific tricks. Even though CPUs are orders of magnitude faster now, they can still be too slow for real-time processing.
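A minimal sketch of one classic division-removal trick: hoist a division by a constant out of the hot loop by multiplying with the precomputed reciprocal instead. The `normalize_*` names and the gain-normalisation scenario are hypothetical; the payoff is in compiled code, where divide is far slower than multiply, but the arithmetic is the same in any language.

```python
def normalize_slow(samples, divisor):
    # one divide per sample -- what you want to avoid in a hot loop
    return [s / divisor for s in samples]

def normalize_fast(samples, divisor):
    inv = 1.0 / divisor          # a single divide, hoisted out of the loop
    return [s * inv for s in samples]

buf = [2.0, 4.0, 6.0, 8.0]
print(normalize_fast(buf, 2.0))  # → [1.0, 2.0, 3.0, 4.0]
```

In C or C++ a compiler will often do this strength reduction for you with `-ffast-math`, but doing it by hand keeps the behaviour predictable across compilers.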
If you have a buffer that's being clocked out and your goal is to keep data flowing, jitter determines how small your buffer can be. Say you're producing 56 kHz audio: the best you can do is produce a sample exactly at that frequency. If you have 1 ms of jitter, you now need a 1 ms buffer, so you have delay. If the jitter is small enough, like 0.1 ns of jitter in some SIMD calculation, then for all intents and purposes it doesn't matter for an audio application...
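The buffer-sizing arithmetic above can be sketched directly: the buffer has to cover the worst case where the producer is late by the full jitter, so the minimum size is sample rate times jitter, rounded up. The function name is hypothetical; the numbers are the ones from the comment.

```python
import math

def min_buffer_samples(sample_rate_hz, jitter_s):
    # Worst case the producer is late by the full jitter, so the
    # buffer must hold enough samples to bridge that gap.
    return math.ceil(sample_rate_hz * jitter_s)

# The comment's numbers: 56 kHz output with 1 ms of scheduling jitter
print(min_buffer_samples(56_000, 1e-3))   # → 56 samples, i.e. ~1 ms of delay

# 0.1 ns of SIMD-level jitter rounds up to a single sample:
# effectively irrelevant for audio
print(min_buffer_samples(56_000, 1e-10))  # → 1
```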
If there are complex equations involved, it absolutely is faster. You can also create intermediate LUTs, so the tables are small enough to fit in cache, and then interpolate on the fly.
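A sketch of that idea, assuming a sine oscillator as the example function: a 256-entry table easily fits in L1 cache, and linear interpolation between adjacent entries recovers most of the accuracy a much larger table would give. The `lut_sin` name and table size are illustrative choices, not anything from the thread.

```python
import math

# Small sine LUT: 256 floats is ~2 KB, comfortably cache-resident.
TABLE_SIZE = 256
SINE_LUT = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def lut_sin(phase):
    """Approximate sin(2*pi*phase) for phase in [0, 1)."""
    pos = phase * TABLE_SIZE
    i = int(pos) % TABLE_SIZE
    frac = pos - int(pos)                 # fractional position between entries
    a = SINE_LUT[i]
    b = SINE_LUT[(i + 1) % TABLE_SIZE]    # wrap around at the table end
    return a + (b - a) * frac             # linear interpolation on the fly

# Worst-case error over a fine sweep stays well under 1e-3
err = max(abs(lut_sin(k / 10000) - math.sin(2 * math.pi * k / 10000))
          for k in range(10000))
print(err < 1e-3)   # → True
```

For linear interpolation the error shrinks quadratically with table size, which is why a small table plus one multiply-add beats a huge table that keeps missing cache.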
Yeah, isn’t hitting memory (especially when the data can’t fit in the L1/L2 caches) one of the biggest sources of latency? Especially since on modern CPUs it’s almost impossible to max out the arithmetic units outside of microbenchmarks?
You don't really do these any more on a modern CPU. This is stuff I used to do 30 years ago, and you might still do it on a microcontroller or some other tiny system. The CPUs aren't slow. The main problem is that if the OS doesn't schedule your process, it doesn't matter how fast the CPU is.