There is a great resource for learning this stuff: the CEO of Daily, Kwindla Kramer, hosted a series of one-hour sessions on low-latency voice AI. Here:
https://youtube.com/playlist?list=PLzU2zoMTQIHjMPZ-OnpC3ozZs...
Some of it is a bit outdated, but most of it is still very valuable.
Kwindla also posts a lot of extremely useful stuff on X and LinkedIn, including working, easily replicable sub-500ms setups.