Maybe? I thought IPA was purely phonetic, but I see it does have some optional prosody marks (stress, length, tone, intonation) that could in theory cover some of it. I'm not sure how standardized or complete they really are in practice.
I haven't heard of any large datasets of IPA transcripts of speech with the detail necessary to train a fully realistic STT->LLM->TTS system. If you know of any, they would be interesting to look at.