> We recognize that GPT-4o’s audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities.

So they're using the same GPT-4 model with a relatively small improvement, and no voice whatsoever outside of the prerecorded demos. This is not a "launch" or even an announcement. It's a demo of something that may or may not work in the future.