Sure, but that seems like it'll be a distinction without a difference for many use cases.
Having a reliable emotional model of a person based on their voice (or voice + appearance) can be useful in a thousand ways.
That seems to represent a new frontier.