I agree. The convert-to-text part was about indexing the content of the calls so they're easy to search.
I bring it up because speech recognition has become so commoditized that most of us could think of a way to whip up a solution to this problem, albeit a bad one, using AWS/GCP/etc. in a weekend.