artimaeis 6y ago

> ML multi-speaker speech-to-text every conversation

Neat idea, do you know of any software that's capable of taking an audio file and producing multi-user text from it? Seems like it would be useful in a wide variety of situations.

5 comments · 3 top-level

rckoepke 6y ago

Trint[0] has a wonderful UI/UX for exactly this. I'm not sure if they're using the latest & greatest ML models, but even years ago I was pretty blown away. I still am. It's one of the most "right" products I've seen in this generation of web development. Someone realllllllly cared about the details of UX.

Some of that only comes across when you actually use it, when you clean up the transcription immediately after the meeting or the next day. Clicking a mistranscribed word to edit it snaps the video and audio to that point, so it's super intuitive to "scrub" through the video just by clicking around the text transcription. Very fast, very natural, very low effort.

I can only imagine how much it would improve if it used Google's newest multi-speaker transcription models. It always had some trouble whenever people started talking at the same time.

[0] https://trint.com

lowdose 6y ago

At $60 per user per month?

rckoepke 6y ago

Yeah, it's way too expensive. I decided that it's probably, technically, a net positive for organizations with a heavy billable-hours situation (contract engineering shops), because it can easily save more than $60/mo per person in time and accuracy.

That doesn't mean I was willing to pay $60/mo per person though, and I ended up not paying.

However, I still think it's the best product in its category right now, and given the disappointing state of product design these days, I don't expect anyone to catch up to its interface. I'd love it if an open-source group did, though.

lowdose 6y ago

Google Speech API (see other comment for link) has speaker diarization in beta, i.e. automatic predictions about which of the speakers in a conversation spoke each part.

On top of this, you can add 5000 names for company-specific entity recognition: product, people, brand names, etc.

120 different languages are supported.
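
For anyone wondering how diarized output becomes multi-speaker text: a minimal sketch, assuming the recognizer hands back a flat list of (word, speaker tag) pairs. That input shape, the function names, and the sample data are all hypothetical and chosen for illustration, not taken from any particular API.

```python
from itertools import groupby


def words_to_turns(tagged_words):
    """Collapse (word, speaker_tag) pairs into consecutive speaker turns.

    The input shape is an assumption for illustration: one pair per
    recognized word, in spoken order.
    """
    turns = []
    # groupby merges only adjacent items with the same tag, so a speaker
    # who talks again later gets a new turn instead of being merged.
    for speaker, group in groupby(tagged_words, key=lambda pair: pair[1]):
        text = " ".join(word for word, _ in group)
        turns.append((speaker, text))
    return turns


def format_transcript(turns):
    """Render turns as one 'Speaker N: text' line each."""
    return "\n".join(f"Speaker {speaker}: {text}" for speaker, text in turns)


# Hypothetical sample data, not real recognizer output.
sample = [("Neat", 1), ("idea", 1), ("Thanks", 2), ("Any", 1), ("software", 1)]
print(format_transcript(words_to_turns(sample)))
```

Grouping only adjacent words keeps the back-and-forth of the conversation intact. It does nothing for overlapping speech, since each word carries exactly one tag.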

stopyellingatme 6y ago

This is actually a fairly complex problem. You are getting into the field of stenography.

If such a multi-speaker speech-to-text system existed, then the field of court reporting/closed captioning would see a real shake-up.
