Look, I have decades of experience dealing with human speech, and not just as an editor - I can trace the human voice from neural impulses in Broca's area through the physiology of vocal production, mechanical transduction into electrical signals, discrete Fourier transforms of the resulting waveforms into spectral information and back again, the reproduction of altered signals from time-aligned speakers to create a sense of spatialization, how those signals are processed in the human ear, and how the cilia connect by nerves back to your brain. I'm a good enough editor that I can recognize many short words by the shape of their waveform, or make 10 edits in a row by sight and know they will sound good on playback.
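If it helps to make the "spectral information and back again" step concrete, here's a minimal sketch with a synthetic tone standing in for speech - the sample rate and pitch are just illustrative assumptions, not anything from a real session or a specific editing tool.

```python
# Rough sketch of the waveform -> spectrum -> waveform round trip,
# using a synthetic tone in place of real speech (numpy only).
import numpy as np

sample_rate = 16_000                           # 16 kHz, a common speech sample rate
t = np.arange(sample_rate) / sample_rate       # one second of sample times
waveform = 0.5 * np.sin(2 * np.pi * 220 * t)   # stand-in for a voiced sound

spectrum = np.fft.rfft(waveform)               # discrete Fourier transform -> spectral info
magnitudes = np.abs(spectrum)                  # what you would look at to "see" the sound
peak_hz = np.argmax(magnitudes) * sample_rate / len(waveform)
print(f"strongest component: {peak_hz:.0f} Hz")

reconstructed = np.fft.irfft(spectrum, n=len(waveform))   # ...and back again
print("round-trip error:", np.max(np.abs(waveform - reconstructed)))
```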
So when I say that machine transcription is as good as human real-time transcription now, I say so with the clear expectation that those decades of craft are very close to being rendered obsolete. I absolutely expect to hand off the mechanical part of editing to a machine within 2 years or so. It's already at the stage where I edit some interviews as text, like in a word processor, and then export the edited document as audio, and it's Good Enough - not for every speaker, but more than half the time.
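To make that text-as-editing workflow concrete, here's a sketch under the assumption that the transcription step gives you word-level timestamps, which most machine transcripts do: you delete words in the text, and the spans of audio to keep fall out of the surviving timestamps. The data layout and numbers are illustrative, not any particular tool's format.

```python
# Sketch of editing audio by editing its transcript: each word carries the
# start/end times of where it sits in the recording.
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds into the recording
    end: float

transcript = [
    Word("So", 0.00, 0.18), Word("um", 0.18, 0.55), Word("basically", 0.55, 1.10),
    Word("the", 1.10, 1.22), Word("answer", 1.22, 1.70), Word("is", 1.70, 1.85),
    Word("yes", 1.85, 2.30),
]

# The "word processor" step: decide which words survive the edit.
kept = [w for w in transcript if w.text not in {"um", "basically"}]

# The "export as audio" step: merge surviving words into contiguous spans
# that could be handed to any audio library or editor for cutting.
def spans_to_keep(words, gap=0.05):
    spans = []
    for w in words:
        if spans and w.start - spans[-1][1] <= gap:
            spans[-1] = (spans[-1][0], w.end)   # extend the current span
        else:
            spans.append((w.start, w.end))      # start a new span
    return spans

print(spans_to_keep(kept))   # [(0.0, 0.18), (1.1, 2.3)]
```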
NPR and a lot of commercial broadcasters cut their material this way already, because you can get the same result from 30 minutes of reading and text editing that would require 3 hours of pure audio editing with no transcription.