# Transcribing Pre-Recorded Lectures (Speech to Text)



## adam.sn (Feb 7, 2007)

Hi everyone,

Just wondering if there is an automated solution for transcribing speech to text from a lecture?

I've got a client who needs this done - and although I could hire someone hourly to manually type everything in, I'd rather find a software solution.

Any suggestions?

Cheers
- Adam


----------



## CubaMark (Feb 16, 2001)

*None.*

I'm sure the NSA or CIA or someone has a great product that will do this - but nothing we mere mortals will get our hands on. This comes up for me every few months (!) as professors and conference organizers ask me this precise question. Often they'll arrange to have conference proceedings recorded with the intent of having them transcribed to text, but without any budget or plan to do it, thinking that there "must" be a technological solution such as Speech-to-Text that one can simply throw the audio files at.

There are lots of programs (Dragon, ViaVoice, iVoice, etc.) that will do Speech-to-Text, but they're only effective for a user who has trained the software on his/her voice, is using a microphone free of extraneous noise, and is speaking clearly. Those three conditions certainly do not apply to any conference environment I've seen, and to very few lecture environments.

Sorry to say, Adam, this is a solution that only a human can handle... for now.


----------



## kastytis (Oct 24, 2006)

I'm a court reporter, and clients always think that because they see a microphone running to my computer, I'm transcribing instantaneously. I am, but I'm doing it on my stenographic typing machine -- which they don't see -- inputting the testimony with my carbon-based fingers. There is no software program that will provide real-time speech-to-text translation, save for software that recognizes one speaker and one speaker only. Dragon works, but not perfectly. And to inform you, those court reporters who have switched to real-time voice writing -- not me -- (speaking into a mask that feeds their audio to their computer and then into their specialized reporting software, where in theory it pops up on the reporter's screen as magically transcribed text) have had to spend $6000 plus for the software to do their job.

No easy fix.

Just to give you an idea how difficult speech to text transcription of multiple speakers is, consider how a software program would handle homophones, i.e., words that sound alike: sees, seas, seize, cease -- thee, the -- board, bored -- story, storey -- there's, theirs, they'res -- et cetera. Add in a few speakers with accents, and not just speakers from exotic countries, but speakers who live a few neighbourhoods away or are separated by class and income. Believe me, in Hamilton's north end (where I live) you might hear the accents of new Portuguese and Italian immigrants, for example, or of second generation ethnics (me) who might be perceived by someone living a few miles to the west as having an accent when they speak English. Throw all that into the mix and then try to get a software program to dissect, discern, discriminate between a spoken vowel and a dropped suffix... you see where I'm going?

Speech to text with any reasonable accuracy (95 percent plus translation -- and that's being very generous) is virtually impossible these days when you have multiple speakers, shuffling papers, and ambient noise outside the room, such as when the fire alarm goes off or folks in the hallway or beside you are laughing off the Leafs' debacle of the previous night. And then add in the fact that the English language is a crazy mish-mash of many languages accumulated over the last millennium plus: English, French, Germanic, Scandinavian, Celt, internet-speak.

At the student level, other than hiring a court reporter/stenographer to take your notes, the best way is still to invest in a gross of Bic pens and notebooks, listen carefully to what the lecturer is saying, and stay home on Friday and Saturday nights studying, studying and studying.

I know you want a lecture transcribed, which is usually given by a single speaker, but ambient noise, etc., is a pain in the neck... comments by others in the lecture hall a few seats or rows above or below, papers shuffling.

Recently I was given the task of transcribing an arbitration hearing that took place in Paris, France. It was pretty clear audio. However, there were two speakers who sounded identical. I asked my client how I should identify them. I got the reply that it's easy: one is a French speaker, and the other is a French speaker with an Arabic accent. These are educated men, but speaking in English, not their native tongue. Each of them would pronounce the same word differently. I still couldn't figure out who was who, since their voices were so much alike in tone and timbre. It's easy to tell speakers apart when one sounds like Ah-nold and the other sounds like Pee-wee Herman. Not in this case.

Go with the transcriptionist. There's no easy software fix for this. And believe me, the transcriptionist will go out of his/her mind transcribing a recording when they had no control over how it was made. They earn every penny, especially if it's dense technical material they are not familiar with and do not have a trained ear for.

Sorry for the long post, but so far there's nothing in terms of software that will substitute for the human ear in discerning human speech with any reliable degree of accuracy -- 99 percent plus. You may google it, you may get pie-in-the-sky promises, but nothing works as well. Just trying to save you the disappointment of broken expectations.
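To put the 95 and 99 percent figures in perspective, here is a rough Python sketch of how transcription accuracy is commonly scored: count the word-level edits needed to turn the software's output into the correct transcript, then divide by the number of words in the correct transcript (the "word error rate"). The function names are made up for illustration, and the 150 words-per-minute speaking rate is only an assumed ballpark, not a measured figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def expected_errors(minutes: float, words_per_minute: float, accuracy: float) -> int:
    """Rough count of wrong words at a given accuracy (0.0 to 1.0)."""
    return round(minutes * words_per_minute * (1.0 - accuracy))


if __name__ == "__main__":
    # Two homophone mix-ups in a five-word sentence.
    print(word_error_rate("the board sees the seas", "the bored seize the seas"))
    # One hour of speech at an ASSUMED 150 words per minute.
    print(expected_errors(60, 150, 0.95))
    print(expected_errors(60, 150, 0.99))
```

Under that assumed speaking rate, an hour of audio is about 9,000 words, so even 95 percent accuracy would leave several hundred words for a human to find and fix.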

Maybe the lecturer records his/her own lectures and runs them through Dragon? That's an option.

Unless some 14-year-old comes up with a bitchin' speech to text app in the next couple of decades, there's (theirs, they're's?) really no quick fix.


----------



## adam.sn (Feb 7, 2007)

Wow.... Lol, this is actually for a client who wants an hour's worth of audio that was recorded in studio turned into an ebook. That's all, but it's interesting to know how difficult that is to transcribe without having the trained software.


----------



## kastytis (Oct 24, 2006)

adam.sn said:


> Wow.... Lol, this is actually for a client who wants an hour's worth of audio that was recorded in studio turned into an ebook. That's all, but it's interesting to know how difficult that is to transcribe without having the trained software.


An in-studio recording would, I think, provide a controlled, consistent audio record that could be easily transcribed by a human. Lemme know. I'm not crazy about this type of work and not soliciting for it, but under controlled circumstances, in studio, it sounds doable.

Ebook sounds cool.


----------



## bsenka (Jan 27, 2009)

A coworker of mine has an iPad app that he claims does dictation/transcription really well (he uses it for meetings and lectures). I didn't catch what it was called. The only one I see in the Canadian app store is Dragon, but it's got mixed reviews.


----------

