Quick post on something other than the ESRC, for a change…
Transcription is a major category of expense for social science research projects, and I’ve been wondering for some time whether it’s possible to make cost savings without sacrificing accuracy, consistency, confidentiality, speed of turnaround, and all of the other things we require.
One problem is that there seems to be a wide variety of pricing models: some charge by the hour of tape, some by the hour of staff time, and some by some other, smaller unit of time. Another is that there are different types of transcription – verbatim (which includes every last hesitation and verbal tic) and then varying degrees of near-verbatim stuff. Some transcription is of fairly straightforward one-on-one interviews, but sometimes it’s whole focus groups or meetings where individual speakers need identifying. The quality of the recordings and the clarity of those speaking may be variable. I’ve also been assured that there are cases where a Research Associate with specialist knowledge (rather than a generalist audio typist) is required, though that was for a video recording.
I imagine there are plenty of models for sourcing transcription across universities – in-house capacity, a list of current/former staff looking for extra work, or a contract with a preferred supplier. Or some kind of mixture of provision. One option would be to look at getting better value, but given the difficulty of comparing price and quality, I’m not sure how far this would get us. I’m also a little unhappy at the thought of trying to reduce what I suspect are already fairly low rates of pay.
I wonder if technology has reached a point where it would be worth looking seriously at voice recognition software for producing a first-pass transcript. At least for non-verbatim requirements, this might produce a document that would just need correcting and tidying up, which might be quicker (and therefore cheaper) than transcribing the whole thing. However, I can’t help remembering an episode when a friend tried voice recognition software that couldn’t cope with his Saarrf Lahndahn accent… which got more pronounced the more frustrated he got with its utter failure to anderstan’ wot ee waz sayin. But I’m sure technology has moved on.
The ever-reliable Wikipedia reckons that 50% of live TV subtitles were produced via voice recognition as of 2005, though there’s a “citation needed” for this claim. But even if true, I would imagine that a fair amount of speech on live TV is more scripted and rehearsed – and therefore easier to automatically transcribe – than what someone might say in a research interview. More RP accents, too, I’d imagine.
Anyone have any experience of using voice recognition software for transcription? Or is the technology not quite there yet?