Published: January 21, 2021
This is an excerpt of an article that appeared at livetranscribe.medium.com. The authors are Chet Gnegy, Sagar Savla and Dimitri Kanevsky from Google.
It has been nearly two years since we launched Live Transcribe. In that time, we’ve heard lots of great stories from people about the creative ways they use the app to get free, real-time transcription in 80+ languages. We’ve also run many transcription experiments on both phones and tablets, and recently published a paper.
Even though the app looks very simple, there are still plenty of best practices for getting good results. In this guide, we share some of what we’ve learned in our journey of trying to make the world’s sounds and conversations more accessible, focusing mostly on acoustics and external microphones. Our guiding principle is to get the best audio signal possible, usually by moving the microphone closer to what you are trying to transcribe.
Two friends, Moose and Raven, like to talk to each other using Live Transcribe.
There will always be environments in which getting accurate transcription is challenging, but fortunately the job is a lot easier if you follow one rule: move the microphone closer to the speaker! (We will use “loudspeaker” when we mean the electronic device and “speaker” when we mean a human who is talking!)
The two biggest challenges with audio transcription are noisy rooms and reverberant spaces. In a noisy room, it might be very challenging to hear and transcribe the speaker because of the loud sounds of people and objects in the background. In a reverberant space (the inside of a cathedral, for example), the app might be able to hear the speaker well enough, but there might be too much echo to understand them properly.
While exploring a cave, Moose and Raven discover that due to lots of echoes, Live Transcribe just isn’t working. Moose can yell as loud as she wants, but they might need to move closer together for Live Transcribe to understand.
Both of these challenges are eased by getting the microphone closer to the speaker’s mouth (see external mics below). Stepping outside the room, when possible, can also be a good solution. In settings like church services, where the speaker talks into an amplification system, it may help to sit near the loudspeaker or to put a wireless microphone near the podium.
Moose and Raven attend a meeting at the local church. Moose brings her Bluetooth microphone so that Sloth can wear it at the podium. Even though the acoustic environment is very challenging, Live Transcribe is working great and Moose can understand.
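As a rough illustration of why moving the microphone closer helps so much, the sketch below estimates how much louder the speaker’s direct voice becomes at the microphone when the distance shrinks. It is not part of Live Transcribe. It assumes an idealized point source in open space (the inverse-square law), and the function name and the example distances are made up for illustration. Background noise and echo tend to stay at a similar level throughout a room, so a gain in the direct voice is roughly a gain in how clearly the voice stands out.

```python
import math


def direct_level_gain_db(old_distance_m: float, new_distance_m: float) -> float:
    """Estimate the change in direct-sound level, in decibels, when the
    microphone moves from old_distance_m to new_distance_m from the speaker.

    Assumption: idealized point source in free field (inverse-square law).
    Real rooms, real voices, and real microphones will deviate from this.
    """
    if old_distance_m <= 0 or new_distance_m <= 0:
        raise ValueError("distances must be positive")
    # Sound pressure falls off as 1/distance, so the level changes by
    # 20 * log10(old / new) decibels.
    return 20.0 * math.log10(old_distance_m / new_distance_m)


if __name__ == "__main__":
    # Hypothetical distances: across a table, arm's length, clipped to a collar.
    for old, new in [(2.0, 1.0), (2.0, 0.5), (3.0, 0.3)]:
        gain = direct_level_gain_db(old, new)
        print(f"{old} m -> {new} m: {gain:+.1f} dB")
```

Under these assumptions, each halving of the distance adds about 6 dB to the direct voice, which is why a clip-on or podium microphone can succeed where a phone held across the room struggles.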
As a rule of thumb, Live Transcribe should not be expected to perform well on phone calls, and diagnosing quality issues can be difficult. We’ve seen it work, and you might too, but the audio processing used in network transmission adds a lot of distortion to the signal and removes much of its high- and low-frequency content.
Raven prefers to use captions when he watches television. For shows that don’t have captions, Raven sets up Live Transcribe on a tablet right next to the TV’s loudspeaker (instead of holding the device). Raven can read the largest font setting from across the room.
Especially for one-on-one conversations, using a wired or wireless microphone can make a huge difference.
For those with USB-C ports on their phones, the Comica CVM-VS09 can be mounted directly on the phone. It is a “shotgun” style directional microphone, meaning that it emphasizes sounds from the direction you point it and suppresses sounds from other directions. This can be helpful for aiming the mic at a speaker, even if they are across the table.
There are other external microphone solutions that support more than one microphone, such as the Samson Go Mic. The Samson mic also supports two wireless mics that lead to great quality for a three-person conversation. However, it uses handheld mics and is a very bulky solution. It also isn’t scalable beyond two mics.
Using Live Transcribe directly with any external audio source
If you want to transcribe a phone call, a video call, or any stream of audio from another device on a device running Live Transcribe, we found the Saramonic UTC-C35 adapter to be a simple, reliable solution.
The 3.5mm aux jack of the Saramonic UTC-C35 goes into your audio output device (the source), and the USB-C end goes into the phone or tablet running the Live Transcribe app. Then, in the Live Transcribe app, just select the external microphone.
A great aspect of this solution is that the audio gets sent directly from the source, avoiding the reverb and noise that your room might have. This usually leads to much better transcription.
An example setup, with a phone and a laptop, would be: A video call playing on the laptop -> UTC-C35’s 3.5mm aux jack plugged into the laptop -> UTC-C35’s USB-C jack plugged into the phone -> Live Transcribe app on the phone.
Once you plug in the Saramonic UTC-C35 to a phone running Live Transcribe, it shows up as an external microphone that you can select in the Settings.
Hopefully with these tips you’ll be on your way to having better conversations with Live Transcribe.
Please visit the full article at livetranscribe.medium.com for additional information on using Live Transcribe.