Hearing aids and cochlear implants have been getting better for years, but a new type of device—eyeglasses that display real-time speech transcription on their lenses—is a game-changing breakthrough.
Excerpt. David Howorth, a person with multiple causes of hearing loss, uses a free voice-to-text app on his phone, Google Live Transcribe & Notification. When someone speaks to him, he can read what they’re saying on the screen and respond as if he’d heard it. He belongs to a weekly lunch group with half a dozen men in their seventies and eighties, and when they get together he puts his phone in the center of the table and has no trouble joining in. Live Transcribe makes mistakes—“One of the guys, a retired history professor, said something that it transcribed as ‘I have a dick,’ ” Howorth told me—but it’s remarkably accurate, and it punctuates and capitalizes better than many English majors I know. It can also vibrate or flash if it detects smoke alarms, police sirens, crying babies, beeping appliances, running faucets, or other potentially worrisome sound emitters, and it works, with varying degrees of accuracy, in eighty languages. Howorth remarried a few years ago; his current wife, whose name is Sally, never knew him when he had two good ears. He used Live Transcribe at a party they attended together, and she told him afterward that it was the first time she’d been with him in a social setting in which he didn’t seem “aloof and unengaged.”
A researcher I interviewed in 2018 told me, “There is no better time in all of human history to be a person with hearing loss.” Nearly every expert I spoke with back then agreed. They cited over-the-counter hearing devices, improvements in conventional hearing aids and cochlear implants, and drugs and gene therapies in development. Those advances have continued, but, for Howorth and many others with hearing problems, the breakthrough has been acquiring the ability to subtitle life. “It’s transcription that has made the difference,” Howorth told me. The main contributor has been the tech industry’s staggering investment in artificial intelligence. Live Transcribe draws on Google’s vast collection of speech and text samples, which the company acquires by—well, who knows how Google acquires anything?
Back in the days when software came on disks, I bought a voice-to-text program called Dragon NaturallySpeaking. I had read about it in some computer magazine and thought it would be fun to fool around with, but I had to train it to understand my voice, using a headset that came with the disk, and even once I’d done that it was so error-prone that correcting a transcript took longer than typing the entire text would have taken. Now there are many options (among them the modern iteration of Dragon). The dictation feature in Microsoft Word works so well that a writer I know barely uses his keyboard anymore. Howorth and I sometimes play bridge online with two friends. The four of us chat on Zoom as we play, and if I didn’t know that he couldn’t hear I would never guess. Zoom’s captioning utility shows him everything the rest of us say, identified by name, and he responds, by speaking, without a noticeable lag. The app even ignores “um”—a feature that I had trouble explaining to Howorth, because Zoom left it out of my explanation, too. [..]
He [Yale senior Madhav Lavakare] built a crude prototype, which he continued to refine when he got to Yale. Then he took two years off to work on the device exclusively, often with volunteer help, including from other students. He’s now twenty-three, and, to the relief of his parents, back in college. Not long ago, I met him for lunch at a pizza place in New Haven. He had brought a demo, which, from across the table, looked like a regular pair of eyeglasses. I’m nearsighted, so he told me to wear his glasses over my own. (If I were a customer, I could add snap-in prescription inserts.) Immediately, our conversation appeared as legible lines of translucent green text, which seemed to be floating in the space between us. “Holy shit,” I said (duly transcribed). He showed me that I could turn off the transcription by tapping twice on the glasses’ right stem, and turn it back on by doing the same again. He added speaker identification by changing a setting on his phone. The restaurant was small and noisy, but the glasses ignored two women talking loudly at a table to my left.
Lavakare’s company is called TranscribeGlass. He has financed it partly with grants and awards that he’s received from Pfizer, the U.S. Department of State and the Indian government, programs at Yale, and pitch competitions, including one he attended recently in New Orleans. His glasses require a Bluetooth connection to an iPhone, which provides the brainpower and the microphone, and they work best with Wi-Fi, although they don’t need it. You can order a pair from the company’s website right now, for three hundred and seventy-seven dollars, plus twenty dollars a month for transcription, which is supplied by a rotating group of providers.
Not long after our lunch, I had a Zoom conversation with Alex Westner and Marilyn Morgan Westner, a married couple whose company, XanderGlasses, sells a similar device. Alex was a member of the team that developed iZotope RX, a software suite that has been called “Photoshop for sound,” and Marilyn spent six years working at Harvard Business School, where she helped build programs on entrepreneurship. In 2019, they decided to look for what Alex described as “a side hustle.” They settled on helping people with hearing loss—which, according to the National Institutes of Health, affects roughly fifteen per cent of all Americans over the age of eighteen—by creating eyeglasses that would convert speech to text. (They found Lavakare through a Google search; the three keep in touch.)
XanderGlasses are fully self-contained. That makes them heavier, more conspicuous, and significantly more expensive than Lavakare’s glasses, but it also makes them attractive to those who lack phones or access to the internet, a category that includes many people with hearing problems. (XanderGlasses are able to connect to Wi-Fi when it’s available.) The Westners have worked closely with the Veterans Health Administration. Two of the V.A.’s most common service-related disability claims involve hearing: tinnitus, or phantom sounds in the ears, which accounted for more than 2.3 million paid claims in fiscal year 2020; and hearing loss, which accounted for more than 1.3 million during the same period. [..]
Hearing difficulties pose challenges throughout the health-care system, even when the primary medical issue has nothing to do with ears. Older patients, especially, mishear instructions or are too overwhelmed by bad news to listen carefully. Kevin Franck, who was the director of audiology at Massachusetts Eye and Ear between 2017 and 2021, instituted a pilot program in which Massachusetts General Hospital issued inexpensive personal sound-amplification products to patients with unaddressed hearing loss; medical personnel were also reminded to do things like turn off TV sets before asking questions or explaining procedures. He told me that the medical profession still resists captioning technology, primarily out of fear that transcription errors could lead to misunderstandings that result in lawsuits. “Nevertheless,” he continued, “I always urged my clinicians to suggest that patients download one of the apps on their own phone anyway,” and to encourage patients “to check it out for themselves and to use it for more than that day’s appointment.”
Full article: David Owen, The New Yorker, April 21, 2025.