Beware Using Telemedicine for Voice and Speech Therapy
Popular teleconferencing platforms—Zoom, Webex, Microsoft Teams, Doxy.me, and VSee—pose challenges for effectively evaluating treatment
As a result of the coronavirus pandemic, people across the world have experienced how teleconferencing platforms like Zoom help folks stay connected—playing games with friends, hosting virtual weddings, and even visiting a doctor. But when it comes to telemedicine, not all medical care is easily translated to a remote format.
In a virtual world, voice therapy presents a unique challenge because clinicians must rely on acoustic recordings of voice to evaluate the effectiveness of their treatments. But many teleconferencing platforms distort sounds in their efforts to eliminate background noise.
Boston University graduate researcher Hasini Weerathunge wanted to find out if popular teleconferencing platforms used for telemedicine could capture sounds accurately enough for clinicians to successfully treat and evaluate patients with voice and speech disorders. Weerathunge, a graduate student fellow at BU’s Rafik B. Hariri Institute for Computing and Computational Science & Engineering and a PhD candidate in biomedical engineering, does research in the lab of Cara Stepp, a College of Health & Rehabilitation Sciences: Sargent College associate professor of speech, language, and hearing sciences.
“Although the COVID-19 crisis appears to be waning, telepractice popularity is here to stay,” Stepp says.
Weerathunge and Stepp teamed up with other BU researchers to put five different HIPAA-compliant teleconferencing platforms to the test: Cisco Webex, Microsoft Teams, Doxy.me, VSee Messenger, and Zoom.
As the pandemic unfolded and lockdowns moved much voice and speech therapy online, “there was no consensus among [voice and speech] clinicians [who were] trying to convert to telepractice therapy, and we wanted to determine the accuracy of the acoustic measures they can get through telepractice,” Weerathunge says.
Although voice therapists had sometimes conducted telepractice sessions with patients before the pandemic, evaluations of the effectiveness of treatment were always carried out in person. During that process, a patient goes into the clinic and sits in a soundproof booth outfitted for speech recordings. The patient repeats sustained vowel sounds, like “aaa” or “ooo,” or reads a short passage that reflects a wide variety of sounds and mouth movements in the English language. The recordings of the patient’s voice are then evaluated by algorithms that measure acoustic properties, including the acoustic correlates of perceived pitch and loudness of the voice.
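To make those two measures concrete, here is a minimal Python sketch of how vocal fundamental frequency and level might be estimated from a recording. It is an illustration only, not the software used in the study: the autocorrelation approach, the function names, and the synthetic "vowel" are all assumptions, and the level is reported relative to digital full scale rather than as a calibrated sound pressure level.

```python
import math

def synth_vowel(f0_hz, sample_rate=16000, duration_s=0.1, amplitude=0.5):
    """Hypothetical stand-in for a sustained vowel: a sine plus one harmonic."""
    n = int(sample_rate * duration_s)
    return [
        amplitude * (math.sin(2 * math.pi * f0_hz * i / sample_rate)
                     + 0.5 * math.sin(2 * math.pi * 2 * f0_hz * i / sample_rate))
        for i in range(n)
    ]

def estimate_f0(samples, sample_rate, fmin=75.0, fmax=500.0):
    """Estimate fundamental frequency as the lag with the highest autocorrelation.

    Only lags corresponding to frequencies between fmin and fmax are searched
    (an assumed range for adult voices).
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def rms_level_db(samples):
    """Root-mean-square level in dB relative to full scale (1.0), uncalibrated."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-10))

sample_rate = 16000
vowel = synth_vowel(200.0, sample_rate)
f0 = estimate_f0(vowel, sample_rate)
level = rms_level_db(vowel)
print(f"estimated f0: {f0:.1f} Hz, level: {level:.2f} dB re full scale")
```

Clinical tools use more robust estimators and calibrated microphones, but the idea is the same: pitch comes from how often the waveform repeats, and loudness from how much energy it carries.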
In-person voice evaluations came to a halt, however, at the height of the COVID-19 pandemic. Voice evaluations moved to a virtual format, but until now, the accuracy of those online evaluation procedures had never been examined.
In a soundproof room, the team recorded voice samples from 29 patients, aged 18 to 82, who had a variety of speech or voice diagnoses. These recordings were then played back to researchers through an external speaker over the teleconferencing platforms, simulating telepractice conversations.
The team quickly learned that each platform has its own audio enhancement algorithm that affects the quality of the sound. Zoom was the only platform that enabled users to turn off these audio enhancement features, allowing the researchers to test the platform’s original audio.
Despite the enhancements, the team predicted that the ability to measure vocal fundamental frequency (pitch) and vocal intensity (loudness) through teleconferencing platforms would not be significantly affected.
But the researchers discovered that all the teleconferencing platforms did a poor job of capturing many of the measurements needed for accurate and clinically meaningful voice evaluations. Pitch measurements differed significantly on all the virtual platforms compared with the in-person recordings. This might be due to internet connection or bandwidth issues that affect how and when sounds get transmitted through the platforms, the researchers say.
They also found that the dynamic range of vocal loudness measured over telepractice was very different from that of the live recordings. “This was the biggest surprise for us,” Weerathunge says. The effect held even for Zoom, where the researchers could turn off the audio enhancements.
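Dynamic range here can be thought of as the gap between the quietest and loudest parts of a voice sample. The Python sketch below shows one simple way to compute it and how processing could change it. Everything in it is hypothetical: the frame length, the function names, and especially the "transmitted" signal, which is an invented example of a boosted quiet segment and does not represent any platform's measured behavior or the direction of the differences the study found.

```python
import math

def frame_levels_db(samples, frame_len):
    """RMS level (dB re full scale) of each non-overlapping frame."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        levels.append(20 * math.log10(max(rms, 1e-10)))
    return levels

def dynamic_range_db(samples, frame_len=400):
    """Difference between the loudest and quietest frame levels, in dB."""
    levels = frame_levels_db(samples, frame_len)
    return max(levels) - min(levels)

def tone(amplitude, n, f_hz=200.0, sample_rate=16000):
    """Hypothetical test signal: a pure tone at the given amplitude."""
    return [amplitude * math.sin(2 * math.pi * f_hz * i / sample_rate)
            for i in range(n)]

# Invented "reference" recording: a quiet segment followed by a loud one.
reference = tone(0.05, 1600) + tone(0.5, 1600)

# Invented "transmitted" version: the quiet segment has been boosted,
# a crude stand-in for automatic audio processing.
transmitted = tone(0.2, 1600) + tone(0.5, 1600)

ref_range = dynamic_range_db(reference)
tx_range = dynamic_range_db(transmitted)
print(f"reference: {ref_range:.2f} dB, transmitted: {tx_range:.2f} dB")
```

In this made-up case the two versions sound like the same voice, yet the measured range differs by roughly 12 dB, which is the kind of discrepancy that could mislead a clinician relying on the numbers alone.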
Overall, “Microsoft Teams performed the best, in that all our voice measures were the least affected in that platform,” Weerathunge says.
Because many of the voice metrics collected from virtual platforms had clinically significant differences from those collected in person, Weerathunge and the team urge caution for voice and speech therapists using telepractice.
“This work is likely to have substantial impact on clinical practice, providing crucial information about the effects of these telepractice platforms on clinical voice evaluations,” Stepp says.
This work was supported by a grant from the National Institute on Deafness and Other Communication Disorders, a COVID-19 pilot grant from the National Center for Advancing Translational Sciences via BU’s Clinical & Translational Science Institute, and a Graduate Student Fellowship from the Rafik B. Hariri Institute for Computing and Computational Science & Engineering.