BU Bridge

Week of 2 October 1998

Vol. II, No. 8

Feature Article

Support for ENG prof's speech projects comes through loud and clear

By Eric McHenry

For all their merits, personal computers are not especially understanding. Carol Espy-Wilson, ENG assistant professor of electrical and computer engineering, wants to do something about that.

This year, Espy-Wilson has received three grants, totaling more than $700,000, to support her development of technologies that facilitate communication. These include a device that will increase the clarity of speech produced through artificial larynxes and one that will help computers better process oral commands. Speech recognizers for computers are on the market already, says Espy-Wilson, but they are a lot more finicky than the one she hopes to design.

"Right now you have to train your recognizer -- read it some text so that it can get used to your voice," she says. "And then it works for you, but it doesn't work for the next person who comes along. It might not even work for you if you catch a cold. The challenge is to make one that's speaker-independent, that's able to recognize everybody."

Espy-Wilson is at work on an interpreting component that can remove what she calls "extralinguistic information" in order to feed the computer a preprocessed "knowledge-based speech signal representation." It will be the front-end piece of a sophisticated recognizer she is developing in conjunction with scientists at the Massachusetts Institute of Technology. Her portion of the project is underwritten by a three-year $200,000 grant from the National Science Foundation.

Two other substantial awards, both from the National Institute on Deafness and Other Communication Disorders, a subdivision of the National Institutes of Health, will provide further support for Espy-Wilson's work in the coming years. One, an Independent Scientist Award of over $330,000, is an investment in Espy-Wilson and her entire research program. The five-year grant allows her to devote approximately three-fourths of her professional time to research projects. The other is a two-year award earmarked for the development of a device to enhance artificial larynx speech. Its allotment to Espy-Wilson is just under $175,000, her portion of a Small Business Innovation Research (SBIR) grant she shares with a local corporation called Speech Technology and Applied Research (STAR). The grant is a Phase II award, meant to extend and fulfill research Espy-Wilson and colleagues initially undertook several years ago.

Carol Espy-Wilson (third from left) reviews data with her research associates. From left: Pelin Demirel (ENG'00), Zach McCaffrey (ENG'99), Joel MacAuslan, president of the Speech Technology and Applied Research Corporation, Venkatesh Chari (ENG'92), ENG visiting scientist, and STAR representatives Karen Chenausky and Jiahong Juda. Photo by Kalman Zabarsky


"In any given year there are about 50,000 artificial larynx users in the United States alone," Espy-Wilson says. Generally, these are people who have had their vocal cords removed because of cancer or some sort of degenerative disease. Many communicate with the help of an electrolarynx -- an apparatus with a vibrating head that the user places against his or her neck, exciting the vocal tract to produce speech sounds.

The resulting speech inevitably sounds somewhat artificial, Espy-Wilson says. "One of the problems with the electrolarynx is that when you're listening to a person who's using one, you're not just hearing the speech signal coming from his or her mouth. You're also hearing a competing sound that's radiating from the device and from the tissues in the area of the neck where the device is placed. Our Phase I grant, which we received in 1995, was spent developing an adaptive filtering technique to remove that unwanted signal."

With the much larger Phase II SBIR grant, Espy-Wilson and her research associates, including STAR representatives and BU students, can turn their attention to other problems associated with artificial larynx speech.

"One of the things we're looking at now is trying to put jitter back into speech," says Espy-Wilson. "A problem with these artificial devices is that they vibrate at the same frequency all the time. But when we speak, there are perturbations in the rate at which our vocal cords vibrate; we call this phenomenon jitter. It occurs naturally, so we're trying to put it into the speech of artificial larynx users to make them sound more natural."

Improving consonant distinction is another of Espy-Wilson's objectives. Because the vibration in artificial larynx speech is not created by airflow, consonants are often weaker. Espy-Wilson would like to give users the power to create, for example, discernibly different voiced and unvoiced plosive sounds (such as the D and T sounds at the beginnings of words).

These technologies, Espy-Wilson says, are meant to improve speech in electronically mediated situations. The adaptive filter to reduce competing noise, along with the devices to restore jitter and render consonant distinction, will ultimately be united in one product -- something unobtrusive that artificial larynx users can attach to telephones or public address systems. Speaking through these media, without the benefit of gestures and facial expressions afforded by close communication, is particularly challenging for many users.

"The idea," she says, "is that they would speak into our device. It would process the signal, removing that undesired noise and restoring those little nuances, and then ship that processed signal out across the telephone line or through the public address system."

Although they are separate projects supported by separate grants, the progress Espy-Wilson makes on the artificial larynx aid complements her work on the speech recognizer, and vice versa.

"This knowledge-based speech signal representation for speech recognition separates the vowels from the consonants, then subdivides the consonants into fricatives, plosives, semivowels, nasals, and so forth," says Espy-Wilson. "Since the enhancement techniques used for fricative sounds will differ from those used for plosive sounds, such a separation will clearly help us with the artificial larynx speech project as well."