Assistant Professor Shengzhi Zhang, of the Metropolitan College Department of Computer Science, has been awarded a $98,197 grant from Cisco for a project that will help build better speech recognition systems by improving the ways they deal with misidentified inputs.

Assistant Professor Zhang’s project is entitled “Rethinking Adversarial Attacks Against Speech Recognition Systems.” Currently, speech recognition systems can be unreliable, as their machine learning foundations can be thrown off by subtly altered audio. These inputs, known as “adversarial examples,” mislead systems and cause recognition mistakes.

In his research, Dr. Zhang focuses on AI security, investigating risks in AI systems (like those that power the speech recognition behind the Amazon Echo, Google Assistant and Google Home, Apple’s Siri, and Microsoft Cortana) and designing defense solutions to mitigate those risks.

While speech recognition models are built to mimic human hearing, some elements of human hearing, having to do with frequencies, have yet to be integrated. The hypothesis of this research effort is that formants, or phonetic frequencies, hold the key to interpreting phonemes, the perceptible sounds humans use to tell words apart.

“We are trying to identify the critical features in the generated adversarial example that cause the speech recognition systems [to] misrecognize,” Dr. Zhang says, explaining that these features, which are imperceptible to humans, are the root cause of malfunctions. This creates a vulnerability that can be exploited by bad actors.

“Attackers can craft a clip of perturbation, inject it into a piece of soft music,” he explains, “[and] when being played using a speaker, [it] will be recognized by speech recognition systems as a phrase or a command to operate on, [like to] turn on the light. But to humans, it is still interpreted as soft music, with some kind of noise.”

According to Dr. Zhang, the integrated nature of problem-solving, as exemplified by this project, is a common characteristic of computer science research.

“Actually, speech recognition itself is an interdisciplinary subfield of computer science and computational linguistics,” Dr. Zhang says.

You can listen to demonstrations of Dr. Zhang’s research at this website.