Cortical and Computational Decoding of Speech
Principal Investigator: Oded Ghitza
We conduct a tightly integrated computational and experimental research program across three sites (BU, NYU, Columbia) to study spoken language recognition from the psychophysical, neurophysiological, and engineering perspectives. The program proceeds on four fronts:

(1) Psychophysics (Ghitza, BU). We measure and model human performance in tasks designed to better understand the interplay between neuronal oscillators in different frequency bands, and between the oscillations and the syllabic structure of speech.

(2) Human Neuroimaging. We formulate the interrelationship among theta, beta, and gamma oscillations, using MEG (David Poeppel, NYU) and ECoG (Charles Schroeder, Columbia) data recorded while subjects perform intelligibility tasks.

(3) Monkey Electrophysiology (Charles Schroeder, Columbia). If the emerging cortical computation principles are fundamental, they must generalize across mammalian species. We use high-resolution physiological methods to measure the interrelationship among oscillations via multi-electrode recordings in monkeys listening to stimuli specifically designed to capture the rhythmic aspects of natural speech and music.

(4) Automatic Speech Recognition (Ghitza, BU). We explore a new perspective on the development of ASR systems that incorporates insights from the behavioral and brain sciences, specifically rhythmic brain activity. We ascertain whether the proposed cortical computation principle could serve as an adjunct to conventional features used in ASR systems, e.g. in lattice re-scoring of n-best lists, and ultimately reduce word error rate.
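The n-best re-scoring idea in point (4) can be sketched as follows. This is a minimal illustration, not the program's actual method: the function names, the interpolation weight, and especially the toy rhythm score (a stand-in for a real theta-band syllabic-plausibility model) are all hypothetical assumptions.

```python
def rescore_nbest(hypotheses, rhythm_score, weight=0.3):
    """Re-rank n-best hypotheses by interpolating the decoder's
    base log-score with an adjunct rhythm-plausibility score.

    hypotheses: list of (text, base_log_score) pairs from a
                conventional ASR decoder (hypothetical format).
    rhythm_score: callable mapping a hypothesis string to a
                  plausibility score (higher = more syllabically
                  plausible); here a toy stand-in for an
                  oscillation-based model.
    """
    rescored = [
        (text, base + weight * rhythm_score(text))
        for text, base in hypotheses
    ]
    # Sort best-first by the combined score.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_rhythm_score(text):
    # Purely illustrative: prefer hypotheses whose word count is
    # closer to an assumed expected syllabic rate.
    expected_words = 4
    return -abs(len(text.split()) - expected_words)


# Hypothetical 2-best list: the decoder slightly prefers the first
# hypothesis, but the rhythm adjunct favors the second.
nbest = [
    ("recognize speech", -10.0),
    ("wreck a nice beach", -10.5),
]
ranked = rescore_nbest(nbest, toy_rhythm_score)
```

In this sketch the adjunct score simply adds a weighted term to the decoder's log-score before re-ranking; a real system would tune the weight on held-out data and derive the rhythm score from the acoustic signal rather than the transcript.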