Sam Roweis, Ph.D.
Assistant Professor
Department of Computer Science
University of Toronto
MIT CSAIL
"A Segment-Based Probabilistic Generative Model of Speech"
Abstract:
Although processing speech signals directly in the time domain
has a poor reputation, a clean speech waveform exhibits a remarkable amount of
structure. I'll talk about a radical approach to speech processing
which operates purely in the time domain and breaks the speech into
"atomic units" by identifying waveform samples at the boundaries between
glottal pulse periods (in voiced speech) or at the boundaries of unvoiced
segments. An efficient algorithm for inferring these boundaries and
estimating the average spectra of voiced and unvoiced regions is derived
from a simple probabilistic generative model. Results are presented on pitch
tracking, voiced/unvoiced detection and timescale modification; all these
tasks and several others can be performed using the single segmentation
provided by inference in the model.
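The abstract does not give the model or its inference algorithm. The sketch below is only an illustration, not the speaker's method, of why a segmentation at glottal-period boundaries makes pitch tracking almost free. If each voiced segment spans exactly one glottal pulse period, its fundamental frequency is the sample rate divided by the segment length in samples. The input format (a sorted list of boundary sample indices plus one voiced/unvoiced flag per segment) and the function name `pitch_from_boundaries` are hypothetical.

```python
from typing import List, Tuple


def pitch_from_boundaries(
    boundaries: List[int], voiced: List[bool], sample_rate: float
) -> List[Tuple[float, float]]:
    """Turn a segmentation into a pitch track (hypothetical input format).

    boundaries: sorted sample indices; segment i spans [boundaries[i], boundaries[i+1]).
    voiced: voiced[i] is True if segment i is assumed to be one glottal pulse period.
    Returns (segment midpoint in seconds, f0 in Hz) for every voiced segment.
    """
    if len(voiced) != len(boundaries) - 1:
        raise ValueError("need exactly one voicing flag per segment")

    track = []
    for i, is_voiced in enumerate(voiced):
        start, end = boundaries[i], boundaries[i + 1]
        # Unvoiced segments have no pitch; empty segments are skipped defensively.
        if not is_voiced or end <= start:
            continue
        f0 = sample_rate / (end - start)            # one period per voiced segment
        midpoint = (start + end) / 2 / sample_rate  # where to place this f0 estimate
        track.append((midpoint, f0))
    return track


if __name__ == "__main__":
    # Toy segmentation at 16 kHz: two 80-sample periods, one unvoiced stretch,
    # then one 100-sample period.
    print(pitch_from_boundaries([0, 80, 160, 400, 500], [True, True, False, True], 16000))
```

Voicing detection comes from the same segmentation, because the voiced flags are part of it. This is the sense in which a single inferred segmentation can serve several tasks at once.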