2012 Seminars

January 2012

HRC Seminar with Deryk S. Beal January 20th

Deryk S. Beal, Ph.D., CCC-SLP, Reg. CASLPO, Speech-Language Pathologist; CIHR Postdoctoral Research Fellow, Center for Computational Neuroscience and Neural Technologies (CompNet), Speech Laboratory, Boston University

Title: Structural and functional abnormalities of the auditory system in people who stutter

Abstract: Stuttering is a developmental disorder traditionally defined as a disturbance of speech fluency characterized by frequent and protracted involuntary speech sound repetitions and prolongations as well as silent blocks that impair the production of speech. The disorder has its onset in the preschool years and, although some cases resolve prior to puberty, persists across the lifespan. For the estimated 3 million Americans who stutter, the inability to produce fluent speech results in social isolation, victimization by bullies, reduced academic performance and limited opportunities for advancement in the workforce. Stuttering treatment is costly and, unfortunately, highly ineffective for the majority of people affected by the disorder. As the cause of, and a cure for, stuttering remain elusive, the United States’ National Institutes of Health has identified the disorder as a priority for research funding.
There is no evidence that people who stutter differ in outer or middle ear anatomy or physiology from people who do not stutter and investigations of the auditory evoked brainstem response have yielded inconsistent evidence of a deficit in the auditory pathway from cochlear nucleus to brainstem in people who stutter. However, cortical neuroimaging studies have identified a common pattern of increased motor signal and decreased auditory signal during a variety of speaking tasks in people who stutter relative to a control group. I will present data that I collected in order to investigate the potential involvement of speech-induced auditory suppression in the disorder. Results will be discussed within the context of cortical speech motor control theory and the potential for an auditory deficit at the core of developmental stuttering.

HRC Seminar with Andrew Oxenham January 27th

Dr. Andrew J. Oxenham, University of Minnesota

Title: Pitch, temporal fine structure, and speech perception in noise

Abstract: The question of how temporal fine structure (TFS) is coded in the peripheral auditory system, and its role in representing pitch in speech and music, has been of interest for several years. Recent studies have suggested that TFS may be critical to understanding speech, particularly in complex acoustic backgrounds, and that deficits in the temporal coding of TFS in both hearing-impaired listeners and cochlear-implant users may underlie some of the perceptual challenges they face. In this talk I review some recent psychoacoustic studies from our lab that address the issue of pitch coding of pure and complex tones, and present critical tests of the role of TFS in understanding speech in steady and fluctuating noise backgrounds.

February 2012

HRC Seminar with Lina Reiss February 3rd

Dr. Lina A.J. Reiss, Assistant Professor, Oregon Health & Science University

Title: Hybrid cochlear implants, speech perception, and pitch plasticity

Abstract: A recent advance in cochlear implants is the introduction of the Hybrid or electro-acoustic cochlear implant, comprising a short version of a standard cochlear implant electrode array designed for preservation of residual low-frequency hearing, thus combining acoustic and electric stimulation in the implanted ear.  I will review the latest findings in the Hybrid clinical trial, what the Hybrid concept has taught us about speech perception and pitch plasticity, and the general implications of these findings for perception with cochlear implants and auditory prostheses.

HRC Seminar with Oded Ghitza February 10th

Dr. Oded Ghitza

Title: Peak position coding of modulation spectrum: evidence for the function of brain rhythms in speech perception

Abstract: The premise of this study is that human speech decoding is governed by a cascade of neuronal oscillators that guide template-matching operations at a hierarchy of temporal scales. The oscillators are in the theta, beta and gamma frequency bands. It is argued that the theta oscillator is the master, capable of tracking the speech input rhythm.
The other oscillators entrain to theta. The hypothesis about the role of theta was tested by measuring the intelligibility of speech with a manipulated modulation spectrum. Each critical-band envelope was manipulated by: (i) stopband filtering (2–9 Hz), or (ii) peak position coding (PPC), in which the lowpass-filtered envelope (up to 10 Hz) was replaced by a train of identical pulses located at the peaks of the smoothed envelope. Stopband speech was barely intelligible, while PPC speech was somewhat intelligible; adding the two markedly improved intelligibility. It is argued that stopband speech is barely intelligible because, with the nullification of the band information, the (cortical) theta oscillator is prevented from tracking the input rhythm, hence the disruption of the hierarchical temporal scale that guides the decoding process. Adding the PPC speech reinstates this capability, resulting in the extraction of additional information from the stopband modulation spectrum.
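The PPC manipulation described above can be sketched in a few lines. The following is an illustrative reconstruction, not Ghitza's actual stimulus code: it lowpass-filters a critical-band envelope and replaces it with unit pulses at the envelope peaks. The filter order and toy 4 Hz modulation are assumptions made for the example.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def peak_position_code(envelope, fs, cutoff=10.0):
    """Replace a critical-band envelope with a train of identical
    pulses located at the peaks of its lowpass-smoothed version."""
    b, a = butter(4, cutoff / (fs / 2), btype="low")
    smoothed = filtfilt(b, a, envelope)       # zero-phase smoothing up to ~10 Hz
    peaks, _ = find_peaks(smoothed)           # peak positions of the smoothed envelope
    pulses = np.zeros_like(envelope)
    pulses[peaks] = 1.0                       # identical pulses at the peaks
    return pulses

# Toy example: a 4 Hz amplitude modulation sampled at 1 kHz for 2 s
fs = 1000
t = np.arange(0, 2.0, 1 / fs)
env = 0.5 * (1 + np.sin(2 * np.pi * 4 * t - np.pi / 2))
train = peak_position_code(env, fs)
print(int(train.sum()))  # roughly one pulse per modulation cycle
```

In the actual stimuli, such a pulse train would modulate a carrier in each critical band before the bands are summed back into speech.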

HRC Seminar with Barbara Shinn-Cunningham February 17th

Dr. Barbara Shinn-Cunningham, Director, Center for Computational Neuroscience and Neural Technology; Professor, Biomedical Engineering

Title: Acoustic features and individual differences affecting auditory attention

Abstract: In real-world settings, the abilities to focus, maintain, and switch auditory attention are each critical for allowing us to communicate. However, we are still unraveling what acoustic features enable these feats of selective attention, let alone why some listeners are better than others in performing these feats. This talk will review results of a number of experiments in my lab looking at some of the dynamics of auditory attention, exploring some of the features important in focusing on and maintaining attention on a source in a sound mixture. Recent results that speak to why there may be large differences in individual ability will also be presented.

March 2012

HRC Seminar with Harry Levitt March 28th
*Please note special date and time!*

Wednesday March 28, 12:30 pm
ERB 203
Harry Levitt
Advanced Hearing Concepts

Title: Auditory (Speechreading) Training As Entertainment: Read My Quips

Abstract: A sustained, intensive training effort is needed for improving speech communication by people with hearing loss. Unfortunately, auditory training programs are boring, with the result that many trainees do not complete the program and consequently do not acquire the improved listening skills that they are capable of achieving.  An auditory training program has been developed that is designed to be entertaining so as to maintain the trainee's motivation over time. The focus of the training program is to improve auditory-visual speech recognition in noise for people with hearing loss. Improvements in speech recognition were obtained that were equivalent to an increase in speech-to-noise ratio of 2 to 3 dB.


Biosketch: Harry Levitt has a PhD in Electrical Engineering from the Imperial College of Science and Technology, London, England. On graduating in 1964, he joined Bell Laboratories in Murray Hill, New Jersey, where he did research on binaural hearing, speech analysis, synthesis and perception, computer-assisted adaptive testing and digital signal processing. In 1969, he joined the City University of New York where he continued his research along these lines. An area of particular interest was that of applying advances in digital technology to help people with hearing loss. During this period he worked on the development of digital hearing aids, methods of digital signal processing to improve speech understanding in noise and on computer-based methods for auditory rehabilitation. After retiring from the City University of New York in 2000, he founded Advanced Hearing Concepts, a research company dedicated to the application of digital technology to help people with hearing loss.

HRC Seminar with Birger Kollmeier March 30th

Dr. Birger Kollmeier

Title: Processing limits in cocktail parties: Modelling sensory and cognitive aspects of speech intelligibility in complex listening environments for normal and hearing-impaired listeners


April 2012

HRC Seminar with Stefan Launer April 2nd
*Please note special date and time!*

Monday, April 2, 10:30 am
ERB 203
Stefan Launer

Title: The State of Signal Processing in Hearing Instruments, with an Emphasis on Wireless Connectivity

Abstract: TBA

HRC Seminar with Sarah Woolley April 6th

Dr. Sarah M.N. Woolley, Department of Psychology, Columbia University

Title: Transformations in the neural coding of communication vocalizations along the ascending auditory system


Abstract: Processing communication signals is a crucial, natural function of the brain’s sensory systems. Sensory systems generate neural representations of external stimuli that lead to perception and guide behavior. An important part of understanding how sensory processing leads to perception is explaining the mechanisms whereby neural representations of complex sensory signals such as communication sounds transform along sensory coding pathways. The auditory system is unique with regard to the large number of distinct processing stations involved in the coding of sensory stimuli. We study the hierarchical auditory coding of complex vocal communication sounds in the songbird.


I will present our work on the song coding properties of single neurons in the midbrain, primary forebrain and higher forebrain. We have examined coding of individual songs presented in auditory scenes and alone. We observe subtle but consistent differences in the dense and non-selective coding of song between the midbrain and primary forebrain. In contrast, we observe a transition from a dense coding scheme to a sparse and highly selective coding scheme between the primary and higher forebrain. We present a model for the transformation of song coding from dense and redundant to sparse and selective, and the consequences of dense versus sparse codes for the neural extraction of a target signal from a complex auditory scene.

HRC Seminar with Ruth Litovsky April 13th

Dr. Ruth Litovsky

Title: Learning to hear with bilateral cochlear implants: Effect of degraded signals on spatial hearing and auditory development


Abstract: Cochlear implants (CIs) are being provided at an increasing rate, in particular to young children. While many bilateral CI users attain spoken language skills that are well within the range of performance seen in normal-hearing (NH) peers, CI users generally perform significantly worse than NH children on tasks that involve functioning in realistic, complex listening environments. Namely, they are worse at speech understanding in noise, segregation of target from masking sounds, and sound localization. This is despite the overwhelmingly positive reports regarding improvement in quality of life with two vs. one CIs. Our lab investigates binaural processing in bilateral CI users, with the goal of uncovering mechanisms that enhance performance and factors that limit performance. This talk will focus on recent findings in (1) young children who are bilaterally implanted by age 1-3 years, (2) children who received their devices at an older age, and (3) adults who were either pre-lingually or post-lingually deaf prior to being implanted. Some of the factors we are taking into account include: CIs were not designed to provide binaural stimulation and are not synchronized across the two ears; CIs in two ears may be surgically mis-matched in depth and therefore provide mis-matched information to the two sets of channels; the two ears are likely to have mis-matched neural survival; and the CI processors and microphones do not preserve spatial cues with fidelity. Our research aims to understand what limitations exist in today’s clinical processors and how we can restore binaural cues to CI users with fidelity using unique signal processing and research platforms. Toward this goal, our behavioral studies on spatial hearing provide insight into mechanisms involved in auditory plasticity.

Work supported by grants from the NIH-NIDCD

Speech Signal Processing Guest Lecture - Juergen Schroeter April 19th

Speech Signal Processing Guest Lecture. All are welcome! *Please note special date & time*


Who:      Juergen Schroeter, AT&T Labs – Research

When:    2:00pm, Thursday April 19, 2012

Where:   Room 203 Engineering Research Building (ERB)

44 Cummington St

Boston, MA


Title: The Evolution of Text-to-Speech Synthesis


Text-to-Speech (TTS) is the technology that allows machines to talk to humans, delivering information through synthetic speech.  Although people can simply listen to the output to judge its quality, creating a “good” TTS system is still a difficult problem.

Speech Synthesis has come a long way since the first (albeit mechanical) “speaking machine” was built by Wolfgang von Kempelen in 1791.  Electronics entered the picture with Homer Dudley’s “Voder” in 1939.  What is common to both speaking machines is that they needed a human operator to directly manipulate the artificial vocal tract.  In contrast, it took the invention of the computer to automate the “text-to” part first (with the speaking “to-speech” part of the talking machine still in hardware), and then also to automate the speech synthesis itself, in effect arriving at a whole TTS system done in software.  Early pioneers of the 1960s driving the progress in TTS included Holmes, Klatt, Mattingly, and many others.

Clearly, research in TTS today is still a multi-disciplinary field, spanning acoustic phonetics (speech production and perception up to higher-level phonetics/phonology), morphology (pronunciation), syntax (parts of speech, POS; grammar), and speech signal processing (synthesis); it takes a team of experts in several fields to create a “good” TTS system in a particular language or accent.

Correspondingly, there are several processing stages in a TTS system: the text front-end analyzes and normalizes the incoming text, creates and disambiguates possible pronunciations for each word in context, and generates the prosody (melody) of the sentence to be spoken.

Methods for synthesizing speech in the back-end range from articulatory synthesis, in which mathematical models of the articulators (lips, tongue, glottis) are used, through “black box” models of the vocal and nasal tracts such as formant synthesis and LPC, to, most recently, “HMM” technology known from speech recognition.  However, data-driven approaches such as concatenative synthesis, and its latest variant, unit-selection synthesis, have all but won the race toward the most natural-sounding synthesis method.

Evaluation of TTS systems is a field of growing importance.  Clearly, TTS systems need to be evaluated with a specific application in mind.  In any case, three different aspects need to be evaluated: accuracy (does the front-end transcribe input text like a human would read it?), intelligibility (do people easily understand the message?), and naturalness (does it sound like a recording of an actual human speaker?).

The talk will emphasize demonstrations and examples for each of the processing steps in a TTS system.  It will conclude by extrapolating current work in order to predict the future of TTS: perfect synthetic speech all the time.

Biosketch: As Executive Director in AT&T Labs - Research, Juergen Schroeter is leading teams that create technologies supporting Speech Recognition (AT&T WATSON) and Speech Synthesis (AT&T Natural Voices). From 1985 to 1996, he was a Member of Technical Staff at AT&T Bell Laboratories in Murray Hill, NJ, where he worked on speech coding and speech synthesis methods that employ computational models of the vocal tract and vocal cords.  At AT&T's trivestiture in 1996, he moved to AT&T Labs - Research. From 1976 to 1985, he was with the Institute for Communication Acoustics, Ruhr-University Bochum, Germany, where he did research in the areas of hearing and acoustic signal processing.

Dr. Schroeter holds a Ph.D. in Electrical Engineering from Ruhr-University Bochum, Germany.  In 2001, he received the AT&T Science and Technology Medal.  He is a Fellow of the IEEE and a Fellow of the Acoustical Society of America.

HRC Seminar with Arthur N. Popper April 20th

Dr. Arthur N. Popper, Department of Biology, University of Maryland

Title: TBA

Abstract: TBA

Speech Signal Processing Guest Lecture - Jim Glass April 24th

Speech Signal Processing Guest Lecture - All are welcome!

Jim Glass, MIT Computer Science and Artificial Intelligence Laboratory

3:00pm, Tuesday April 24, 2012

Room 203

Engineering Research Building (ERB)

44 Cummington St

Boston, MA

Abstract: The development of an automatic speech recognizer is typically a highly supervised process involving the specification of phonetic inventories, lexicons, acoustic and language models, along with annotated training corpora.  Although some model parameters may be modified via adaptation, the overall structure of the speech recognizer remains relatively static thereafter.  While this approach has been effective for problems when there is adequate human expertise and labeled corpora, it is challenged by less-supervised or unsupervised scenarios.  It also stands in stark contrast to human processing of speech and language where learning is an intrinsic capability.  From a machine learning perspective, a complementary alternative is to discover unit inventories in a less supervised manner by exploiting the structure of repeating acoustic patterns within the speech signal.  In this talk I first describe the fundamental components of a modern speech recognizer and describe the current state-of-the-art results on a variety of research tasks.  I then describe a completely unsupervised pattern discovery method to automatically acquire lexical-like entities directly from an untranscribed audio stream. This approach to unsupervised word acquisition utilizes a segmental variant of a widely used dynamic programming technique, which is used to find matching acoustic patterns between spoken utterances.  By aggregating information about these matching patterns across audio streams, it is possible to group similar acoustic sequences together to form clusters corresponding to lexical entities such as words and short multi-word phrases.  Clusters found by applying this technique on a corpus of academic lectures exhibit high purity; many of the corresponding lexical identities are relevant to the underlying audio stream.
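The "widely used dynamic programming technique" underlying the pattern discovery method is dynamic time warping (DTW). The following sketch shows the classic whole-sequence DTW alignment cost between two feature sequences; the talk's method uses a segmental variant that finds matching sub-sequences, which this toy version does not implement.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic dynamic-time-warping alignment cost between two feature
    sequences (arrays of shape frames x dims)."""
    nx, ny = len(x), len(y)
    cost = np.full((nx + 1, ny + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # x-frame repeated
                                 cost[i, j - 1],      # y-frame repeated
                                 cost[i - 1, j - 1])  # frames matched
    return cost[nx, ny]

# Two toy "utterances": the same rise-fall pattern at different speeds
a = np.array([[0.0], [1.0], [2.0], [1.0], [0.0]])
b = np.array([[0.0], [0.0], [1.0], [1.0], [2.0], [1.0], [0.0]])
print(dtw_distance(a, b))  # 0.0: the warp absorbs the tempo difference
```

In practice the frames would be spectral features (e.g. MFCCs) rather than scalars, and low-cost alignment paths between different utterances mark candidate repetitions of the same word or phrase.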

Biosketch: Jim Glass is currently a Senior Research Scientist at the MIT Computer Science and Artificial Intelligence Laboratory where he heads the Spoken Language Systems Group.  He is also a Lecturer in the Harvard-MIT Division of Health Sciences and Technology.  He obtained his S.M. and Ph.D. degrees in Electrical Engineering and Computer Science from MIT.  His primary research interests are in the area of speech communication and human-computer interaction, centered on automatic speech recognition and spoken language understanding.

HRC Seminar with Stephen Lomber April 27th

Dr. Stephen G. Lomber, Professor, University of Western Ontario

Title: Acoustic Experience Alters How You See the World


Abstract: When the brain is deprived of input from one sensory modality, it often compensates with supranormal performance in one or more of the intact sensory systems.  In the absence of acoustic input, it has been proposed that cross-modal reorganization of deaf auditory cortex may provide the neural substrate mediating compensatory visual function.  We tested this hypothesis by using a battery of visual psychophysical tasks and found that congenitally deaf cats, compared to hearing cats, have superior localization in the peripheral field and lower visual movement detection thresholds.  In the deaf cats, reversible deactivation of posterior auditory cortex selectively eliminated superior visual localization abilities, while deactivation of the dorsal auditory cortex eliminated superior visual motion detection.  Our results demonstrate that enhanced visual performance in the deaf is caused by cross-modal reorganization of deaf auditory cortex and that it is possible to localize individual visual functions within discrete portions of reorganized auditory cortex.

June 2012

HRC Seminar with Gin Best June 26th

Dr. Virginia Best, National Acoustic Laboratories (NAL), Sydney, Australia, and the HRC, Boston, MA

Title: Psychophysical tests in simulated real-world environments

** Please note, this seminar will be held at 44 Cummington St., Room 403 at 2:00 PM**

September 2012

HRC Seminar with Zach Smith September 7th

Dr. Zachary Smith, Principal Research Scientist, Cochlear

Title: Preserving Binaural Cues for Bilateral Cochlear Implants

HRC Seminar with Theo Goverts September 11th

*SPECIAL DATE AND TIME* September 11, 2012 at 10:30 AM in ERB 203

S. Theo Goverts, PhD, audiologist, University Audiological Center (ENT), VU University Medical Center, Amsterdam, The Netherlands

Title: Clinically driven research on speech recognition

Abstract: In line with the WHO-ICF framework, we aim at optimal function and societal participation for listeners with impaired hearing, taking personal and contextual factors into account. In our clinic we focus on “the hearing impaired infant” and “the hearing impaired at work”. This focus is supported by research and development. This presentation will describe ongoing research on 1) the effect of congenital hearing loss on linguistic skills, 2) the role of linguistic skills in speech recognition, and 3) speech recognition in realistic conditions (in collaboration with Steve Colburn). Essential to all studies is the use of psychophysics and the aim to seek relevance for everyday auditory function.

HRC Seminar with Frederick Gallun September 14th

Frederick J. Gallun, PhD, Research Investigator, National Center for Rehabilitative Auditory Research, Portland VA Medical Center and Oregon Health & Science University

Title: Impacts of aging, hearing loss, and traumatic brain injury on binaural and spatial hearing

Abstract: Aging, hearing loss, and traumatic brain injury due to combat-related exposure to high intensity blasts are all factors that can potentially impact the binaural and spatial hearing abilities of Veterans. This presentation will describe three ongoing studies each focused on determining 1) to what degree individual factors such as aging, hearing loss, and blast exposure are associated with poorer performance on monaural and/or binaural temporal tasks and 2) the degree to which these factors lead to dissociable patterns of impairment on tasks related to various aspects of spatial hearing. The answers to these questions have important implications for the methods by which spatial auditory function is assessed and the ways in which the effects of age, hearing loss, and traumatic brain injury are rehabilitated in both Veteran and non-Veteran patient populations.


HRC Seminar with Christoph Schreiner September 24th


Dr. Christoph E. Schreiner; Department of Otolaryngology & Department of Bioengineering and Therapeutic Sciences; University of California

Title: Emergence of multidimensional sound processing in the central auditory system


Abstract: The main goal of computational neuroscience of the auditory system is to quantitatively describe information processing in auditory centers.  Successful descriptions will link, and eventually account for, the principles of auditory processing from synaptic, cellular, network, and behavioral perspectives. Since processing is highly nonlinear, it precludes a single approach that can account for all naturally occurring stimulus configurations.  By using parametrically characterized sounds with some natural sound features, multi-dimensional linear/nonlinear processing models may reveal significant characteristics of central auditory behavior. We will discuss these models within the framework of spectro-temporal receptive fields and how they may be used to reveal transformations in sound processing between the auditory midbrain, thalamus, and cortex. These spectro-temporal receptive field models reveal that the nature of interacting receptive field components and the prevalence of nonlinear stimulus combination responses change along the lemniscal pathway: compared to the mesencephalon and diencephalon, cortical processing showed significant increases in processing complexity, STRF cooperativity, and nonlinearity. Potential implications of these processing changes for the identification of auditory cortical functions and tasks will be discussed.

HRC Seminar with Eric Young September 28th

Dr. Eric D. Young, Professor of Biomedical Engineering, Johns Hopkins University

Title: Robustness of response and convergence of information in the inferior colliculus

Abstract: The auditory system consolidates information about sound in the neurons of the inferior colliculus (IC) after substantial processing in brainstem nuclei. It is natural to wonder what is different about the representation of sound in the IC compared to the auditory nerve.

Of course binaural interactions are a major function of brainstem nuclei, but there are also important differences having to do with the robustness of the representation, which is improved in IC. This talk will discuss the ways in which the IC representation is more robust, for example against changes in sound level, and also present recent results on the convergence of sound localization information in IC neurons.

October 2012

HRC Seminar with Dan Ellis October 5th

November 2012

HRC Seminar with Max Little November 2nd

Max Little, PhD, Wellcome Trust-MIT Postdoctoral Research Fellow

Title: Quantifying Movement Disorder Symptoms by Voice

Abstract: For many progressive movement disorders such as Parkinson’s, it would be valuable in practice to detect the symptoms of the disease remotely, noninvasively, and objectively. Voice impairment is one of the primary symptoms of Parkinson’s. In this talk I’ll describe techniques that can be used to detect Parkinson’s and quantify the symptoms, using voice recordings alone. These algorithms achieve 99% overall accuracy in detecting the disease, and around 2% error in replicating the clinical symptom severity score on the Unified Parkinson’s Disease Rating Scale. I’ll also describe early results from the Parkinson’s Voice Initiative, a project that has captured a very large sample of voices from healthy controls and Parkinson’s subjects from around the world, using the standard telephone network.
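Voice-based quantification of this kind typically builds on classic dysphonia measures computed from sustained vowels. As a hedged illustration (this is one textbook feature, not the talk's actual algorithm or feature set), the sketch below computes local jitter, the cycle-to-cycle variability of the pitch period, from a sequence of estimated periods:

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference between consecutive pitch periods,
    normalized by the mean period. Higher values indicate a less
    stable voice; Parkinson's classifiers use many such measures."""
    periods = np.asarray(periods, dtype=float)
    diffs = np.abs(np.diff(periods))
    return diffs.mean() / periods.mean()

# Toy pitch-period tracks (seconds): a steady voice vs. a perturbed one
steady = [0.0100, 0.0101, 0.0100, 0.0099, 0.0100]
perturbed = [0.0100, 0.0108, 0.0094, 0.0110, 0.0092]
print(local_jitter(steady) < local_jitter(perturbed))  # True
```

In a full system, features like this (alongside shimmer, noise-to-harmonics ratios, and nonlinear measures) feed a classifier trained on labeled recordings; the period estimates themselves come from a pitch tracker, which is omitted here.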

HRC Seminar with Antje Ihlefeld November 9th

Dr. Antje Ihlefeld, Post-Doctoral Scientist, New York University

Title: Bilateral cochlear implants: How and why restoration of spatial cues can improve speech intelligibility

Abstract: Spatial release from masking refers to an important benefit for speech understanding. It can occur when a target talker and a masker talker are spatially separated. In cochlear implant listeners, spatial release from masking is much reduced or absent compared with normal hearing listeners. Perhaps this reduced spatial release occurs because cochlear implant listeners cannot effectively attend to spatial cues. In the first part of this talk I will describe a behavioral study with normal-hearing listeners. In that study, we simulated cochlear implant listening with a novel vocoding technique. Three experiments examined factors that may interfere with deploying spatial attention to a target talker masked by another talker. Results show that faithful long-term average interaural level differences were insufficient for producing spatial release from masking. This suggests that appropriate interaural time differences (ITDs) are necessary for restoring spatial release from masking, at least for a situation where there are few viable alternative segregation cues. One possibility for restoring ITDs in cochlear implant users is to impose ITDs on the envelope of amplitude-modulated high-rate carrier pulse trains. In the second part of this talk I will show results from a direct stimulation experiment that examined envelope ITD sensitivity in cochlear implant listeners.

HRC Seminar with Elizabeth Strickland November 16th

Elizabeth Strickland, Ph.D., Professor, Purdue University

Title: Behavioral and modeling explorations of cochlear gain reduction

Abstract:  The medial olivocochlear reflex (MOCR) has been shown physiologically to adjust cochlear gain in response to sound.  However, its function in normal hearing is not well understood.  In this presentation, I will present data from two lines of research that we have been using to explore the MOCR.  One is psychoacoustic research showing that preceding sound changes estimates of gain and frequency selectivity in a way that is consistent with the activity of the MOCR.  In particular I will focus on the time course of this effect.  The other is modeling of physiological data using an auditory nerve model in which the OHC gain may be adjusted to mimic the MOCR.  Both lines of research suggest that the MOCR could improve signal detection and discrimination in noise.

HRC Seminar with Daniel Polley November 30th

Dr. Daniel Polley, MEEI

Title: Development, plasticity, and repair of cortical sound representations

December 2012

HRC Seminar with Ken Hancock December 7th

Kenneth E. Hancock, Ph.D., Eaton-Peabody Laboratories/Massachusetts Eye & Ear Infirmary

Title: Neural ITD coding with bilateral cochlear implants

Abstract: TBA

HRC Seminar with Nathan Spencer December 14th

Nathan Spencer, Ph.D. candidate, Boston University

Title: Towards explaining individual differences measured in speech intelligibility tasks with spatially separated speech maskers in normal-hearing and hearing-impaired listeners