2011 Seminars
January 2011
Seminar with Dr. Rich Stern | January 11th |
Dr. Rich Stern
Title: The impact of the distribution of internal delays in binaural models on predictions for psychoacoustical data
Abstract: Jeffress’s theory of binaural interaction has played a dominant role in the modeling of binaural phenomena since its introduction more than 60 years ago. Central to these models is a network of binaural coincidence-counting units that respond to simultaneous excitation from each of the two ears after a fixed internal delay. The predictions of these models for a broad range of binaural phenomena, including subjective lateral position, interaural discrimination, binaural detection, and dichotic pitch, are critically dependent on the shape of the function that describes the distribution of the binaural units with respect to their internal delays and best frequencies. In recent years new theories have challenged the traditional notion that the distribution function that best describes the widest variety of perceptual data is a broadly tapered function that decreases monotonically with increasing internal delay. Most arguments in support of the new theories have been based on physiological evidence or on the results of a limited set of psychoacoustical experiments. In this talk we will review the considerations that have motivated the traditional assumptions about the distributions of the internal delays, and we will critically evaluate the extent to which the new theories as well as the traditional binaural models are able to describe and predict experimental results for the broadest set of binaural perceptual phenomena, independently of physiological considerations.
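For readers who want a concrete picture of the model class under discussion, the following is a minimal sketch (not Dr. Stern's implementation) of a coincidence-style computation in which the interaural cross-correlation at each internal delay is weighted by an assumed distribution of binaural units; the Gaussian and exponentially tapered weighting functions and all stimulus parameters are illustrative assumptions only.

import numpy as np

def coincidence_profile(left, right, fs, max_delay_ms=3.0):
    # Normalized interaural cross-correlation over a range of internal delays.
    max_lag = int(fs * max_delay_ms / 1000)
    lags = np.arange(-max_lag, max_lag + 1)
    denom = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    profile = np.array([np.sum(left * np.roll(right, lag)) for lag in lags]) / denom
    return lags / fs * 1000.0, profile  # internal delays in ms, correlation values

def predicted_position(delays_ms, profile, delay_weighting):
    # Lateral-position prediction: centroid of the rectified correlation profile
    # after weighting by an assumed density of coincidence units over internal delay.
    w = delay_weighting(np.abs(delays_ms)) * np.maximum(profile, 0.0)
    return np.sum(delays_ms * w) / np.sum(w)

fs = 48000
t = np.arange(0, 0.1, 1.0 / fs)
itd_s = 0.0005  # 0.5-ms interaural time difference (illustrative)
left = np.sin(2 * np.pi * 500 * t)
right = np.sin(2 * np.pi * 500 * (t - itd_s))
delays, prof = coincidence_profile(left, right, fs)

# Two illustrative (assumed) delay distributions: broadly tapered vs. tightly peaked.
tapered = lambda d: np.exp(-d / 1.5)
narrow = lambda d: np.exp(-(d / 0.3) ** 2)
print(predicted_position(delays, prof, tapered), predicted_position(delays, prof, narrow))

Running the sketch shows how the same correlation profile yields different position predictions under different assumed delay distributions, which is exactly the sensitivity the abstract describes.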
Bio: Richard M. Stern received the S.B. degree from the Massachusetts Institute of Technology in 1970, the M.S. from the University of California, Berkeley, in 1972, and the Ph.D. from MIT in 1977, all in electrical engineering. He has been on the faculty of Carnegie Mellon University since 1977, where he is currently a Professor in the Electrical and Computer Engineering, Computer Science, and Biomedical Engineering Departments, the Language Technologies Institute, and a Lecturer in the School of Music. Dr. Stern’s initial research was in the area of auditory perception, where he is best known for theoretical work in binaural perception. Much of Dr. Stern’s current research is in spoken language systems, where he is particularly concerned with the development of both physiologically-motivated and statistically-based techniques with which automatic speech recognition can be made more robust with respect to changes in environment and acoustical ambience. Dr. Stern is a Fellow of the Acoustical Society of America, the 2008-2009 Distinguished Lecturer of the International Speech Communication Association, and a recipient of the Allen Newell Award for Research Excellence in 1992, and he served as General Chair of Interspeech 2006. He is also a member of the IEEE and the Audio Engineering Society.
Seminar with Dr. Yi Zhou | January 14th |
Dr. Yi Zhou, Johns Hopkins University
Title: Spatial and spectral processing in the primary auditory cortex of awake primate
Abstract: A realistic listening environment often consists of multiple sounds from various spatial locations. Although the auditory cortex has been shown to play important roles in sound localization, how cortical neurons process mixed spatial and spectral information of multiple sounds is not entirely clear. In this talk, I will discuss results on spatial and frequency selectivity of individual neurons in the primary auditory cortex (A1) of awake marmoset monkeys. I will show that the spatial receptive fields (SRFs) of A1 neurons measured in the awake condition typically contained a restricted excitatory region that was surrounded by inhibition. Interestingly, the frequency selectivity of A1 neurons was found to be largely location-invariant. These findings indicate that A1 neurons receive widespread spatial inputs that are composed of both excitation and inhibition and that the balance between excitatory and inhibitory inputs results in spatial selectivity of A1 neurons without altering their frequency selectivity. This joint spatial and spectral processing scheme may underlie A1 neurons’ ability to encode spatial-spectral-temporal relationships of multiple sounds.
HRC Seminar with Dr. Lee Miller | January 28th |
Dr. Lee Miller; University of California, Davis
Title: Neural bases of speech perception in noise: Integrating what we hear, see, and know
February 2011
Seminar with Dr. Olaf Strelcyk | February 4th |
Olaf Strelcyk, PhD; Starkey Hearing Research Center
Title: Peripheral auditory processing and speech reception in impaired hearing
Abstract: One of the most common complaints of people with impaired hearing concerns their difficulty with understanding speech. Particularly in the presence of background noise, hearing-impaired people often encounter great difficulties with speech communication. In most cases, the problem persists even if reduced audibility has been compensated for by hearing aids. It has been hypothesized that part of the difficulty arises from changes in the perception of sounds that are well above hearing threshold, such as reduced frequency selectivity and deficits in the processing of temporal fine structure (TFS) at the output of the cochlear filters. The purpose of the studies presented in this talk was to investigate these aspects in detail. A first study examined relations between frequency selectivity, TFS processing, and speech reception in listeners with normal and impaired hearing, using behavioral listening experiments. While a correlation was observed between monaural and binaural TFS-processing deficits in the hearing-impaired listeners, no relation was found between TFS processing and frequency selectivity. TFS processing was correlated with speech reception in background noise. Two further studies investigated cochlear response time (CRT) as an important aspect of the cochlear response to incoming sounds, using objective and behavioral methods. Alterations in CRT were observed for hearing-impaired listeners. A good correspondence between objective and behavioral estimates of CRT indicated that a behavioral lateralization method may be useful for studying spatiotemporal aspects of the cochlear response in human listeners. Behaviorally estimated filter bandwidths accounted for the observed alterations of CRTs in the hearing-impaired listeners, i.e., CRT was found to be inversely related to individual filter bandwidth. Overall, this work provides insights into factors affecting auditory processing in listeners with impaired hearing and may have implications for future models of impaired auditory signal processing as well as advanced compensation strategies.
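As a back-of-the-envelope illustration of the inverse relation between cochlear response time and filter bandwidth described above (a sketch under textbook gammatone-filter assumptions, not the study's analysis): the envelope of an n-th-order gammatone filter, t^(n-1)·exp(-2*pi*b*t), peaks at t = (n-1)/(2*pi*b), so a broader filter responds earlier.

import numpy as np

def gammatone_envelope_peak(bandwidth_hz, order=4):
    # Time (s) at which the envelope t**(order-1) * exp(-2*pi*b*t) of a
    # gammatone filter peaks; a simple stand-in for cochlear response time.
    return (order - 1) / (2 * np.pi * bandwidth_hz)

# Illustrative bandwidths only: a broadened (impaired) filter responds
# earlier than a sharply tuned (normal) one.
for b in (80.0, 160.0):
    print(f"bandwidth {b:5.0f} Hz -> envelope peak {1e3 * gammatone_envelope_peak(b):.2f} ms")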
ARO Practice Session | February 18th |
March 2011
Speech Signal Processing Guest Lecture – Peter Kroon | March 24th |
Peter Kroon, Intel Mobile Communications
Title: Speech and Audio Processing in Mobile Phones
Abstract: Modern cell phones have become widely accepted across the world and are small wonders of media signal processing. Voice, audio, image, video and graphics are processed using techniques based on years of media signal processing research. Although visual media have dominated the appeal of newer generations of phones, it is still voice and audio processing that continues to be challenging. This talk will review some of the speech and audio coding and processing techniques that are commonly found in cell phones. It will also highlight some interesting accomplishments and describe some of the challenges that lie ahead.
Bio: Peter Kroon (Fellow’96) received the M.S. and Ph.D. degrees in electrical engineering from Delft University of Technology, Delft, The Netherlands. The regular-pulse excitation speech coding technique described in his PhD thesis forms the basis of the GSM full rate coder. In 1986 he joined Bell Laboratories, Murray Hill, NJ, where he worked on a variety of speech coding applications, including the design and development of the 4.8 kbit/s secure voice standard FS1016 and the ITU-T 8 kbit/s speech coding standard G.729. From 1996 to 2000 he supervised a research group at Bell Labs, Lucent Technologies, working in the areas of speech and audio coding design and communications technology. In 2000 Dr. Kroon became Director of Media Signal Processing Research at Agere Systems, a spin-off from Lucent Technologies, where he was responsible for research and development of media processing for satellite radio, VoIP, and cellular terminals. In 2003 he moved to the Mobility business unit of Agere, where he was chief multimedia architect and manager of the multimedia systems group, responsible for algorithmic design and for software and hardware integration of multimedia components for cellular phones. In October 2007 this business unit was acquired by Infineon Technologies and, in January 2011, by Intel. As part of Intel Mobile Communications, Dr. Kroon is a Senior Director, Multimedia Concept Engineering, and continues to drive development and deployment of multimedia solutions for cell phones. Dr. Kroon received the 1989 IEEE SP Award for authors under 30 years of age for his paper on Regular Pulse Coding. He is an IEEE Fellow and has served as a member of the IEEE Speech Committee (1994-1996), General Chair of the IEEE Speech Coding Workshop 1997, Associate Editor of the IEEE Transactions on Speech and Audio Processing (1997-2000), Member-at-Large of the IEEE Signal Processing Society Board of Governors (2001-2003), and Guest Editor of the IEEE Transactions on Audio, Speech, and Language Processing Special Issue on Objective Quality Assessment of Speech and Audio (2006). Dr. Kroon has published more than 50 papers and holds 15 US patents.
April 2011
Speech Signal Processing Guest Lecture – Juergen Schroeter | April 17th |
Dr. Juergen Schroeter, AT&T Labs – Research
Title: The Evolution of Text-to-Speech Synthesis
Abstract: Text-to-Speech (TTS) is the technology that allows machines to talk to humans, delivering information through synthetic speech. Although people can simply listen to the output to judge its quality, creating a “good” TTS system is still a difficult problem. Speech synthesis has come a long way since the first (albeit mechanical) “speaking machine” was built by Wolfgang von Kempelen in 1791. Electronics entered the picture with Homer Dudley’s “Voder” in 1939. What is common to both speaking machines is that they needed a human operator directly manipulating the artificial vocal tract. In contrast, it took the invention of the computer to automate first the “text-to” part (with the speaking “to-speech” part of the talking machine still in hardware) and then the speech synthesis itself, arriving, in effect, at a whole TTS system done in software. Early pioneers of the 1960s driving the progress in TTS were people like Holmes, Klatt, Mattingly, and many others. Today, research in TTS is still a multi-disciplinary field, ranging from acoustic phonetics (speech production and perception up to higher-level phonetics/phonology), through morphology (pronunciation) and syntax (parts of speech, POS; grammar), to speech signal processing (synthesis); it takes a team of experts in several fields to create a “good” TTS system in a particular language or accent. Correspondingly, there are several processing stages in a TTS system: the text front-end analyzes and normalizes the incoming text, creates and disambiguates possible pronunciations for each word in context, and generates the prosody (melody) of the sentence to be spoken. Methods for synthesizing speech in the back-end range from articulatory synthesis, in which mathematical models of the articulators (lips, tongue, glottis) are used, through “black box” models of the vocal and nasal tracts such as formant synthesis and LPC, to, most recently, “HMM” technology borrowed from speech recognition. However, data-driven approaches such as concatenative synthesis, and its latest variant, unit-selection synthesis, have all but won the race towards the most natural-sounding synthesis method. Evaluation of TTS systems is a field of growing importance. Clearly, TTS systems need to be evaluated with a specific application in mind. In any case, three different aspects need to be evaluated: accuracy (does the front-end transcribe input text as a human would read it?), intelligibility (do people easily understand the message?), and naturalness (does it sound like a recording of an actual human speaker?). The talk will emphasize demonstrations and examples for each of the processing steps in a TTS system. It will conclude by extrapolating current work in order to predict the future of TTS: perfect synthetic speech all the time.
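The processing stages listed in the abstract can be pictured as a simple pipeline. The sketch below is a hypothetical skeleton for illustration only; the function names and toy lexicon are assumptions and not AT&T's implementation. It shows the front-end (normalization, pronunciation, prosody) feeding a back-end synthesis stand-in.

# Hypothetical TTS pipeline skeleton illustrating the stages described above.
TOY_LEXICON = {"dr.": "D AA K T ER", "smith": "S M IH TH", "lives": "L IH V Z",
               "on": "AA N", "main": "M EY N", "st.": "S T R IY T"}

def normalize(text):
    # Text front-end stand-in: lower-case and split; a real system also expands
    # numbers, abbreviations, and dates, and disambiguates them in context.
    return text.lower().split()

def pronounce(words):
    # Lexicon lookup with a trivial letter-by-letter fallback (a stand-in for
    # grapheme-to-phoneme rules).
    return [TOY_LEXICON.get(w, " ".join(w.upper())) for w in words]

def assign_prosody(phones):
    # Attach placeholder pitch and duration targets to each word's phone string.
    return [{"phones": p, "f0_hz": 120.0, "dur_s": 0.08 * len(p.split())} for p in phones]

def synthesize(targets):
    # Back-end stand-in: a unit-selection or HMM synthesizer would search a
    # speech database for units matching these targets and concatenate them.
    return " | ".join(f'{t["phones"]} ({t["f0_hz"]:.0f} Hz)' for t in targets)

print(synthesize(assign_prosody(pronounce(normalize("Dr. Smith lives on Main St.")))))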
Bio: As Executive Director in AT&T Labs – Research, Juergen Schroeter is leading teams that create technologies supporting Speech Recognition (AT&T WATSON) and Speech Synthesis (AT&T Natural Voices). From 1985 to 1996, he was a Member of Technical Staff at AT&T Bell Laboratories in Murray Hill, NJ, where he worked on speech coding and speech synthesis methods that employ computational models of the vocal tract and vocal cords. At AT&T’s trivestiture in 1996, he moved to AT&T Labs – Research. From 1976 to 1985, he was with the Institute for Communication Acoustics, Ruhr-University Bochum, Germany, where he did research in the areas of hearing and acoustic signal processing. Dr. Schroeter holds a Ph.D. in Electrical Engineering from Ruhr-University Bochum, Germany. In 2001, he received the AT&T Science and Technology Medal. He is a Fellow of the IEEE and a Fellow of the Acoustical Society of America.
HRC Seminar with Dr. Lori Holt | April 8th |
Dr. Lori Holt, Carnegie Mellon University
Title: Speech, melodies and invaders from space: the formation and tuning of auditory categories
Abstract: A rich history of research informs us about the ways experience with the native language shapes speech perception over both the short term and the long term. However, in part because of the difficulty of meaningfully controlling and manipulating speech experience, we know very little about the mechanisms that are responsible. I will describe the results of a series of studies that carefully control experience with artificial sounds that mimic some of the complexities of speech. These studies suggest that progress in understanding speech processing can be made by understanding the boundaries and constraints of auditory perception and learning in general. Reciprocally, our understanding of auditory processing is deepened by studying the complex, experience-dependent perceptual challenges presented by speech. Although speech perception has long been relegated to the status of a special system that could tell us little about general human cognition, its study as a flexible, experience-dependent perceptual skill has much to offer the development of a mature auditory cognitive neuroscience.
Ph.D. Thesis Defense – Ross Maddox | April 13th |
Presenter: Ross Maddox
Title: Using and ignoring acoustic feature differences in auditory object recognition
Abstract: Object recognition is crucially important to understanding complex auditory scenes. We do it effortlessly, yet it depends not only on extracting a large number of acoustic features, but also on knowing which ones to use when. In certain settings, auditory objects must be recognized despite variations in some features, a phenomenon known as perceptual invariance. In other situations, objects are selected from competing sound sources by exploiting separations in acoustic features. For this dissertation, we used a combination of behavioral experiments and neural recordings to measure the effects of altering acoustic features on object recognition. In the first set of experiments, we time-stretched and compressed natural stimuli and presented them to zebra finches. Behavioral results with trained birds showed that song identity could be determined despite these perturbations. Electrophysiological recordings in zebra finch field L (the avian homolog of mammalian primary auditory cortex) revealed that, while time-warp invariance does not exist per se, responses to time-warped stimuli still contained more than sufficient information to identify target songs. In the next set of experiments, we presented human listeners with competing spoken digits that had specific locations and pitches. We instructed subjects to use one feature and ignore the other to report the digits that matched a primer phrase. Performance improved when the separation of the target and masker in the task-relevant feature increased, but the continuity of the task-irrelevant feature also influenced performance in some cases. These results indicate that task-relevant and task-irrelevant features are perceptually bound together, consistent with the idea that auditory attention operates on objects. In the final set of experiments, we investigated how spatial location affects responses in field L. We found that the location of birdsong presented in quiet had little effect on responses; however, when we presented the songs with a masker, spatial effects were dramatic, with neural performance much higher at some spatial configurations than others. The locations of both the target and masker sounds were important, and spatial response patterns were diverse among neurons.
HRC Seminar with Dr. Heidi Nakajima | April 15th |
Dr. Heidi Nakajima
Title: Intracochlear pressure measurements in human temporal bones and non-invasive diagnostic methods for patients with conductive hearing loss
Abstract: Simultaneous measurement of basal intracochlear pressures in scala vestibuli and scala tympani in human cadaveric temporal bones enables determination of the differential pressure across the cochlear partition, the stimulus that excites the partition, providing a measure of what a live ear would hear. We use intracochlear pressure measurements to understand disease mechanisms such as superior-canal dehiscence and various treatments such as active mechanical prosthetic devices. I will also talk about our recent study of non-invasive diagnostic methods for differentiating various pathologies that can result in conductive hearing loss in the presence of an intact, healthy tympanic membrane and an aerated middle ear. In contemporary clinical practice, there are no diagnostic tests that can reliably differentiate pathologies responsible for conductive hearing loss. An accurate test would spare unnecessary surgical exploration and would help an otologist in preoperative counseling of patients regarding surgical risks and in preoperative planning. We are investigating the clinical utility of ear-canal reflectance, and comparing it to umbo velocity measurements, our laboratory gold standard.
Speech Signal Processing Guest Lecture | April 19th |
Dr. Jim Glass, MIT Computer Science and Artificial Intelligence Laboratory
Title: How To Wreck a Nice Beach: Supervised and Unsupervised Methods
Abstract: The development of an automatic speech recognizer is typically a highly supervised process involving the specification of phonetic inventories, lexicons, acoustic and language models, along with annotated training corpora. Although some model parameters may be modified via adaptation, the overall structure of the speech recognizer remains relatively static thereafter. While this approach has been effective for problems when there is adequate human expertise and labeled corpora, it is challenged by less-supervised or unsupervised scenarios. It also stands in stark contrast to human processing of speech and language where learning is an intrinsic capability. From a machine learning perspective, a complementary alternative is to discover unit inventories in a less supervised manner by exploiting the structure of repeating acoustic patterns within the speech signal. In this talk I first describe the fundamental components of a modern speech recognizer and describe the current state-of-the-art results on a variety of research tasks. I then describe a completely unsupervised pattern discovery method to automatically acquire lexical-like entities directly from an untranscribed audio stream. This approach to unsupervised word acquisition utilizes a segmental variant of a widely used dynamic programming technique, which is used to find matching acoustic patterns between spoken utterances. By aggregating information about these matching patterns across audio streams, it is possible to group similar acoustic sequences together to form clusters corresponding to lexical entities such as words and short multi-word phrases. Clusters found by applying this technique on a corpus of academic lectures exhibit high purity; many of the corresponding lexical identities are relevant to the underlying audio stream.
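For concreteness, here is a minimal sketch of the dynamic programming step underlying this kind of acoustic pattern matching: classic dynamic time warping between two feature sequences. The segmental variant described in the talk adds band constraints and path fragmentation that are not shown here, and the toy sequences are purely illustrative.

import numpy as np

def dtw_distance(a, b):
    # Dynamic time warping distance between two feature sequences
    # (frames x dims), using Euclidean frame-to-frame distances.
    na, nb = len(a), len(b)
    D = np.full((na + 1, nb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[na, nb] / (na + nb)  # length-normalized alignment cost

# Toy example: two renditions of the same rising pattern at different speeds
# align more cheaply than a rising vs. a falling pattern.
rise_slow = np.linspace(0, 1, 20)[:, None]
rise_fast = np.linspace(0, 1, 12)[:, None]
fall = np.linspace(1, 0, 20)[:, None]
print(dtw_distance(rise_slow, rise_fast), dtw_distance(rise_slow, fall))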
Bio: Jim Glass is currently a Principal Research Scientist at the MIT Computer Science and Artificial Intelligence Laboratory where he heads the Spoken Language Systems Group. He is also a Lecturer in the Harvard-MIT Division of Health Sciences and Technology. He obtained his S.M. and Ph.D. degrees in Electrical Engineering and Computer Science from MIT. His primary research interests are in the area of speech communication and human-computer interaction, centered on automatic speech recognition and spoken language understanding.
HRC Seminar with Dr. Mounya Elhilali | April 22nd |
Dr. Mounya Elhilali, Department of Electrical and Computer Engineering, Johns Hopkins University
Title: Speech analysis by brains and machines: Implications of coding strategies in auditory cortex
Abstract: The auditory system is tasked with representing complex sounds such as speech in such a way that we can separate competing sources, recognize various acoustic classes, and, in general, process complex auditory scenes. However, the principles governing how the auditory system, particularly at the level of auditory cortex, maps useful information from the environment into stable perceptual experiences remain unclear. A general proposal is that the system is constrained by ecological priors that shape the structure of neural computation. In this talk, I will discuss how neural responses over time can capture the statistics of speech sounds and how those statistics influence the characteristics of a computational model of auditory cortical spectro-temporal receptive fields (STRFs). By constraining the ensemble output to be coherent in time, the resulting model receptive fields shed light on connections between the seemingly contradictory principles of sustained firing and sparseness in auditory sensory representations. Moreover, these findings suggest coding strategies that could mediate robust neural representations of speech signals in the presence of unknown distortions by capturing the most informative statistics of speech. This framework constrains the mapping from sound to internal representation along the spectral and temporal modulation features that carry the most important phonological information. Inspired by these results, I will further discuss the relevance of such spectro-temporal modulation constraints to the recognition of speech sounds by both humans and machines. This analysis examines how well speech recognition by human listeners can be predicted from the overlap between the modulation profiles of speech and noise. It also provides a natural scheme to impart cortical principles into automatic speech recognition (ASR) systems. I will discuss a simple analysis based on modulation constraints for automatic phoneme recognition that outperforms several state-of-the-art ASR systems in non-stationary noise backgrounds.
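For readers unfamiliar with STRFs, the standard linear response they build on can be written as r(t) = sum over f and tau of STRF(f, tau) * S(f, t - tau). Below is a minimal numpy sketch of that convolution; the random STRF and spectrogram are placeholders, not the model described in the talk.

import numpy as np

def strf_response(strf, spectrogram):
    # Linear STRF model: r[t] = sum_f sum_tau STRF[f, tau] * S[f, t - tau].
    # strf: (freq_bins x lag_frames); spectrogram: (freq_bins x time_frames).
    n_freq, n_lag = strf.shape
    _, n_time = spectrogram.shape
    r = np.zeros(n_time)
    for t in range(n_time):
        for tau in range(min(n_lag, t + 1)):
            r[t] += np.dot(strf[:, tau], spectrogram[:, t - tau])
    return r

rng = np.random.default_rng(0)
S = rng.random((32, 200))              # placeholder spectrogram: 32 channels, 200 frames
strf = rng.standard_normal((32, 10))   # placeholder STRF with 10 time lags
print(strf_response(strf, S).shape)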
HRC Seminar with Dr. Alec Salt | April 22nd |
Dr. Alec Salt, Department of Otolaryngology, Washington University School of Medicine
Title: Can Wind Turbines be bad for you?
HRC Seminar with Dr. Frank Guenther | April 29th |
Professor Frank Guenther, Department of Speech, Language, and Hearing Sciences, Sargent College of Health and Rehabilitation Sciences, Department of Cognitive and Neural Systems, Boston University
Title: The neural mechanisms of speech: From computational modeling to neural prosthesis
Abstract: Speech production is a highly complex sensorimotor task involving tightly coordinated processing in the frontal, temporal, and parietal lobes of the cerebral cortex. To better understand these processes, our laboratory has designed, experimentally tested, and iteratively refined a neural network model whose components correspond to the brain regions involved in speech. Babbling and imitation phases are used to train neural mappings between phonological, articulatory, auditory, and somatosensory representations. After learning, the model can produce syllables and words it has learned by generating movements of an articulatory synthesizer. Because the model’s components correspond to neural populations and are given precise anatomical locations, activity in the model’s cells can be compared directly to neuroimaging data. Computer simulations of the model account for a wide range of experimental findings, including data on acquisition of speaking skills, articulatory kinematics, and brain activity during speech. Furthermore, “damaged” versions of the model are being used to investigate several communication disorders, including stuttering, apraxia of speech, and spasmodic dysphonia. Finally, the model was used to guide development of a neural prosthesis aimed at restoring speech output to a profoundly paralyzed individual with an electrode permanently implanted in his speech motor cortex. The volunteer maintained a 70% hit rate after 5-10 practice attempts of each vowel in a vowel production task, supporting the feasibility of brain-machine interfaces with the potential to restore conversational speech abilities to the profoundly paralyzed.
May 2011
HRC Seminar with Dr. Douglas L. Oliver | May 13th |
Dr. Douglas Oliver, Department of Neuroscience, University of Connecticut Health Center
Title: New Concepts for Neural Circuits in the Inferior Colliculus: Not Just a Simple Relay in the Auditory Pathway
Abstract: The inferior colliculus, the primary midbrain structure in the auditory pathway, differs from most brain regions. It has parallel excitatory and inhibitory inputs and outputs. Subsets of inputs of both types drive neurons in the colliculus with different functions that depend on the source of the inputs. These neurons then transmit this information to the cortex via the thalamus. Recent discoveries about the neurons of the inferior colliculus suggest information is transformed during this process beyond the production of parallel excitatory and inhibitory outputs. Consequently, the signals going to the cortex from the midbrain are not a simple relay of information from the lower auditory pathway.
June 2011
HRC Seminar with Kyle Nakamoto | June 3rd |
Kyle Nakamoto, Post-Doc, Northeastern Ohio Universities Colleges of Medicine and Pharmacy (NEOUCOM)
Title: Projections and effects of the auditory cortex on the ipsilateral and contralateral inferior colliculus
HRC Seminar with Dr. Matt Goupell | June 17th |
Matt Goupell, Ph.D., Binaural Hearing and Speech Lab, University of Wisconsin-Madison
Title: Interaural decorrelation detection and other binaural processing in cochlear-implant users
Abstract: A majority of studies trying to understand binaural sensitivity in cochlear-implant (CI) users have employed constant-amplitude stimuli at a single pitch-matched pair of electrodes. However, the level of control in such stimuli is substantially different from that available during multiple-electrode stimulation in a clinical processing scheme. We will discuss several factors that prevent control in bilateral CIs, including interaural tonotopic place mismatch, highly variable loudness growth curves across electrodes and ears, and electrode channel interactions. In particular, attempting to use stimuli that include amplitude modulations, which is just one small step towards more complex and realistic stimuli, may have substantial impacts on binaural perception in CIs. One simple type of binaural stimulus with amplitude modulations is a stimulus with interaural decorrelation. Such a stimulus is related to a listener’s ability to understand speech in noisy situations, which is particularly difficult for CI users. Basic psychophysical data on interaural decorrelation and static interaural difference sensitivity in bilateral CI users will be presented along with similar data from normal-hearing (NH) subjects listening to pulse-train vocoders. The NH vocoder data provide guidance in interpreting the CI data and show that the CI data are roughly consistent with, albeit worse than, the NH vocoder data. Furthermore, we will discuss how the pulse-train vocoder can be used to tease apart covarying stimulus factors in interaurally decorrelated stimuli. The vocoder experiments ultimately provide data that cannot be accounted for by a normalized cross-correlation model, whereas they can be explained by a model that combines instantaneous interaural time and level differences.
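A minimal sketch of the normalized cross-correlation quantity referred to at the end of the abstract: mixing an independent noise into one ear lowers the interaural correlation toward the mixing coefficient. This is illustrative only and does not reproduce the vocoder stimuli used in the study.

import numpy as np

def interaural_correlation(left, right):
    # Normalized interaural cross-correlation at zero lag for zero-mean signals.
    left = left - left.mean()
    right = right - right.mean()
    return np.dot(left, right) / np.sqrt(np.dot(left, left) * np.dot(right, right))

rng = np.random.default_rng(1)
common = rng.standard_normal(48000)
independent = rng.standard_normal(48000)

# Mixing an independent noise into one ear decorrelates the two ear signals.
for rho in (1.0, 0.8, 0.5):
    right = rho * common + np.sqrt(1 - rho ** 2) * independent
    print(f"target correlation {rho:.1f} -> measured {interaural_correlation(common, right):.2f}")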
HRC Seminar with Jasmine Grimsley | June 24th |
Jasmine Grimsley, Post-Doc, Northeastern Ohio Universities Colleges of Medicine and Pharmacy (NEOUCOM)
Title: Processing of Communication Calls in the Guinea Pig Auditory Cortex
July 2011
HRC Seminar with Carolina Abdala | July 14th |
Dr. Carolina Abdala, House Research Institute
Title: Probing Human Cochlear Function during Development: A Long and Winding Road
August 2011
Ph.D. Thesis Defense – Ben Perrone | August 23rd |
Title: Neural Selectivity in the Secondary Auditory Forebrain of the Zebra Finch
Committee members: Prof. Kamal Sen (BU/BME, Advisor), Prof. H. Steven Colburn (BU/BME), Prof. Nancy Kopell (BU/Mathematics), Prof. Jason Ritt (BU/BME), and Prof. Dimitrije Stamenovic (BU/BME, Chair).
Abstract: Recall and recognition of previously heard sounds is a well-known but largely unexplained ability of the auditory system of humans and of animals that communicate with sounds. In species with unique, personally identifiable vocalizations, the ability to recognize and identify an individual by its vocalizations is behaviorally important. For zebra finches, previously published field studies have shown this behavioral ability (Zann, 1996) and laboratory studies have identified the auditory processing pathways. While the primary auditory cortex homologue known as field L shows specificity for species-specific sounds (Grace et al., 2003), it shows little or no selectivity between different conspecific (same species) songs, making downstream secondary cortical areas a likely candidate for further investigation. In this study, we investigated higher-order effects in the secondary auditory forebrain and found evidence of more specialized selectivity. In the first set of experiments, we adapted surgical and implant design techniques to perform anesthetized extracellular neural recordings in the caudal mesopallium (CM). We acquired a number of conspecific stimuli of varying familiarity to the individuals being studied, and found neurons that responded selectively to familiar songs. We further quantified this selectivity via metrics incorporating spike timing. We then performed identical experiments in field L, finding no such selectivity to familiarity. Further experiments also showed that while both areas were selective for songs played in their natural direction over reversed-playback songs, the effect in CM was more pronounced. Given these results under anesthetized conditions, we sought to compare these effects to those seen in unanesthetized, restrained animals. Effects of both stimulus familiarity and direction were even more pronounced in awake animals than in anesthetized ones, suggesting possible feedback effects from higher-order areas, which merit future study. Here, for the first time in the zebra finch, we have shown a neural correlate of stimulus familiarity.
September 2011
HRC Seminar with James Simmons | September 16th |
Dr. James Simmons, Department of Neuroscience, Brown University
Title: Auditory mechanisms of biosonar
Abstract: Echolocating bats have marshaled the resources of the auditory system to achieve very acute sensitivity to the delay of biosonar echoes. Adaptations to facilitate timing acuity are manifested at all levels of the auditory system: the cochlea, the cochlear nucleus, the inferior colliculus, and the auditory cortex. Neuronal computations that underlie biosonar imaging are revealed by comparing psychoacoustic results with neuroanatomical and neurophysiological results. The most surprising conclusion is that mechanisms familiar as the basis for pitch perception seem to be co-opted for sonar.
HRC Seminar with Morwaread Mary Farbood | September 23rd |
Dr. Morwaread Mary Farbood, Music and Audio Research Lab, New York University
Title: Integrating Disparate Auditory Features: A Parametric, Temporal Model of Musical Tension
Abstract: Musical tension is a high-level concept that is difficult to formalize due to its subjective and multidimensional nature. Contributing factors to tension can range from low-level auditory features such as loudness to high-level hierarchical musical structures such as harmony. In this talk, a model is proposed that can predict tension given any number of disparate auditory features. The model is based on data from two experiments. The first is a web-based study that was designed to examine how individual musical parameters contribute directly to a listener’s overall perception of tension and how those parameters interact. The second study is an in-lab experiment in which listeners were asked to provide continuous responses to longer, more complex musical stimuli. Both studies take into account a number of musical parameters, including harmony, pitch height, melodic expectation, dynamics, onset frequency, tempo, meter, rhythmic regularity, and syncopation. Linear and nonlinear models are explored for predicting tension given analytical descriptions of various musical parameters. These models are tested on the continuous response data from the second experiment and shown to be insufficient. An alternate model is proposed that takes into account the dynamic, temporal aspects of listening. This model is based on the notion of a moving perceptual window in time and the concept of trend salience. High correlation with empirical data indicates that this parametric, temporal model accurately predicts tension judgments for complex musical stimuli.
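As a toy illustration of the "moving perceptual window" idea only (a hypothetical sketch; the window length, the single feature, and the absence of any salience weighting are all assumptions, not the model's actual formulation): the direction of change of a feature is estimated from the slope over a recent window, and such windowed trends from several features could then be combined into a tension estimate.

import numpy as np

def windowed_trend(feature, window_frames):
    # Slope of a least-squares line fit over a sliding window ending at each
    # frame; a crude stand-in for a moving-perceptual-window trend estimate.
    trends = np.zeros(len(feature))
    x = np.arange(window_frames)
    for t in range(window_frames, len(feature) + 1):
        seg = feature[t - window_frames:t]
        trends[t - 1] = np.polyfit(x, seg, 1)[0]
    return trends

# Toy example: loudness rising then falling; the trend is positive during the
# crescendo and negative during the decrescendo.
loudness = np.concatenate([np.linspace(0.2, 1.0, 50), np.linspace(1.0, 0.4, 50)])
print(np.round(windowed_trend(loudness, 10)[[20, 55, 90]], 3))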
HRC Seminar with Psyche Loui | September 30th |
Dr. Psyche Loui, Department of Neurology, Beth Israel Deaconess Medical Center
Title: Relating Perception to Action in the Musical Brain
Abstract: The ability to perceive, learn, and derive pleasure from music is ubiquitous in humans around the world. While the human brain demonstrates remarkable sensitivity to musical elements including pitch and melody, special populations lie at both ends of the musical spectrum in tasks such as pitch discrimination, categorization, and production. These populations offer a unique window into our understanding of the necessary cognitive and neural building blocks for music. Drawing on behavioral and neuroimaging evidence from tone-deafness, absolute pitch, and studies on music learning, I will show that musical ability depends on structural and functional connectivity between brain regions that are necessary for perception and those that are important for the controlling and sequencing of actions.
October 2011
HRC Seminar with Cara Stepp | October 7th |
Dr. Cara Stepp, Assistant Professor, Speech, Language and Hearing Sciences; Assistant Professor, Biomedical Engineering, Boston University
Title: Relative Fundamental Frequency as an Acoustic Correlate of Laryngeal Tension
Abstract: The human voice is vital to communication, and allows us to interact with others to express our thoughts, desires, and emotions. However, for the 3-9% of the U.S. population with a voice disorder, communication is impaired, causing them to suffer both economically and socially. Vocal hyperfunction is a common cause of and accompaniment to voice disorders, characterized by excessive laryngeal and paralaryngeal tension. Vocal hyperfunction can respond to behavioral intervention, but successful treatment is dependent upon proper assessment. Although vocal hyperfunction accounts for 10-40% of cases referred to multidisciplinary voice clinics, current assessment is hampered by the lack of objective measures for detecting its presence or severity. Our recent findings indicate promise for the use of the non-invasive acoustic measure of relative fundamental frequency (RFF) as such a measure. RFF is the normalized fundamental frequency of vocal cycles immediately before and after voiceless consonant production. We have shown RFF to (i) discriminate between individuals with and without voice disorders associated with vocal hyperfunction, (ii) normalize after successful voice therapy toward values seen in unimpaired speakers, and (iii) correlate with listeners’ auditory perception of vocal effort. Our findings support future large-scale clinical studies of this measure.
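For concreteness, a minimal sketch of the kind of normalization RFF involves (an illustrative reading of the measure as commonly described, not the lab's exact protocol): the instantaneous F0 of each vocal cycle adjacent to the voiceless consonant is expressed in semitones relative to a steady-state reference cycle.

import numpy as np

def relative_f0_semitones(cycle_f0_hz, reference_f0_hz):
    # Express each vocal cycle's F0 in semitones relative to a steady-state
    # reference cycle (12 semitones per octave).
    return 12.0 * np.log2(np.asarray(cycle_f0_hz) / reference_f0_hz)

# Toy example: the last few voicing cycles before a voiceless consonant, with
# F0 drifting upward as the speaker devoices (values are made up).
offset_cycles_hz = [200.0, 201.5, 203.0, 206.0, 210.0]
reference_hz = offset_cycles_hz[0]  # first (steady-state) cycle as the reference
print(np.round(relative_f0_semitones(offset_cycles_hz, reference_hz), 2))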
HRC Seminar with Christopher Bergevin | October 21st |
Christopher Bergevin, Columbia University
Title: Otoacoustic Emission Delays As a Probe To Measure Cochlear Tuning: Comparative Validations
Abstract: The ear is not only sensitive to sound, but selective as well: tonotopic tuning of the inner ear provides a means to resolve incoming spectral information. Measurements of frequency selectivity have traditionally relied upon either subjective psychophysical or objective (but invasive) physiological approaches. Sounds emitted from a healthy ear, known as otoacoustic emissions (OAEs), have been proposed as a way to estimate peripheral auditory tuning both objectively and non-invasively. Despite diverse inner-ear morphological variation across animals, OAEs are a universal feature and correlate well with an animal’s range of ‘active’ hearing. Recent studies focusing on emission delays in response to a single ‘stimulus frequency’ (SFOAEs), conducted systematically across species in a variety of classes (mammals, aves, reptiles, and amphibians), support predictions relating emissions and tuning. Longer SFOAE delays presumably reflect the sharper tuning associated with the resonant build-up time of the underlying auditory filters. Differences in tuning estimated from OAEs appear generally congruous with known anatomical and functional considerations: larger sensory organs (i.e., more ‘filters’) with smaller ranges of audition exhibit sharper tuning. Comparisons made both broadly (inter-class) and within phylogenetically matched groups (intra-family) indicate that SFOAE delays in humans are longer than in any other species so far examined, suggestive of exceptionally sharp tuning.
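A simple worked conversion behind the tuning argument (a sketch of the commonly used relation, with the proportionality to tuning sharpness left as an assumption): expressing the emission delay in periods of the stimulus, N = f * tau, a longer delay at a given frequency corresponds to a longer filter build-up time and hence sharper tuning.

def delay_in_periods(frequency_hz, delay_s):
    # SFOAE delay expressed in stimulus periods: N = f * tau.
    return frequency_hz * delay_s

# Illustrative (made-up) delays at 2 kHz: a longer emission delay gives a
# larger N and, under the filter-build-up argument, sharper tuning.
for species, tau_ms in [("species A", 4.0), ("species B", 10.0)]:
    n = delay_in_periods(2000.0, tau_ms / 1000.0)
    print(f"{species}: tau = {tau_ms:.1f} ms at 2 kHz -> N = {n:.0f} periods")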
November 2011
HRC Seminar with Susan Voss | November 4th |
Dr. Susan Voss, Smith College
Title: Reflectance Measurements on Normal and Fluid-Filled Newborn Ears
Abstract: With the adoption of universal hearing screening, hearing loss is usually identified within the newborn period. However, there is no consensus about the reliability of tympanometry for detecting middle-ear fluid in neonates less than six months of age. This work tests the hypothesis that reflectance may be reliably and objectively used to detect middle-ear fluid in newborn babies. At the time of the state-mandated newborn hearing screening (using ABR), reflectance measurements were made on newborn ears (age 0-2 days) that referred on one side and passed on the other side. Follow-up measurements were made on the same ears two to four weeks later at the diagnostic evaluation of the baby’s hearing sensitivity. Ears that referred at the newborn screening and were found to have normal hearing thresholds at the follow-up evaluation were assumed to have been fluid filled at birth. This work (1) compares reflectance measurements for ears that refer and pass at the newborn screening and (2) assesses changes in reflectance over the first month of life in normal-hearing ears. The long-term goal is to determine whether reflectance measurements made during the first months of life can detect fluid in the middle ear.
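For readers unfamiliar with the measure, a minimal sketch of the quantity involved (illustrative values only, not the study's data): power reflectance is the squared magnitude of the pressure reflection coefficient measured in the ear canal, and the power absorbed by the middle ear is its complement.

import numpy as np

def power_reflectance(pressure_reflection_coefficient):
    # Fraction of incident sound power reflected back into the ear canal.
    return np.abs(pressure_reflection_coefficient) ** 2

def absorbance(pressure_reflection_coefficient):
    # Fraction of incident power absorbed by the middle ear: 1 - |R|^2.
    return 1.0 - power_reflectance(pressure_reflection_coefficient)

# Illustrative (made-up) reflection coefficients: a fluid-filled middle ear is
# expected to reflect more of the incident power than an aerated one.
for label, R in [("aerated ear", 0.5 + 0.3j), ("fluid-filled ear", 0.9 + 0.2j)]:
    print(f"{label}: reflectance {power_reflectance(R):.2f}, absorbance {absorbance(R):.2f}")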
HRC Seminar with Nathan Spencer | November 11th |
Nathan Spencer, Boston University
Title: Relating performance in speech intelligibility tasks in various spatial configurations to performance in basic binaural processing tasks
Abstract: Off-center speech sources give rise to interaural time differences and interaural level differences at the two ears. Individuals might use these differences to resolve speech targets from interfering speech maskers in complex everyday listening environments. There is a rich body of literature describing large individual differences, measured within both the normal-hearing and hearing-impaired subpopulations, in performance with spatially separated speech sources. There is another rich body of literature describing large individual differences in measures of basic binaural sensitivity. In the current study, the same set of subjects was measured both on speech intelligibility with spatially separated sources and on basic binaural sensitivity, to explore the extent to which the two sets of measures are correlated within the normal-hearing group, the hearing-impaired group, and the broader group containing both normal-hearing and hearing-impaired individuals.
December 2011
HRC Seminar with Lisa Goodrich | December 2nd |
Dr. Lisa Goodrich; Associate Professor, Department of Neurobiology, Harvard Medical School
Title: Developing a sense of sound: how the ear is wired for hearing
Abstract: Spiral ganglion neurons faithfully communicate sound information from hair cells in the cochlea to target neurons in the central auditory system. Our goal is to understand how spiral ganglion neurons establish the precise circuits that underlie the sense of hearing. To tackle this problem, we have been documenting the cellular and molecular events that occur during auditory circuit assembly. Through a large scale microarray comparison of spiral and vestibular ganglion neurons, we produced a catalog of auditory-specific genes. Currently, we are focusing on GATA3, a transcription factor that is eventually expressed only by spiral ganglion neurons. Through Cre-lox technology, we found that GATA3 plays an unexpectedly complex role in spiral ganglion neuron development, acting not only to suppress vestibular-specific genes and activate auditory-specific genes, but also to regulate the overall timing of neuronal differentiation. Mutations in GATA3 ultimately result in highly disorganized wiring of the cochlea, highlighting the importance of tightly regulated gene expression for auditory circuit assembly.
HRC Seminar with Shihab Shamma | December 9th |
Shihab Shamma
Title: Role of coherence and rapid-plasticity in active perception of complex auditory scenes
Abstract: Humans and other animals can attend to one of multiple sounds, and follow it selectively over time. The neural underpinnings of this perceptual feat remain mysterious. Some studies have concluded that sounds are heard as separate streams when they activate well-separated populations of central auditory neurons, and that this process is largely pre-attentive. Here, we argue instead that stream formation depends primarily on temporal coherence between responses that encode various features of a sound source. Furthermore, we postulate that only when attention is directed towards a particular feature (e.g., pitch) do all other temporally coherent features of that source (e.g., timbre and location) become bound together as a stream that is segregated from the incoherent features of other sources.
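A toy sketch of the temporal-coherence computation at the heart of this argument (illustrative only; the sinusoidal envelopes and channel assignments are assumptions): feature channels whose slow envelopes rise and fall together show high pairwise correlation and would be bound into one stream, while channels modulated independently do not.

import numpy as np

def coherence_matrix(channel_envelopes):
    # Pairwise correlation of slow channel envelopes (channels x frames);
    # high off-diagonal values mark features that fluctuate together.
    return np.corrcoef(channel_envelopes)

rng = np.random.default_rng(2)
frames = np.arange(500)
source_a = 0.5 * (1 + np.sin(2 * np.pi * 4 * frames / 500))  # one source's rhythm
source_b = 0.5 * (1 + np.sin(2 * np.pi * 7 * frames / 500))  # a different rhythm

# Channels 0-1 carry features of source A; channels 2-3 carry features of source B.
envelopes = np.vstack([source_a, source_a, source_b, source_b])
envelopes = envelopes + 0.05 * rng.standard_normal(envelopes.shape)
print(np.round(coherence_matrix(envelopes), 2))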
HRC Seminar with Andy Brughera | December 16th |
Andy Brughera