Title: Machine Learning for Adaptive Time-Frequency Analysis of Signal Mixtures
Presenter: Wenyang Zhang
Date: December 19, 2019
Time: 10:00AM to 12:00PM
Location: PHO 901
Advisor: S. Hamid Nawab, ECE
Committee: W. Clem Karl, ECE; Osama Alshaykh, ECE; H. Steven Colburn, BME
In this dissertation, we present and evaluate a novel approach for incorporating machine learning into the time-frequency analysis of audio signals in the context of speaker-independent multi-speaker pitch tracking. The pitch tracking performance of the resulting algorithm is comparable to that of a state-of-the-art machine-learning algorithm for multi-pitch tracking while being significantly more computationally efficient and requiring much less training data.
Multi-pitch tracking is a time-frequency signal processing problem in which mutual interference among the harmonics of different speakers makes it challenging to design an algorithm that reliably estimates the fundamental frequency trajectories of the individual speakers. The current state of the art in speaker-independent multi-pitch tracking involves 1) utilizing a deep neural network to produce spectrograms of the individual speakers and 2) utilizing another deep neural network that acts upon the individual spectrograms and the spectrogram of the original audio to produce estimates of the individual speakers' pitch tracks. However, implementing the resulting Multi-Spectrogram Machine-Learning (MS-ML) algorithm requires on the order of 10^10 multiplications and 10^10 additions per second.
Instead of utilizing deep neural networks to estimate the pitch values directly, we have derived and evaluated a fault recognition and diagnosis (FRD) framework that utilizes machine learning techniques to recognize potential faults in the pitch tracks produced by a traditional multi-pitch tracking algorithm. The result of this fault-recognition phase is then used to trigger a fault-diagnosis phase aimed at resolving the recognized fault(s) through adaptive adjustment of the time-frequency analysis of the input signal. The pitch estimates produced by the resulting FRD-ML algorithm are found to be comparable in accuracy to those produced by the MS-ML algorithm. At the same time, our evaluation shows that the FRD-ML algorithm has significant advantages over the MS-ML algorithm. Specifically, the number of multiplications per second in FRD-ML is found to be two orders of magnitude lower (on the order of 10^8 rather than 10^10), while the number of additions per second is about the same as in the MS-ML algorithm. Furthermore, the amount of training data required to achieve optimal performance is found to be two orders of magnitude smaller for the FRD-ML algorithm than for the MS-ML algorithm.
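The recognize-then-diagnose control flow of the FRD framework can be illustrated with a minimal Python sketch. This is not the dissertation's implementation, and several assumptions are made. `track_pitches` is a hypothetical traditional tracker parameterized by its analysis window length. Changing that window length is assumed as the form of adaptive time-frequency adjustment. `jump_fault_recognizer`, a simple octave-jump rule, stands in for the learned fault recognizer. For simplicity, the sketch handles a single speaker's pitch track.

```python
import math
from typing import Callable, List, Optional, Sequence

# One F0 estimate (Hz) per frame for a single speaker; None marks an unvoiced frame.
PitchTrack = List[Optional[float]]


def jump_fault_recognizer(track: PitchTrack, max_octaves: float = 0.5) -> List[int]:
    """Hypothetical stand-in for a learned fault recognizer: flag frames whose F0
    jumps by more than `max_octaves` relative to the immediately preceding frame
    (pairs involving an unvoiced frame are skipped)."""
    faults = []
    for i in range(1, len(track)):
        prev, cur = track[i - 1], track[i]
        if prev is None or cur is None:
            continue
        if abs(math.log2(cur / prev)) > max_octaves:
            faults.append(i)
    return faults


def frd_refine(
    signal: Sequence[float],
    track_pitches: Callable[[Sequence[float], int], PitchTrack],
    recognize_faults: Callable[[PitchTrack], List[int]],
    window_lengths: Sequence[int],
) -> PitchTrack:
    """Sketch of a recognize-then-diagnose loop over candidate window lengths."""
    # Baseline track from the (hypothetical) traditional tracker, default window.
    track = list(track_pitches(signal, window_lengths[0]))
    for win in window_lengths[1:]:
        # Fault recognition: which frames look wrong?
        faults = recognize_faults(track)
        if not faults:
            break  # nothing left to diagnose
        # Fault diagnosis: re-analyze with a different window length and
        # substitute the new estimates only at the flagged frames.
        candidate = track_pitches(signal, win)
        for i in faults:
            track[i] = candidate[i]
    return track


def toy_tracker(signal: Sequence[float], win: int) -> PitchTrack:
    # Hypothetical tracker: the 512-sample window yields an octave error at frame 1.
    if win == 512:
        return [100.0, 200.0, 100.0, 100.0]
    return [100.0, 101.0, 100.0, 100.0]


print(frd_refine([], toy_tracker, jump_fault_recognizer, [512, 1024]))
# [100.0, 101.0, 100.0, 100.0]
```

Passing the recognizer in as a parameter keeps the loop independent of how faults are detected, so a trained classifier could replace the heuristic rule without changing the loop itself.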