
Research Interests

My research interests lie primarily in machine learning for signal processing, i.e., incorporating machine learning into speech signal processing and its applications to uncover detailed or hidden information in the signal that may or may not be crucial for a given task. Currently, I am exploring generative models for unsupervised speech representation learning, motivated partly by studies in auditory neuroscience. The target applications include automatic speech recognition (ASR) and audio source separation.

Research Work

If we broadly divide building a speech recognition system into feature extraction and system training, my focus is on feature extraction, in particular on obtaining robust speech representations. By robustness, we mean reducing the effect of (or being invariant to) environmental factors on speech, such as different types of noise (a restaurant with many people talking in the background, i.e., babble noise; car noise; traffic; etc.) and room reverberation. To this end, I am exploring a specific method called modulation filtering.

Modulation filtering refers to enhancing the key dynamics of the speech signal in the time-frequency domain. It is motivated by past studies of how humans perceive speech sounds and are able to hear and converse reasonably well in noisy or reverberant environments. Studies on human speech perception have highlighted the sensitivity of the auditory system to various modulation frequencies in the spectro-temporal domain.
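To make the idea concrete, here is a minimal sketch, assuming a log-mel spectrogram stored as a (time frames × frequency channels) NumPy array at 100 frames per second. It applies one hand-designed 2D Gabor kernel tuned to a temporal modulation rate (in Hz) and a spectral modulation scale (in cycles per channel). The Gabor shape, the kernel size, and the names `gabor_kernel` and `modulation_filter` are illustrative assumptions. They are not the data-driven filters described below.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_kernel(rate_hz, scale_cpc, frame_rate_hz=100.0, size=(25, 9)):
    """Hypothetical spectro-temporal Gabor kernel (illustrative only).

    rate_hz:   temporal modulation frequency in Hz
    scale_cpc: spectral modulation frequency in cycles per channel
    size:      (time frames, frequency channels); both assumed odd
    """
    nt, nf = size
    t = (np.arange(nt) - nt // 2) / frame_rate_hz  # time offsets in seconds
    f = np.arange(nf) - nf // 2                    # channel offsets
    T, F = np.meshgrid(t, f, indexing="ij")
    envelope = np.outer(np.hanning(nt), np.hanning(nf))  # smooth 2D window
    carrier = np.cos(2 * np.pi * (rate_hz * T + scale_cpc * F))
    kernel = envelope * carrier
    # Zero-mean kernel: zero gain for a constant (unmodulated) input.
    return kernel - kernel.mean()


def modulation_filter(log_spec, kernel):
    """'Same'-size 2D convolution of a (time, freq) log spectrogram."""
    kt, kf = kernel.shape
    # Replicate edge values so the output keeps the input shape.
    padded = np.pad(log_spec, ((kt // 2, kt // 2), (kf // 2, kf // 2)), mode="edge")
    windows = sliding_window_view(padded, kernel.shape)  # (T, F, kt, kf)
    # Flip the kernel so this is a true convolution rather than a correlation.
    return np.einsum("tfij,ij->tf", windows, kernel[::-1, ::-1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-in for 2 s of 40-channel log-mel features.
    log_spec = rng.standard_normal((200, 40))
    k = gabor_kernel(rate_hz=4.0, scale_cpc=0.0)
    print(modulation_filter(log_spec, k).shape)  # (200, 40)
```

A filterbank would stack several such kernels with different rates and scales. In the data-driven approach, the kernels themselves would be learned rather than fixed by hand.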

In our research, we explore deriving the key modulations purely from data using deep generative models. Early research in this direction has shown that modulation filters learned from large amounts of speech data correspond well to the filters previously observed in perceptual studies, and that they improve speech recognition performance in noisy and reverberant conditions. In addition, these improvements hold up when the ASR system is trained in a semi-supervised manner.