Overview

Research Interests:
Machine Learning, Speech Processing, Speaker Diarization, Signal Processing, Variational Inference, Self-supervised Learning, Graph-based Clustering.
Thesis Work
Graph Clustering Approaches for Speaker Diarization
My research involves self-supervised and supervised clustering approaches for automatic speaker diarization in the context of conversational speech.
What is Speaker Diarization?
It is the task of partitioning an audio recording containing multiple speakers into segments and labelling each segment with the corresponding speaker.
Different colours represent different speakers. Courtesy: Google Blog
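As a rough illustration of what a diarization output looks like, the sketch below merges per-frame speaker labels into (start, end, speaker) segments. The function name `labels_to_segments`, the frame shift of 0.5 seconds, and the toy labels are assumptions made for this example only; they are not taken from any of the systems described on this page.

```python
from itertools import groupby


def labels_to_segments(frame_labels, frame_shift=0.5):
    """Merge runs of identical per-frame speaker labels into segments.

    frame_labels: one speaker label per frame (hypothetical toy input).
    frame_shift:  assumed frame duration in seconds.
    Returns a list of (start_time, end_time, speaker) tuples.
    """
    segments = []
    start_idx = 0
    for speaker, run in groupby(frame_labels):
        run_len = len(list(run))
        start = start_idx * frame_shift
        end = (start_idx + run_len) * frame_shift
        segments.append((start, end, speaker))
        start_idx += run_len
    return segments


# Toy recording: speaker A talks, then B, then A again.
frame_labels = ["A", "A", "B", "B", "B", "A"]
segments = labels_to_segments(frame_labels)
for start, end, speaker in segments:
    print(f"{start:.1f}s to {end:.1f}s: speaker {speaker}")
```

In a real system the per-frame labels would come from clustering speaker embeddings or from an end-to-end model; here they are written by hand to keep the example self-contained.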
  • Graph Neural Network based speaker diarization

    This work involves developing a supervised hierarchical clustering algorithm using graph neural networks for speaker diarization.

  • Self-supervised speaker diarization with path integral clustering

    This work involves learning representations using a clustering-based loss. The task is self-supervised because the output of the clustering algorithm is used as the training target, which makes the learned representations more speaker-discriminative. We explored graph-structural path integral clustering, which encodes the embedding space as a graph. Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing.

  • Self-Supervised Speaker Diarization

    The proposed approach is based on the principles of self-supervised learning, where the self-supervision is derived from the clustering algorithm. The representations are learnt using a triplet-based loss derived from the clustering output of the previous stage. The work was accepted at Interspeech 2020.

  • Third DIHARD speech diarization challenge

    Contributed to the baseline system setup for the DIHARD-III challenge. The task is to partition audio into speaker segments in challenging environments where the audio is corrupted with noise, music, babble, etc. and contains short speaker turns. It has applications in rich-text transcription of meetings, clinical diagnosis, etc. We also participated in the challenge and were among the top 10 teams worldwide. Our system combined transformer-based end-to-end diarization for telephone conversations with graph-based clustering for multi-speaker conversations.

  • Speaker Diarization using Posterior Scaled VB-HMM

    The project involves identifying the speakers present in different segments of audio recordings from the DIHARD dataset, which covers challenging scenarios including restaurants, clinical interviews, mother-child conversations, etc., using a posterior-scaled Variational Bayes Hidden Markov Model (VB-HMM). The work was published at Interspeech 2019.

  • Diarization for multi-speaker test conditions in SRE 2018 challenge

    The SRE 2018 challenge involved test conditions with multiple speakers. We performed diarization to extract individual speaker segments, which were then scored against the enrollment. This work was published at ICASSP 2019.

  • Workshops and Conferences
    • Presented at ICASSP 2023, Greece
    • Presented at IISc EECS Symposium, April 2022
    • Presented a paper at ASRU 2021
    • Presented at IISc EECS Symposium, May 2021
    • Presented in the IEEE-IISc Shannon's Day talk series, April 2021
    • Presented at the DIHARD-III challenge workshop 2020
    • Talk on Women in Research at PyCon India 2020, Online
    • Winter School on Speech and Audio Processing (WiSSAP) 2020, IIT Mandi, India
    • Presented a paper and poster at Interspeech 2019, Graz, Austria
    • Summer school on mathematics for data science 2019, organised by IFCAM and IISc
    • Winter School on Speech and Audio Processing (WiSSAP) 2019, Trivandrum, India
    • Interspeech 2018, Hyderabad, India
    • Brain Computation and Learning Workshop, 2018, Bangalore, India
    • International Conference on Signal Processing and Communications (SPCOM), 2018
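
The self-supervised diarization work listed under Thesis Work learns representations with a triplet-based loss derived from clustering output. The sketch below shows the general idea only: cluster labels are treated as pseudo-labels, triplets are enumerated from them, and a standard hinge-style triplet loss is computed. The function names, the Euclidean distance, the margin value, and the toy embeddings are all assumptions for illustration; the published systems may select triplets and define the loss differently.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mine_triplets(cluster_labels):
    """Enumerate (anchor, positive, negative) index triplets.

    The positive shares the anchor's cluster label; the negative does not.
    The labels are pseudo-labels from a clustering step, not ground truth.
    """
    n = len(cluster_labels)
    triplets = []
    for a in range(n):
        for p in range(n):
            if p == a or cluster_labels[p] != cluster_labels[a]:
                continue
            for neg in range(n):
                if cluster_labels[neg] != cluster_labels[a]:
                    triplets.append((a, p, neg))
    return triplets


def triplet_loss(embeddings, triplets, margin=1.0):
    """Mean of max(0, d(a, p) - d(a, n) + margin) over all triplets."""
    if not triplets:
        return 0.0
    total = 0.0
    for a, p, neg in triplets:
        d_ap = euclidean(embeddings[a], embeddings[p])
        d_an = euclidean(embeddings[a], embeddings[neg])
        total += max(0.0, d_ap - d_an + margin)
    return total / len(triplets)


# Toy data: three segment embeddings and the cluster each was assigned to.
embeddings = [[0.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
cluster_labels = [0, 0, 1]

triplets = mine_triplets(cluster_labels)
loss = triplet_loss(embeddings, triplets)
print(triplets, loss)
```

In a training loop, the embeddings would be produced by a neural network and the loss minimized by gradient descent, after which the clustering is re-run on the updated embeddings. That alternation is omitted here to keep the example dependency-free.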