LEAP LABORATORY

Acoustic and Semantic Modeling of Emotion in Spoken Language

Soumya Dutta

Indian Institute of Science, March 2026.

PDF

Investigating Neural Mechanisms of Word Learning and Speech Perception

A. Soman

Indian Institute of Science, April 2024.

PDF

Graph Clustering Approaches for Speaker Diarization of Conversational Speech

P. Singh

Indian Institute of Science, Feb. 2024.

PDF

Dereverberation of Speech Using Autoregressive Models of Sub-band Envelopes

Anurenjan P.R.

Indian Institute of Science, Sep. 2023.

PDF

Supervised Learning Approaches for Language and Speaker Recognition

S. Ramoji

Indian Institute of Science, July 2023.

PDF

Neural Representation Learning for Speech and Audio Signals

P. Agrawal

Indian Institute of Science, Jan. 2021.

PDF

Signal Analysis using Autoregressive Models of Amplitude Modulation

S. Ganapathy

Johns Hopkins University, Jan. 2012.

PDF

Investigating Neural Mechanisms of Word Learning and Speech Perception

A. Soman

Thesis Defense Talk, February 2024.

Slides

Graph Clustering Approaches for Speaker Diarization of Conversational Speech

P. Singh

Thesis Defense Talk, February 2024.

Slides

Dereverberation of Speech Using Autoregressive Models of Sub-band Envelopes

Anurenjan P.R.

Thesis Defense Talk, September 2023.

Slides

Graph Clustering Approaches for Speaker Diarization of Conversational Speech

P. Singh

Thesis Colloquium Talk, July 2023.

Slides

Investigating Neural Mechanisms of Word Learning and Speech Perception

A. Soman

Thesis Colloquium Talk, July 2023.

Supervised Approaches for Language and Speaker Recognition

S. Ramoji

Thesis Defense Talk, July 2023.

Slides

Neural Representation Learning for Speech and Audio Signals

P. Agrawal

Thesis Defense Talk, January 2021.

Video

Neural Representation Learning of Speech and Audio Signals

P. Agrawal

Thesis Colloquium Talk, July 2020.

Video Slides

Speaker and Language Recognition - From Laboratory Technologies to the Wild

S. Ganapathy

Invited Perspective Keynote Talk, Interspeech 2018.

Slides

The Art and Science of Speech Feature Engineering

S. Ganapathy and S. Thomas

Interspeech, Singapore, Sept. 2014.

Slides

A Mixture-of-Experts model for multimodal emotion recognition in conversations

Soumya Dutta, Smruthi Balaji, Sriram Ganapathy

Computer Speech & Language, October 2026

ScienceDirect Code

Leveraging Content and Acoustic Representations for Speech Emotion Recognition

Soumya Dutta, Sriram Ganapathy

IEEE Transactions on Audio, Speech and Language Processing, 2025.

arXiv Code

End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization

Prachi Singh, Sriram Ganapathy

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024.

arXiv

Gradient-free Post-hoc Explainability Using Distillation Aided Learnable Approach

Debarpan Bhattacharya, Amir H. Poorjam, Deepak Mittal, Sriram Ganapathy

IEEE Journal of Selected Topics in Signal Processing (JSTSP) - Special Series on AI in Signal & Data Science, 2024.

arXiv

Summary of the DISPLACE challenge 2023-DIarization of SPeaker and LAnguage in Conversational Environments

Baghel, Shikha, Shreyas Ramoji, Somil Jain, Pratik Roy Chowdhuri, Prachi Singh, Deepu Vijayasenan, and Sriram Ganapathy

Elsevier Speech Communication 161 (2024): 103080.

arXiv

Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications

Varun Krishna, Tarun Sai, Sriram Ganapathy

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023.

arXiv

Speech Dereverberation with Frequency Domain Autoregressive Modeling

Anurenjan Purushothaman, Debottam Dutta, Rohit Kumar, Sriram Ganapathy

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023.

arXiv

Multi-Modal Point-of-Care Diagnostics for COVID-19 Based on Acoustics and Symptoms

S. R. Chetupalli, P. Krishnan, N. K. Sharma, A. Muguli, R. Kumar, V. Nanda, L. M. Pinto, P. K. Ghosh, S. Ganapathy

IEEE Journal of Translational Engineering in Health and Medicine, 2023.

IEEE Xplore

Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection

D. Bhattacharya, N. K. Sharma, D. Dutta, S. R. Chetupalli, P. Mote, S. Ganapathy, S. Nori, S. Gonuguntla, M. Alagesan

Nature Scientific Data, 2023.

Nature

PLDA inspired Siamese networks for speaker verification

Ramoji, Shreyas, Prashant Krishnan, and Sriram Ganapathy

Computer Speech and Language, 2022.

ScienceDirect

ERP Evidences of Rapid Semantic Learning In Foreign Language Word Comprehension

Akshara Soman, Prathibha Ramachandran, and Sriram Ganapathy

Frontiers in Neuroscience, 2022.

frontiers Data

Towards sound based testing of COVID-19 - Summary of the first Diagnostics of COVID-19 using Acoustics (DiCOVA) Challenge

Neeraj Kumar Sharma, Ananya Muguli, Prashant Krishnan, Rohit Kumar, Srikanth Raj Chetupalli, Sriram Ganapathy

Elsevier Journal on Computer, Speech and Language, 2022.

ScienceDirect

Dereverberation of Autoregressive Envelopes for Far-field Speech Recognition

Purushothaman, A., Sreeram, A., Kumar, R., & Ganapathy, S

Elsevier Journal on Computer, Speech and Language, 2021.

arXiv

Deep Correlation Analysis for Audio-EEG Decoding

JR Katthi, S. Ganapathy

IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2021.

PDF Code

Self-supervised Representation Learning With Path Integral Clustering For Speaker Diarization

P. Singh, S. Ganapathy

IEEE Transactions on Audio, Speech and Language Processing, 2021.

PDF Code

Acoustic and linguistic features influence talker change detection

N. Sharma, V. Krishnamohan, S. Ganapathy, A. Gangopadhyay, L. Fink

Journal of Acoustical Society of America (JASA) - Express Letters, EL414, Oct. 2020.

Paper

Interpretable Representation Learning for Speech and Audio Signals Based on Relevance Weighting

P. Agrawal, S. Ganapathy

IEEE Transactions on Audio, Speech and Language Processing, 2020.

PDF Code

Automatic Speaker Profiling from Short Duration Speech Data

Shareef Babu Kalluri, Deepu Vijayasenan and S. Ganapathy

Elsevier Speech Communication, April 2020.

PDF

Towards Relevance and Sequence Modeling in Language Recognition

B. Padi, A. Mohan and S. Ganapathy

IEEE Transactions on Audio, Speech and Language Processing, March, 2020.

PDF Code

Supervised I-vector Modeling for Language and Accent Recognition

S. Ramoji and S. Ganapathy

Elsevier Journal on Computer, Speech and Language, Oct. 2019.

PDF

A Study on Pairwise LDA for X-vector based Speaker Recognition

A. Kanagasundaram, S. Sridharan, S. Ganapathy and C. Fookes

IET Electronics Letters, 2019.

PDF

Modulation Filter Learning Using Deep Variational Networks for Robust Speech Recognition

P. Agrawal and S. Ganapathy

IEEE Journal of Selected Topics in Signal Processing (J-STSP), Special Issue on Data Science: Machine Learning for Audio Signal Processing, April 2019.

PDF Code

An EEG Study On The Brain Representations in Language Learning

A. Soman, Madhavan C. R., K. Sarkar, and S. Ganapathy

IOP Journal on Biomedical Physics and Engineering Express, 5(2), 25041, (2019).

PDF

Talker change detection: A comparison of human and machine performance

N. Sharma, S. Ganesh, S. Ganapathy and L. Holt

Journal of Acoustical Society of America, December 2018.

PDF Code

Convolutional Neural Network based Robust Denoising of Low-Dose Computed Tomography Perfusion Maps

V. S. Kadimesetty, S. Gutta, S. Ganapathy, and P. K. Yalavarthy

IEEE Transactions on Radiation and Plasma Medical Sciences, August 2018.

PDF

Deep Neural Network Based Bandwidth Enhancement of Photoacoustic Data

S. Gutta, V.S. Kadimesetty, S. K. Kalva, M. Pramanik, S. Ganapathy and P. K. Yalavarthy

Journal of Biomedical Optics, October 2017.

PDF

Increasing the Robustness of CNN Acoustic Models using ARMA Spectrogram Features and Channel Dropout

G. Kovacs, L. Toth, D. V. Compernolle and S. Ganapathy

Elsevier Pattern Recognition Letters, September 2017.

PDF

Unsupervised Modulation Filter Learning for Noise-Robust Speech Recognition

P. Agrawal and S. Ganapathy

Journal of Acoustical Society of America, Sept. 2017.

PDF Code

Multi-variate Autoregressive Spectrogram Modeling for Noisy Speech Recognition

S. Ganapathy

IEEE Signal Processing Letters, July 2017.

PDF

Auditory Motivated Front-end for Noisy Speech Using Spectro-temporal Modulation Filtering

S. Ganapathy and M. Omar

Journal of Acoustical Society of America, EL343-349, Vol. 136(5), Nov. 2014.

PDF

Robust Feature Extraction Using Modulation Filtering of Autoregressive Models

S. Ganapathy, H. Mallidi and H. Hermansky

IEEE Transactions on Audio, Speech and Language Processing, Vol. 22(8), pp. 1285-1295, Aug. 2014.

PDF

Enhancing Frequency Shifted Speech Signals in Single Side Band Communication

S. Ganapathy and J. Pelecanos

IEEE Signal Processing Letters, Vol. 20(12), pp. 1231-1234, Oct. 2013.

PDF

Temporal Resolution Analysis in Frequency Domain Linear Prediction

S. Ganapathy and H. Hermansky

Journal of Acoustical Society of America, EL436-442, Vol. 132(5), Oct. 2012.

PDF

Temporal envelope compensation for robust phoneme recognition using modulation spectrum

S. Ganapathy, S. Thomas and H. Hermansky

Journal of Acoustical Society of America, Vol. 128(6), pp. 3769-3780, Dec. 2010.

PDF

Autoregressive Models Of Amplitude Modulations In Audio Compression

S. Ganapathy, P. Motlicek and H. Hermansky

IEEE Transactions on Audio, Speech and Language Processing, Vol. 18(6), pp.1624-1631, Aug. 2010.

PDF

Wide-Band Audio Coding based on Frequency Domain Linear Prediction

P. Motlicek, S. Ganapathy, H. Hermansky and H. Garudadri

EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2010 (3), pp. 1-14, Jan. 2010.

PDF

Modulation Frequency Features For Phoneme Recognition In Noisy Speech

S. Ganapathy, S. Thomas and H. Hermansky

Journal of Acoustical Society of America, EL8-12, Vol. 125(1), Jan. 2009.

PDF

Recognition Of Reverberant Speech Using Frequency Domain Linear Prediction

S. Thomas, S. Ganapathy and H. Hermansky

IEEE Signal Processing Letters, Vol. 15, pp. 681-684, Dec 2008.

Benchmarking Humans and Machines on Complex Multilingual Speech Understanding Tasks

Sai Samrat Kankanala, Ram Chandra, S. Ganapathy

ICASSP 2026, Barcelona, Spain

arXiv

ULTRAS - Unified Learning of Transformer Representations for Audio and Speech Signals

Ameenudeen P E, Charumathi Narayanan, S. Ganapathy

ASRU 2025, Honolulu, HI, USA.

arXiv

FESTA: Functionally Equivalent Sampling for Trust Assessment of Multimodal LLMs

D. Bhattacharya, Apoorva Kulkarni, S. Ganapathy

EMNLP 2025, Suzhou, China. [A* Conference]

arXiv Code

Towards Unbiased Evaluation of Time-series Anomaly Detector

D. Bhattacharya, Sumanta Mukherjee, Chandramouli Kamanchi, Vijay Ekambaram, Arindam Jati, Pankaj Dayama

ICASSP 2025, Hyderabad, India.

IEEE Xplore

ABHINAYA - A System for Speech Emotion Recognition In Naturalistic Conditions Challenge

S. Dutta, S. Balaji, R. Varada, V. Salinamakki, S. Ganapathy

Interspeech 2025, Netherlands.

arXiv

Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning

D. Bhattacharya, A. Kulkarni, S. Ganapathy

Interspeech 2025, Netherlands.

arXiv

Spoken Language Understanding on Unseen Tasks With In-Context Learning

N. Agarwal, S. Ganapathy

Interspeech 2025, Netherlands.

arXiv

LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations

Soumya Dutta, Sriram Ganapathy

ICASSP 2025, Hyderabad, India.

arXiv

Identifying and Mitigating Mismatched Language Code in Multilingual ASR

J. Kim, S. Mavandadi, K. Audhkhasi, S. Bharadwaj, B. Farris, T. Chen, B. Ramabhadran, S. Ganapathy

ICASSP 2025, Hyderabad, India.

IEEE Xplore

Enhancing Customer Service Chatbots with Context-Aware NLU through Selective Attention and Multi-task Learning

Subhadip Nandi, Neeraj Agrawal, Anshika Singh, Priyanka Bhatt

CODS-COMAD 2024, Jodhpur, India.

arXiv

Improving Few-Shot Cross-Domain Named Entity Recognition by Instruction Tuning a Word-Embedding based Retrieval Augmented Large Language Model

Subhadip Nandi, Neeraj Agrawal

EMNLP 2024, Florida.

ACL Anthology

Improving Self-supervised Pre-training using Accent-Specific Codebooks

D. Prabhu, A. Gupta, O. Nitsure, P. Jyothi, S. Ganapathy

Interspeech 2024, Kos Island, Greece.

arXiv

The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environment

S. B. Kalluri, P. Singh, P.R. Chowdhuri, A. Kulkarni, S. Baghel, P. Hegde, S. Sontakke, Deepak K T, S.R.M. Prasanna, D. Vijayasenan, S. Ganapathy

Interspeech 2024, Kos Island, Greece.

arXiv

LLM Augmented LLMs: Expanding Capabilities through Composition

Bansal, R., B. Samanta, S. Dalmia, N. Gupta, S. Ganapathy, A. Bapna, P. Jain, and P. Talukdar

The Twelfth International Conference on Learning Representations (ICLR) 2024, Vienna, Austria. [A* Conference]

arXiv

Zero Shot Audio to Audio Emotion Transfer with Speaker Disentanglement

Soumya Dutta and Sriram Ganapathy

ICASSP 2024, Seoul, South Korea

arXiv Code

Multimodal modeling for spoken language identification

S. Bharadwaj, M. Ma, S. Vashishth, A. Bapna, S. Ganapathy, V. Axelrod, S. Dalmia, W. Han, Y. Zhang, D. V. Esch, S. Ritchie, P. Talukdar, J. Riesa

ICASSP 2024, Seoul, South Korea.

IEEE Xplore

Self-Influence Guided Data Reweighting for Language Model Pre-training

M. Thakkar, T. Bolukbasi, S. Ganapathy, S. Vashishth, S. Chandar, P. Talukdar

EMNLP 2023, Singapore. [A* Conference]

ACL Anthology

Accented Speech Recognition With Accent-specific Codebooks

D. Prabhu, P. Jyothi, S. Ganapathy, V. Unni

EMNLP 2023, Singapore. [A* Conference]

ACL Anthology

MASR: Multi-Label Aware Speech Representation

A. Raj, S. Bharadwaj, S. Ganapathy, M. Ma, S. Vashishth

IEEE ASRU 2023, Taiwan.

arXiv

Pseudo-Label Based Supervised Contrastive Loss for Robust Speech Representations

Varun Krishna and Sriram Ganapathy

IEEE ASRU 2023, Taiwan.

IEEE Xplore

Hierarchical Text Classification Using Contrastive Learning Informed Path Guided Hierarchy

Neeraj Agrawal, Saurabh Kumar, Priyanka Bhatt, Tanishka Agarwal

ECAI 2023, Poland.

IOS Press

Building a Few-Shot Cross-Domain Multilingual NLU Model for Customer Care

Saurabh Kumar, Sourav Bansal, Neeraj Agrawal, Priyanka Bhatt

ECAI 2023, Poland.

IOS Press

Label Aware Speech Representation Learning For Language Identification

S. Vashishth, S. Bharadwaj, S. Ganapathy, A. Bapna, M. Ma, W. Han, V. Axelrod, P. Talukdar

Interspeech 2023, Dublin, Ireland.

arXiv

Enhancing the EEG Speech Match-Mismatch Tasks With Word Boundaries

Akshara Soman, Vidhi Sinha, and Sriram Ganapathy

Interspeech 2023, Dublin, Ireland.

arXiv Code

DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments

S. Baghel, S. Ramoji, Sidharth, Ranjana H, P. Singh, S. Jain, P. R. Chowdhuri, K. Kulkarni, S. Padhi, D. Vijayasenan and S. Ganapathy

Interspeech 2023, Dublin, Ireland.

arXiv

Supervised Hierarchical Clustering Using Graph Neural Networks for Speaker Diarization

Prachi Singh, Amrit Kaul and Sriram Ganapathy

ICASSP 2023, Rhodes Island, Greece.

IEEE Xplore

Interpretable Acoustic Representation Learning on Breathing and Speech Signals for COVID-19 Detection

Debottam Dutta, Debarpan Bhattacharya, Sriram Ganapathy, Amir H. Poorjam, Deepak Mittal, and Maneesh Singh

Interspeech 2022, Incheon, South Korea.

arXiv

Analyzing the impact of SARS-CoV-2 variants on respiratory sound signals

Debarpan Bhattacharya, Debottam Dutta, Neeraj Kumar Sharma, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K K, Sadhana Gonuguntla, and Murali Alagesan

Interspeech 2022, Incheon, South Korea.

arXiv

Coswara: A website application enabling COVID-19 screening by analysing respiratory sound samples and health symptoms

Debarpan Bhattacharya, Debottam Dutta, Neeraj Kumar Sharma, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K K, Sadhana Gonuguntla, and Murali Alagesan

Interspeech 2022, Incheon, South Korea.

arXiv

Multimodal Transformer with Learnable Frontend and Self Attention for Emotion Recognition

Soumya Dutta and Sriram Ganapathy

ICASSP 2022, Singapore.

PDF Code

The Second DiCOVA Challenge: Dataset and performance analysis for Diagnosis of COVID-19 using acoustics

Neeraj Kumar Sharma, Srikanth Raj Chetupalli, Debarpan Bhattacharya, Debottam Dutta, Pravin Mote, and Sriram Ganapathy

ICASSP 2022, Singapore.

arXiv

End-to-end speech recognition with joint dereverberation of sub-band autoregressive envelopes

Rohit Kumar, Anurenjan Purushothaman, Anirudh Sreeram, and Sriram Ganapathy

ICASSP 2022, Singapore.

arXiv Code

Self Supervised Representation Learning with Deep Clustering for Acoustic Unit Discovery from Raw Speech

Varun Krishna PS and Sriram Ganapathy

ICASSP 2022, Singapore.

PDF Code

Self-Supervised Metric Learning With Graph Clustering For Speaker Diarization

P. Singh and S. Ganapathy

ASRU 2021, Cartagena.

arXiv Code

Investigating the Feature Selection and Explainability of COVID-19 Diagnostics from Cough Sounds

A. Flavio, A. Poorjam, D. Mittal, C. Dognin, A. Muguli, R. Kumar, S. R. Chetupalli, S. Ganapathy and M. Singh

Interspeech 2021, Brno, Czech Republic.

LEAP Submission for the Third DIHARD Diarization Challenge

P. Singh, R. Varma, V. Krishnamohan, S. R. Chetupalli and S. Ganapathy

Interspeech 2021, Brno, Czech Republic.

arXiv Video PPT

SRIB-LEAP lab submission to Far-field Multi-Channel Speech Enhancement Challenge for Video Conferencing

P. R. Gudepu, R. Kumar, M. K. Jayesh, A. Purushothaman, S. Ganapathy and M. A. Basha

Interspeech 2021, Brno, Czech Republic.

arXiv

The Third DIHARD Diarization Challenge

N. Ryant, P. Singh, V. Krishnamohan, R. Varma, K. Church, C. Cieri, J. Du, S. Ganapathy and M. Liberman

Interspeech 2021, Brno, Czech Republic.

arXiv

DiCOVA Challenge: Dataset, task, and baseline system for COVID-19 diagnosis using acoustics

A. Muguli, L. Pinto, R. Nirmala, N. Sharma, P. Krishnan, P. Ghosh, R. Kumar, S. Bhat, S. R. Chetupalli, S. Ganapathy, S. Ramoji and V. Nanda

Interspeech 2021, Brno, Czech Republic.

arXiv

A Multi-Head Relevance Weighting Framework for Learning Raw Waveform Audio Representations

D. Dutta, P. Agrawal, and S. Ganapathy

WASPAA 2021, New York, USA.

arXiv

NISP: A Multi-lingual Multi-accent Dataset for Speaker Profiling

Shareef Babu Kalluri, Deepu Vijayasenan, Ganapathy, S., & Krishnan, P.

ICASSP 2021, Toronto.

IEEE Xplore

Deep Multiway Canonical Correlation Analysis For Multi-Subject EEG Normalization

Katthi, J. R., & Ganapathy, S.

ICASSP 2021, Toronto.

arXiv

End-to-End Lyrics Recognition with Voice to Singing Style Transfer

Basak, S., Agarwal, S., Ganapathy, S., & Takahashi, N.

ICASSP 2021, Toronto.

arXiv

Representation Learning For Speech Recognition Using Feedback Based Relevance Weighting

P. Agrawal and S. Ganapathy

ICASSP 2021, Toronto.

arXiv

Coswara - A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis

N. Sharma, P. Krishnan, R. Kumar, S. Ramoji, S. R. Chetupalli, R. Nirmala, P. K. Ghosh and S. Ganapathy

Interspeech 2020, Beijing, October 2020

arXiv Dataset

Neural PLDA Modeling for End-to-End Speaker Verification

S. Ramoji, P. Krishnan and S. Ganapathy

Interspeech 2020, Beijing, October 2020

PDF Code

Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations

P. Agrawal and S. Ganapathy

Interspeech 2020, Beijing, October 2020.

PDF

Audiovisual Correspondence Learning in Humans And Machines

V. Krishnamohan, A. Soman, A. Gupta and S. Ganapathy

Interspeech 2020, Beijing, October 2020.

PDF

Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition

A. Purushothaman, A. Sreeram, R. Kumar and S. Ganapathy

Interspeech 2020, Beijing, October 2020.

PDF

Deep Self-Supervised Hierarchical Clustering for Speaker Diarization

P. Singh and S. Ganapathy

Interspeech 2020, Beijing, October 2020.

PDF

Context Dependent RNNLM for Automatic Transcription of Conversations

S. R. Chetupalli and S. Ganapathy

Interspeech 2020, Beijing, October 2020.

PDF

Deep Canonical Correlation Analysis For Decoding The Auditory Brain

J. Reddy and S. Ganapathy

IEEE EMBC, Toronto, Canada, July 2020.

PDF

NPLDA: A Deep Neural PLDA Model for Speaker Verification

S. Ramoji, P. Krishnan, and S. Ganapathy

Speaker Odyssey Workshop, November, 2020.

arXiv

LEAP System for SRE19 Challenge - Improvements and Error Analysis

S. Ramoji, P. Krishnan, B. Mysore, P. Singh and S. Ganapathy

Speaker Odyssey Workshop, November, 2020.

arXiv

On The Impact of Language Familiarity In Talker Change Detection

N. Sharma, V. Krishnamohan, S. Ganapathy, A. Gangopadhyay and L. Fink

ICASSP 2020.

PDF

3-D Feature and Acoustic Modeling for Far-Field Speech Recognition

A. Purushothaman, A. Sreeram and S. Ganapathy

ICASSP 2020.

arXiv

Unsupervised Neural Mask Estimator For Generalized Eigen-Value Beamforming Based ASR

R. Kumar, A. Sreeram, A. Purushothaman and S. Ganapathy

ICASSP 2020.

arXiv

Improving Voice Separation by Incorporating End-to-End Speech Recognition

N. Takahashi, M. Singh, S. Basak, P. Sudarsanam, S. Ganapathy, Y. Mitsufuji

ICASSP 2020.

arXiv

Second Language Transfer Learning in Humans and Machines Using Image Supervision

K. Praveen, A. Gupta, A. Soman and S. Ganapathy

IEEE ASRU, Dec. 2019.

PDF

Speaker and Language Aware Training for End-to-End ASR

S. Bansal, K. Malhotra, S. Ganapathy

IEEE ASRU, Dec. 2019.

PDF

The Second DIHARD Diarization Challenge: Dataset - task - and baselines

N. Ryant, K. Church, C. Cieri, A. Cristia, J. Du, S. Ganapathy and M. Liberman

INTERSPEECH, Sept. 2019, Austria.

PDF

LEAP Diarization System for the Second DIHARD Challenge

P. Singh, Harsha Vardhan M A, S. Ganapathy and A. Kanagasundaram

INTERSPEECH, Sept. 2019, Austria.

PDF

Attention based Hybrid I-vector BLSTM Model for Language Recognition

B. Padi, A. Mohan and S. Ganapathy

INTERSPEECH, Sept. 2019, Austria.

PDF

Active Learning Methods for Low Resource End-To-End Automatic Speech Recognition

K. Malhotra, S. Bansal and S. Ganapathy

INTERSPEECH, Sept. 2019, Austria.

PDF

Unsupervised Raw Waveform Representation Learning for ASR

P. Agrawal and S. Ganapathy

INTERSPEECH, Sept. 2019, Austria.

PDF

A Study of X-vector Based Speaker Recognition on Short Utterances

A. Kanagasundaram, S. Sridharan, S. Ganapathy and P. Singh

INTERSPEECH, Sept. 2019, Austria.

PDF

The LEAP Speaker Recognition System for NIST SRE 2018 Challenge

S. Ramoji, A. Mohan, B. Mysore, A. Bhatia, P. Singh, Harsha Vardhan M A and S. Ganapathy

ICASSP, 2019.

PDF

Analyzing human reaction time for talker change detection

N. Sharma, S. Ganesh, S. Ganapathy and L. Holt

ICASSP, 2019.

PDF Code

Deep variational filter learning models for speech recognition

P. Agrawal and S. Ganapathy

ICASSP, 2019.

PDF

A Deep Neural Network Based End-to-End Model for Joint Height And Age Estimation From Short Duration Speech

Shareef Babu Kalluri, Deepu Vijayasenan and S. Ganapathy

ICASSP, 2019.

PDF

End-to-end language recognition using attention based hierarchical gated recurrent unit models

B. Padi, A. Mohan and S. Ganapathy

ICASSP, 2019.

PDF Code

Supervised i-vector modeling - Theory and Applications

S. Ramoji and S. Ganapathy

INTERSPEECH, 2018.

PDF

Comparison of unsupervised modulation filter learning methods for ASR

P. Agrawal and S. Ganapathy

INTERSPEECH, 2018.

PDF

PhaseNet: Discretized phase modeling with deep neural networks for audio source separation

N. Takahashi, P. Agrawal, N. Goswami and Y. Mitsufuji

INTERSPEECH, 2018.

PDF

Talker diarization in the wild: The case of child-centered daylong audio-recordings

A. Cristia, S. Ganesh, M. Casillas and S. Ganapathy

INTERSPEECH, 2018.

PDF

On Convolutional LSTM Modeling for Joint Wake-Word Detection and Text Dependent Speaker Verification

R. Kumar, V. Yeruva and S. Ganapathy

INTERSPEECH, 2018.

PDF

Far-Field Speech Recognition Using Multivariate Autoregressive Models

S. Ganapathy and M. Harish

INTERSPEECH, 2018.

PDF

The LEAP Language Recognition System for LRE 2017 Challenge - Improvements and Error Analysis

B. Padi, S. Ramoji, V. Yeruva, S. Kumar and S. Ganapathy

Odyssey: The speaker and language recognition workshop, 2018.

Paper

Leveraging LSTM Models for Overlap Detection in Multi-Party Meetings

N. Sajjan, S. Ganesh, N. Sharma, S. Ganapathy and N. Ryant

ICASSP, Calgary, Canada, April 2018.

PDF

3-D CNN Models for Far-Field Multi-Channel Speech Recognition

S. Ganapathy and V. Peddinti

ICASSP, Calgary, Canada, April 2018.

PDF

Enhancement and Analysis of Conversational Speech: JSALT 2017

N. Ryant et al.

ICASSP, Calgary, Canada, April 2018.

PDF

Unsupervised HMM Posteriograms for Language Independent Acoustic Modeling in Zero Resource Conditions

Ansari T, R. Kumar, S. Singh and S. Ganapathy

IEEE ASRU, Dec. 2017.

PDF

Deep Learning Methods For Unsupervised Acoustic Modeling - LEAP Submission to ZeroSpeech Challenge 2017

Ansari T, R. Kumar, S. Singh, S. Ganapathy

IEEE ASRU, Dec. 2017.

PDF

Leveraging Native Language Speech For Accent Identification Using Deep Siamese Networks

A. Siddhant, P. Jyothi and S. Ganapathy

IEEE ASRU, Dec. 2017.

PDF

Speech representation learning using unsupervised data-driven modulation filtering for robust ASR

P. Agrawal and S. Ganapathy

Interspeech, Stockholm, Sweden, Aug. 2017.

PDF

IITG-Indigo system for NIST 2016 SRE challenge

N. Kumar, R. K. Das, S. Jelil, Dhanush B K, H. Kashyap, K. S. R. Murthy, S. Ganapathy, R. Sinha and S. R. M. Prasanna

Interspeech, Stockholm, Sweden, Aug. 2017.

PDF

Factor Analysis Methods for Joint Speaker Verification and Spoof Detection

Dhanush B, Suparna S., Aarthy R., Likhita C., Shashank D., Harish H. and S. Ganapathy

ICASSP, New Orleans, USA, 2017.

PDF

The IBM Speaker Recognition System: Recent Advances and Error Analysis

S. Sadjadi, J. Pelecanos and S. Ganapathy

Interspeech, San Francisco, September, 2016.

PDF

An investigation on the use of ivectors for improved ASR robustness

D. Dimitriadis, S. Thomas and S. Ganapathy

Interspeech, San Francisco, Sept. 2016.

PDF

The IBM 2016 Speaker Recognition System

S. Sadjadi, S. Ganapathy and J. Pelecanos

Odyssey, Spain, June, 2016.

PDF

Speaker Age Estimation On Conversational Telephone Speech Using Senone Posterior Based I-vectors

S. Sadjadi, S. Ganapathy and J. Pelecanos

ICASSP, Shanghai, March, 2016.

PDF

Investigating Factor Analysis Features for Deep Neural Networks In Noisy Speech Recognition

S. Ganapathy, S. Thomas, D. Dimitriadis, S. Rennie

Interspeech, Dresden, Germany, Sept. 2015.

PDF

Robust Speech Processing Using ARMA Spectrograms

S. Ganapathy

ICASSP, Brisbane, April, 2015.

PDF

Nearest Neighbor Discriminant Analysis for Language Recognition

S. Sadjadi, J. Pelecanos and S. Ganapathy

ICASSP, Brisbane, April, 2015.

PDF

Robust Language Identification Using Convolutional Neural Networks

S. Ganapathy, K. J. Han, S. Thomas, M. Omar, M. V. Segbroeck and S. Narayanan

Interspeech, Singapore, Sept. 2014.

PDF

Shift-Invariant Features for Speech Activity Detection in Adverse Radio-Frequency Channel Conditions

M. Omar and S. Ganapathy

ICASSP, Florence, Italy, May, 2014.

PDF

Analyzing Convolutional Neural Networks for Speech Activity Detection in Mismatched Acoustic Conditions

K. J. Han, S. Ganapathy, M Li, M. Omar and S. Narayanan

ICASSP, Florence, Italy, May, 2014.

PDF

The IBM Speech Activity Detection System for the DARPA RATS Program

G. Saon, S. Thomas, H. Soltau, S. Ganapathy and B. Kingsbury

Interspeech, Lyon, Aug. 2013.

PDF

TRAP Language Identification System for RATS Phase II Evaluation

K. J. Han, S. Ganapathy, M. Li, M. Omar and S. Narayanan

Interspeech, Lyon, Aug. 2013.

PDF

Robust Speaker Recognition Using Spectro-Temporal Autoregressive Models

H. Mallidi, S. Ganapathy and H. Hermansky

Interspeech, Lyon, Aug. 2013.

PDF

Unsupervised Channel Adaptation For Language Identification Using Co-training

S. Ganapathy, M. Omar and J. Pelecanos

ICASSP, Vancouver, May, 2013.

PDF

Noisy Channel Adaptation in Language Identification

S. Ganapathy, M. Omar and J. Pelecanos

IEEE SLT, Miami, Dec, 2012.

PDF

Robust Phoneme Recognition Using High Resolution Temporal Envelopes

S. Ganapathy and H. Hermansky

Interspeech, Portland, Sept. 2012.

PDF

Data-driven Posterior Features for Low Resource Speech Recognition Applications

S. Thomas, S. Ganapathy, A. Jansen and H. Hermansky

Interspeech, Portland, Sept. 2012.

Feature Extraction Using 2-D Autoregressive Models For Speaker Recognition

S. Ganapathy, S. Thomas and H. Hermansky

ISCA Speaker Odyssey, June 2012.

PDF

Adaptation Transforms of Auto-Associative Neural Networks as Features for Speaker Verification

S. Thomas, H. Mallidi, S. Ganapathy and H. Hermansky

ISCA Speaker Odyssey, June 2012.

The UMD-JHU 2011 Speaker Recognition System

D. Gomero et al.

ICASSP, Japan, Mar. 2012.

Multilingual MLP Features For Low-resource LVCSR Systems

S. Thomas, S. Ganapathy and H. Hermansky

ICASSP, Japan, Mar. 2012.

Multi-layer Perceptron Based Speech Activity Detection for Speaker Verification

S. Ganapathy, P. Rajan and H. Hermansky

IEEE WASPAA, Oct. 2011.

PDF

Modulation spectrum analysis for recognition of reverberant speech

H. Mallidi, S. Ganapathy and H. Hermansky

Interspeech, Italy, Aug. 2011.

PDF

Feature Normalization for Speaker Verification in Room Reverberation

S. Ganapathy, J. Pelecanos and M. Omar

ICASSP, Prague, May 2011.

PDF

Sparse Auto-associative Neural Networks: Theory and Application to Speech Recognition

S. Garimella, S. Ganapathy and H. Hermansky

Interspeech, Japan, Sept. 2010.

Cross-lingual and Multi-stream Posterior Features for Low-resource LVCSR Systems

S. Thomas, S. Ganapathy and H. Hermansky

Proc. of Interspeech, Japan, Sept. 2010.

A Phoneme Recognition Framework based on Auditory Spectro-Temporal Receptive Fields

S. Thomas, K. Patil, S. Ganapathy, N. Mesgarani, H. Hermansky

Proc. of Interspeech, Japan, Sept. 2010.

Robust Spectro-Temporal Features Based on Autoregressive Models of Hilbert Envelopes

S. Ganapathy, S. Thomas and H. Hermansky

ICASSP, Dallas, USA, March 2010.

PDF

Comparison of Modulation Features For Phoneme Recognition

S. Ganapathy, S. Thomas and H. Hermansky

ICASSP, Dallas, USA, March 2010.

PDF

Temporal Envelope Subtraction for Robust Speech Recognition Using Modulation Spectrum

S. Ganapathy, S. Thomas, and H. Hermansky

IEEE ASRU, 2009.

PDF

Applications of Signal Analysis Using Autoregressive Models for Amplitude Modulation

S. Ganapathy, S. Thomas, P. Motlicek and H. Hermansky

IEEE WASPAA 2009.

PDF

Static and Dynamic Modulation Spectrum for Speech Recognition

S. Ganapathy, S. Thomas and H. Hermansky

Proc. of Interspeech, Brighton, UK, Sept. 2009.

PDF

Tandem Representations of Spectral Envelope and Modulation Frequency Features for ASR

S. Thomas, S. Ganapathy and H. Hermansky

Proc. of Interspeech, Brighton, UK, Sept. 2009.

PDF

Phoneme Recognition Using Spectral Envelope and Modulation Frequency Features

S. Thomas, S. Ganapathy and H. Hermansky

ICASSP, Taiwan, April 2009.

PDF

Front-end for Far-field Speech Recognition based on Frequency Domain Linear Prediction

S. Ganapathy, S. Thomas and H. Hermansky

Proc. of INTERSPEECH, Brisbane, Australia, Sep 2008.

PDF

Hilbert Envelope Based Spectro-Temporal Features for Phoneme Recognition in Telephone Speech

S. Thomas, S. Ganapathy and H. Hermansky

Proc. of INTERSPEECH, Brisbane, Australia, Sep 2008.

PDF

Spectral Noise Shaping: Improvements in Speech/Audio Codec Based on Linear Prediction in Spectral Domain

S. Ganapathy, P. Motlicek, H. Hermansky and H. Garudadri

Proc. of INTERSPEECH, Brisbane, Australia, Sep 2008.

PDF

Perceptually motivated Sub-band Decomposition for FDLP Audio Coding

P. Motlicek, S. Ganapathy, H. Hermansky, H. Garudadri and M. Athineos

Lecture Notes In Artificial Intelligence, Springer-Verlag Berlin, Heidelberg, 2008.

PDF

Spectro-Temporal Features for Automatic Speech Recognition using Linear Prediction in Spectral Domain

S. Thomas, S. Ganapathy and H. Hermansky

Proc. of EUSIPCO, Lausanne, Switzerland, Aug 2008.

PDF

Autoregressive Modelling of Hilbert Envelopes for Wide-band Audio Coding

S. Ganapathy, P. Motlicek, H. Hermansky and H. Garudadri

AES 124th Convention, Audio Engineering Society, May 2008.

PDF

Temporal Masking for Bit-rate Reduction in Audio Codec Based on Frequency Domain Linear Prediction

S. Ganapathy, P. Motlicek, H. Hermansky and H. Garudadri

Proc. of ICASSP, April 2008.

PDF

Hilbert Envelope Based Features for Far-Field Speech Recognition

S. Thomas, S. Ganapathy and H. Hermansky

Lecture Notes in Computer Science, Springer Berlin, Heidelberg 2008.

PDF

Frequency Domain Linear Prediction for QMF Sub-bands and Applications to Audio Coding

P. Motlicek, H. Hermansky, S. Ganapathy and H. Garudadri

Lecture Notes in Computer Science, Springer Berlin, Heidelberg 2007.

PDF

Non-Uniform Speech/Audio Coding Exploiting Predictability of Temporal Evolution of Spectral Envelopes

P. Motlicek, H. Hermansky, S. Ganapathy and H. Garudadri

Lecture Notes in Computer Science, Springer Berlin, Heidelberg 2007.

PDF

Adaptive System Combination in Language Recognition

Oct. 2018.

Patent

Spectral Noise Shaping in Audio Coding Based on Spectral Dynamics in Frequency Sub-bands

Nov. 2011.

Patent

Temporal Masking in Audio Coding Based on Spectral Dynamics in Frequency Sub-bands

Aug. 2009.

Patent

Thesis

Acoustic and Semantic Modeling of Emotion in Spoken Language

Investigating Neural Mechanisms of Word Learning and Speech Perception

Graph Clustering Approaches for Speaker Diarization of Conversational Speech

Dereverberation of Speech Using Autoregressive Models of Sub-band Envelopes

Supervised Learning Approaches for Language and Speaker Recognition

Neural Representation Learning for Speech and Audio Signals

Signal Analysis using Autoregressive Models of Amplitude Modulation

Tutorials, Keynotes, Defense and Colloquia

Investigating Neural Mechanisms of Word Learning and Speech Perception

Graph Clustering Approaches for Speaker Diarization of Conversational Speech

Dereverberation of Speech Using Autoregressive Models of Sub-band Envelopes

Graph Clustering approaches for Speaker Diarization of Conversational Speech

Investigating Neural Mechanisms of Word Learning and Speech Perception

Supervised Approaches for Language and Speaker Recognition

Neural Representation Learning for Speech and Audio Signals

Neural Representation Learning of Speech and Audio Signals

Speaker and Language Recognition - From Laboratory Technologies to the Wild

The Art and Science of Speech Feature Engineering

Journals

A Mixture-of-Experts model for multimodal emotion recognition in conversations

Leveraging Content and Acoustic Representations for Speech Emotion Recognition

End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization

Gradient-free Post-hoc Explainability Using Distillation Aided Learnable Approach

Summary of the DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments

Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications

Speech Dereverberation with Frequency Domain Autoregressive Modeling

Multi-Modal Point-of-Care Diagnostics for COVID-19 Based on Acoustics and Symptoms

Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection

PLDA inspired Siamese networks for speaker verification

ERP Evidences of Rapid Semantic Learning In Foreign Language Word Comprehension

Towards sound based testing of COVID-19 - Summary of the first Diagnostics of COVID-19 using Acoustics (DiCOVA) Challenge

Dereverberation of Autoregressive Envelopes for Far-field Speech Recognition

Deep Correlation Analysis for Audio-EEG Decoding

Self-supervised Representation Learning With Path Integral Clustering For Speaker Diarization

Acoustic and linguistic features influence talker change detection

Interpretable Representation Learning for Speech and Audio Signals Based on Relevance Weighting

Automatic Speaker Profiling from Short Duration Speech Data

Towards Relevance and Sequence Modeling in Language Recognition

Supervised I-vector Modeling for Language and Accent Recognition

A Study on Pairwise LDA for X-vector based Speaker Recognition

Modulation Filter Learning Using Deep Variational Networks for Robust Speech Recognition

An EEG Study On The Brain Representations in Language Learning

Talker change detection: A comparison of human and machine performance

Convolutional Neural Network based Robust Denoising of Low-Dose Computed Tomography Perfusion Maps

Deep Neural Network Based Bandwidth Enhancement of Photoacoustic Data

Increasing the Robustness of CNN Acoustic Models using ARMA Spectrogram Features and Channel Dropout

Unsupervised Modulation Filter Learning for Noise-Robust Speech Recognition

Multi-variate Autoregressive Spectrogram Modeling for Noisy Speech Recognition

Auditory Motivated Front-end for Noisy Speech Using Spectro-temporal Modulation Filtering

Robust Feature Extraction Using Modulation Filtering of Autoregressive Models

Enhancing Frequency Shifted Speech Signals in Single Side Band Communication

Temporal Resolution Analysis in Frequency Domain Linear Prediction

Temporal envelope compensation for robust phoneme recognition using modulation spectrum

Autoregressive Models Of Amplitude Modulations In Audio Compression

Wide-Band Audio Coding based on Frequency Domain Linear Prediction

Modulation Frequency Features For Phoneme Recognition In Noisy Speech

Recognition Of Reverberant Speech Using Frequency Domain Linear Prediction

Conferences

Benchmarking Humans and Machines on Complex Multilingual Speech Understanding Tasks

ULTRAS - Unified Learning of Transformer Representations for Audio and Speech Signals

FESTA: Functionally Equivalent Sampling for Trust Assessment of Multimodal LLMs

Towards Unbiased Evaluation of Time-series Anomaly Detector

ABHINAYA - A System for Speech Emotion Recognition In Naturalistic Conditions Challenge

Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning

Spoken Language Understanding on Unseen Tasks With In-Context Learning

LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations

Identifying and Mitigating Mismatched Language Code in Multilingual ASR

Enhancing Customer Service Chatbots with Context-Aware NLU through Selective Attention and Multi-task Learning

Improving Few-Shot Cross-Domain Named Entity Recognition by Instruction Tuning a Word-Embedding based Retrieval Augmented Large Language Model

Improving Self-supervised Pre-training using Accent-Specific Codebooks

The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments

LLM Augmented LLMs: Expanding Capabilities through Composition

Zero Shot Audio to Audio Emotion Transfer with Speaker Disentanglement

Multimodal modeling for spoken language identification

Self-Influence Guided Data Reweighting for Language Model Pre-training

Accented Speech Recognition With Accent-specific Codebooks

MASR: Multi-Label Aware Speech Representation

Pseudo-Label Based Supervised Contrastive Loss for Robust Speech Representations

Hierarchical Text Classification Using Contrastive Learning Informed Path Guided Hierarchy