Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition.
CoRR, 2024
Converging Vulnerability Insights: Unifying Vulnerability Intelligence For Enhanced Application Security With Collaboration.
Proceedings of the ITU Kaleidoscope 2024: Innovation and Digital Transformation for a Sustainable World, 2024
Turn-Taking and Backchannel Prediction with Acoustic and Large Language Model Fusion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024
Cross-Utterance ASR Rescoring with Graph-Based Label Propagation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Two-Pass Endpoint Detection for Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Guided Contrastive Self-Supervised Pre-Training for Automatic Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
ASR-Aware End-to-End Neural Diarization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
Self-Supervised Learning with Cross-Modal Transformers for Emotion Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Audiovisual Highlight Detection in Videos.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021
Multiresolution and Multimodal Speech Recognition with Transformers.
CoRR, 2020
Multi-channel Acoustic Modeling using Mixed Bitrate OPUS Compression.
CoRR, 2020
Multi-Modal Embeddings Using Multi-Task Learning for Emotion Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Fully Learnable Front-End for Multi-Channel Acoustic Modeling Using Semi-Supervised Learning.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020
Multimodal and Multiresolution Speech Recognition with Transformers.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Multi-Task Learning and Weighted Cross-Entropy for DNN-Based Keyword Spotting.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Cisco's speaker segmentation and recognition system.
Proceedings of the Odyssey 2012: The Speaker and Language Recognition Workshop, 2012