Audio-Visual Representation Learning For Lip-Sync Estimation Through Ranking Augmented Contrastive Training.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025
Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024
SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
RefTextLAS: Reference Text Biased Listen, Attend, and Spell Model For Accurate Reading Evaluation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Domain Prompts: Towards memory and compute efficient domain adaptation of ASR systems.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Efficient domain adaptation of language models in ASR systems using Prompt-tuning.
CoRR, 2021
Towards Continual Entity Learning in Language Models for Conversational Agents.
CoRR, 2021
Neural Composition: Learning to Generate from Multiple Models.
CoRR, 2020
Jasper: An End-to-End Convolutional Neural Acoustic Model.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Training Neural Speech Recognition Systems with Synthetic Speech Augmentation.
CoRR, 2018