2025
ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization.
CoRR, June, 2025

Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Physics Informed Distillation for Diffusion Models.
Trans. Mach. Learn. Res., 2024

LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, 2024

SimPSI: A Simple Strategy to Preserve Spectral Information in Time Series Data Augmentation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
One-Shot Exemplification Modeling via Latent Sense Representations.
Proceedings of the 8th Workshop on Representation Learning for NLP, 2023

Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

ESD: Expected Squared Difference as a Tuning-Free Trainable Calibration Measure.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Counterfactual Two-Stage Debiasing For Video Corpus Moment Retrieval.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Selective Query-guided Debiasing Network for Video Corpus Moment Retrieval.
CoRR, 2022

Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

SMSMix: Sense-Maintained Sentence Mixup for Word Sense Disambiguation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Selective Query-Guided Debiasing for Video Corpus Moment Retrieval.
Proceedings of the Computer Vision - ECCV 2022, 2022