ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization.
CoRR, June, 2025
Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Physics Informed Distillation for Diffusion Models.
Trans. Mach. Learn. Res., 2024
LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation.
Computer Vision - ECCV 2024, 2024
TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
SimPSI: A Simple Strategy to Preserve Spectral Information in Time Series Data Augmentation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
One-Shot Exemplification Modeling via Latent Sense Representations.
Proceedings of the 8th Workshop on Representation Learning for NLP, 2023
Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
ESD: Expected Squared Difference as a Tuning-Free Trainable Calibration Measure.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Counterfactual Two-Stage Debiasing For Video Corpus Moment Retrieval.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
Selective Query-guided Debiasing Network for Video Corpus Moment Retrieval.
CoRR, 2022
Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
SMSMix: Sense-Maintained Sentence Mixup for Word Sense Disambiguation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022
Selective Query-Guided Debiasing for Video Corpus Moment Retrieval.
Computer Vision - ECCV 2022, 2022