Yu Wang

IEEE ACM Trans. Audio Speech Lang. Process., 2024

DialogMCF: Multimodal Context Flow for Audio Visual Scene-Aware Dialog.

Zhe Chen

Hongcheng Liu

IEEE ACM Trans. Audio Speech Lang. Process., 2024

ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents.

CoRR, 2024

HSDreport: Heart Sound Diagnosis with Echocardiography Reports.

CoRR, 2024

MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation.

CoRR, 2024

TAIA: Large Language Models are Out-of-Distribution Data Learners.

CoRR, 2024

MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts.

CoRR, 2024

M³AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset.

CoRR, 2024

Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator.

CoRR, 2024

Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview.

Heyang Liu

CoRR, 2024

M2K-VDG: Model-Adaptive Multimodal Knowledge Anchor Enhanced Video-grounded Dialogue Generation.

CoRR, 2024

MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception.

CoRR, 2024

Annotation-free Audio-Visual Segmentation.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

MSG-BART: Multi-Granularity Scene Graph-Enhanced Encoder-Decoder Language Model for Video-Grounded Dialogue Generation.

Proceedings of the IEEE International Conference on Acoustics, 2024

RA2FD: Distilling Faithfulness into Efficient Dialogue Systems.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

CE-VDG: Counterfactual Entropy-based Bias Reduction for Video-grounded Dialogue Generation.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

SDA: Semantic Discrepancy Alignment for Text-conditioned Image Retrieval.

Yuchen Yang

Proceedings of the Findings of the Association for Computational Linguistics, 2024

DictLLM: Harnessing Key-Value Data Structures with Large Language Models for Enhanced Medical Diagnostics.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

Self-Supervised Masking for Unsupervised Anomaly Detection and Localization.

IEEE Trans. Multim., 2023

Redundancy-Adaptive Multimodal Learning for Imperfect Data.

CoRR, 2023

Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning.

CoRR, 2023

An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models.

CoRR, 2023

LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework.

CoRR, 2023

Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation.

CoRR, 2023

SelfEvolve: A Code Evolution Framework via Large Language Models.

Shuyang Jiang

Yuhao Wang

CoRR, 2023

DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery.

CoRR, 2023

Knowledge-aware Bayesian Co-attention for Multimodal Emotion Recognition.

Zihan Zhao

CoRR, 2023

Uncertainty-Guided End-to-End Audio-Visual Speaker Diarization for Far-Field Recordings.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Contrastive Learning Based ASR Robust Knowledge Selection For Spoken Dialogue System.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Self-Improvement of Non-autoregressive Model via Sequence-Level Distillation.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Enhanced Multimodal Representation Learning with Cross-modal KD.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Unsupervised Ensemble Distillation for Multi-Organ Segmentation.

Proceedings of the 19th IEEE International Symposium on Biomedical Imaging, 2022

Multi-level Fusion of Wav2vec 2.0 and BERT for Multimodal Emotion Recognition.

Zihan Zhao

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

LAR-SR: A Local Autoregressive Model for Image Super-Resolution.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Efficient Use of End-to-End Data in Spoken Language Processing.

Yiting Lu

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Spoken Language 'Grammatical Error Correction'.

Yiting Lu

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Non-Native Children's Automatic Speech Recognition: The INTERSPEECH 2020 Shared Task ALTA Systems.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019

General Sequence Teacher-Student Learning.

Mark John Francis Gales

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Non-native Speaker Verification for Spoken Language Assessment.

Linlin Wang

CoRR, 2019

Disfluency Detection for Spoken Learner English.

Proceedings of the 8th ISCA International Workshop on Speech and Language Technology in Education, 2019

Impact of ASR Performance on Spoken Grammatical Error Detection.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Non-Intrusive POLQA Estimation of Speech Quality using Recurrent Neural Networks.

Proceedings of the 27th European Signal Processing Conference, 2019

Learning Between Different Teacher and Student Models in ASR.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Model-Based Speech Enhancement in the Modulation Domain.

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Towards automatic assessment of spontaneous spoken English.

Konstantinos Kyriakopoulos

Andrey Malinin

Rogier C. van Dalen

M. Rashid

Speech Commun., 2018

Confidence Estimation and Deletion Prediction Using Bidirectional Recurrent Neural Networks.

CoRR, 2018

Sequence Teacher-Student Training of Acoustic Models for Automatic Free Speaking Language Assessment.

Anton Ragni

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Impact of ASR Performance on Free Speaking Language Assessment.

Konstantinos Kyriakopoulos

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Phonetic and Graphemic Systems for Multi-Genre Broadcast Transcription.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Future Word Contexts in Neural Network Language Models.

CoRR, 2017

An attention based model for off-topic spontaneous spoken response detection: An Initial Study.

Proceedings of the 7th ISCA International Workshop on Speech and Language Technology in Education, 2017

Use of Graphemic Lexicons for Spoken Language Assessment.

Konstantinos Kyriakopoulos

Anton Ragni

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

A data-driven non-intrusive measure of speech quality and intelligibility.

Speech Commun., 2016

Speech enhancement using an MMSE spectral amplitude estimator based on a modulation domain Kalman filter with a Gamma prior.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Off-topic Response Detection for Spontaneous Spoken English Assessment.

Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

2014

Speech enhancement using a modulation domain Kalman filter post-processor with a Gaussian Mixture noise model.

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

Speech enhancement using a robust Kalman filter post-processor in the modulation domain.

Proceedings of the IEEE International Conference on Acoustics, 2013

A subspace method for speech enhancement in the modulation domain.
