2025

MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio.

Xuenan Xu

Jiahao Mei

CoRR, March, 2025

2024

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound.

IEEE J. Sel. Top. Signal Process., December, 2024

Towards Weakly Supervised Text-to-Audio Grounding.

IEEE Trans. Multim., 2024

Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance.

Yaoyun Zhang

Xuenan Xu

Mengyue Wu

CoRR, 2024

Unified Pathological Speech Analysis with Prompt Tuning.

CoRR, 2024

SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs.

CoRR, 2024

DRCap: Decoding CLAP Latents with Retrieval-augmented Generation for Zero-shot Audio Captioning.

CoRR, 2024

PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation.

CoRR, 2024

AudioTime: A Temporally-aligned Audio-text Benchmark Dataset.

CoRR, 2024

FakeSound: Deepfake General Audio Detection.

CoRR, 2024

Zero-Shot Audio Captioning Using Soft and Hard Prompts.

CoRR, 2024

Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining.

Proceedings of the 34th IEEE International Workshop on Machine Learning for Signal Processing, 2024

Investigating Passive Filter Pruning for Efficient CNN-Transformer Audio Captioning.

Proceedings of the 34th IEEE International Workshop on Machine Learning for Signal Processing, 2024

Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Efficient Audio Captioning with Encoder-Level Knowledge Distillation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

FakeSound: Deepfake General Audio Detection.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

A Detailed Audio-Text Data Simulation Pipeline Using Single-Event Sounds.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Enhancing Audio Generation Diversity with Visual Information.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

A Large-scale Dataset for Audio-Language Representation Learning.

CoRR, 2023

Improving Audio Caption Fluency with Automatic Error Correction.

CoRR, 2023

BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Enhance Temporal Relations in Audio Captioning with Sound Event Detection.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Investigating Pooling Strategies and Loss Functions for Weakly-Supervised Text-to-Audio Grounding via Contrastive Learning.

Xuenan Xu

Mengyue Wu

Kai Yu

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Diverse and Vivid Sound Generation from Text Descriptions.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022

A Comprehensive Survey of Automated Audio Captioning.

Xuenan Xu

Mengyue Wu

Kai Yu

CoRR, 2022

Automatic Detection Pipeline for Accessing the Motor Severity of Parkinson's Disease in Finger Tapping and Postural Stability.

IEEE Access, 2022

Can Audio Captions Be Evaluated With Image Caption Metrics?

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Diversity-Controllable and Accurate Audio Captioning Based on Neural Condition.

Xuenan Xu

Mengyue Wu

Kai Yu

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Audio-Text Retrieval in Context.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Navigating Audio-Visual Event Detection Across Mismatched Modalities.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Category-Adapted Sound Event Enhancement with Weakly Labeled Data.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Audio Caption in a Car Setting with a Sentence-Level Loss.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

A Lightweight Framework for Online Voice Activity Detection in the Wild.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

A CRNN-GRU Based Reinforcement Learning Approach to Audio Captioning.

Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020

Creating and Evaluating a Goal Setting Prototype for MOOCs.

Nathan Magyar

Xuenan Xu

Molly Maher

Proceedings of the Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, 2020

2019

What does a Car-ssette tape tell?

CoRR, 2019