2025
MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio.
CoRR, March, 2025
2024
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound.
IEEE J. Sel. Top. Signal Process., December, 2024
Towards Weakly Supervised Text-to-Audio Grounding.
IEEE Trans. Multim., 2024
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance.
CoRR, 2024
Unified Pathological Speech Analysis with Prompt Tuning.
CoRR, 2024
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs.
CoRR, 2024
DRCap: Decoding CLAP Latents with Retrieval-augmented Generation for Zero-shot Audio Captioning.
CoRR, 2024
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation.
CoRR, 2024
AudioTime: A Temporally-aligned Audio-text Benchmark Dataset.
CoRR, 2024
FakeSound: Deepfake General Audio Detection.
CoRR, 2024
Zero-Shot Audio Captioning Using Soft and Hard Prompts.
CoRR, 2024
Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining.
Proceedings of the 34th IEEE International Workshop on Machine Learning for Signal Processing, 2024
Investigating Passive Filter Pruning for Efficient CNN-Transformer Audio Captioning.
Proceedings of the 34th IEEE International Workshop on Machine Learning for Signal Processing, 2024
Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Efficient Audio Captioning with Encoder-Level Knowledge Distillation.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
FakeSound: Deepfake General Audio Detection.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
A Detailed Audio-Text Data Simulation Pipeline Using Single-Event Sounds.
Proceedings of the IEEE International Conference on Acoustics, 2024
Enhancing Audio Generation Diversity with Visual Information.
Proceedings of the IEEE International Conference on Acoustics, 2024
2023
A Large-scale Dataset for Audio-Language Representation Learning.
CoRR, 2023
Improving Audio Caption Fluency with Automatic Error Correction.
CoRR, 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Enhance Temporal Relations in Audio Captioning with Sound Event Detection.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Investigating Pooling Strategies and Loss Functions for Weakly-Supervised Text-to-Audio Grounding via Contrastive Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023
Diverse and Vivid Sound Generation from Text Descriptions.
Proceedings of the IEEE International Conference on Acoustics, 2023
2022
A Comprehensive Survey of Automated Audio Captioning.
CoRR, 2022
Automatic Detection Pipeline for Accessing the Motor Severity of Parkinson's Disease in Finger Tapping and Postural Stability.
IEEE Access, 2022
Can Audio Captions Be Evaluated With Image Caption Metrics?
Proceedings of the IEEE International Conference on Acoustics, 2022
Diversity-Controllable and Accurate Audio Captioning Based on Neural Condition.
Proceedings of the IEEE International Conference on Acoustics, 2022
Audio-Text Retrieval in Context.
Proceedings of the IEEE International Conference on Acoustics, 2022
Navigating Audio-Visual Event Detection Across Mismatched Modalities.
Proceedings of the IEEE International Conference on Acoustics, 2022
Category-Adapted Sound Event Enhancement with Weakly Labeled Data.
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Audio Caption in a Car Setting with a Sentence-Level Loss.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
A Lightweight Framework for Online Voice Activity Detection in the Wild.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning.
Proceedings of the IEEE International Conference on Acoustics, 2021
Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events.
Proceedings of the IEEE International Conference on Acoustics, 2021
2020
A CRNN-GRU Based Reinforcement Learning Approach to Audio Captioning.
Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020
Creating and Evaluating a Goal Setting Prototype for MOOCs.
Proceedings of the Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, 2020
2019
What does a Car-ssette tape tell?
CoRR, 2019