2025
A Diverse and Effective Retrieval-Based Debt Collection System with Expert Knowledge.
CoRR, April, 2025
Delusions of Large Language Models.
CoRR, March, 2025
WritingBench: A Comprehensive Benchmark for Generative Writing.
CoRR, March, 2025
MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio.
CoRR, March, 2025
A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
A Diverse and Effective Retrieval-Based Debt Collection System with Expert Knowledge.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025
AudioTime: A Temporally-aligned Audio-text Benchmark Dataset.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025
PicoAudio: Enabling Precise Temporal Controllability in Text-to-Audio Generation.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025
Is Your Image a Good Storyteller?
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound.
IEEE J. Sel. Top. Signal Process., December, 2024
Towards Weakly Supervised Text-to-Audio Grounding.
IEEE Trans. Multim., 2024
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Unified Pathological Speech Analysis with Prompt Tuning.
CoRR, 2024
Long Term Memory: The Foundation of AI Self-Evolution.
CoRR, 2024
Mixed Chain-of-Psychotherapies for Emotional Support Chatbot.
CoRR, 2024
Depression Diagnosis Dialogue Simulation: Self-improving Psychiatrist with Tertiary Memory.
CoRR, 2024
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation.
CoRR, 2024
FakeSound: Deepfake General Audio Detection.
CoRR, 2024
Evaluation of data inconsistency for multi-modal sentiment analysis.
CoRR, 2024
Towards Reliable and Empathetic Depression-Diagnosis-Oriented Chats.
CoRR, 2024
Phonetic and Lexical Discovery of a Canine Language using HuBERT.
CoRR, 2024
Mapping Long-term Causalities in Psychiatric Symptomatology and Life Events from Social Media.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Investigating Passive Filter Pruning for Efficient CNN-Transformer Audio Captioning.
Proceedings of the 34th IEEE International Workshop on Machine Learning for Signal Processing, 2024
Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Efficient Audio Captioning with Encoder-Level Knowledge Distillation.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
FakeSound: Deepfake General Audio Detection.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Semantic-Enhanced Supervised Contrastive Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
A Detailed Audio-Text Data Simulation Pipeline Using Single-Event Sounds.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
Enhancing Audio Generation Diversity with Visual Information.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
Phonetic and Lexical Discovery of Canine Vocalization.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Automatic Reconstruction of Ancient Chinese Pronunciations.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Multi-Label Supervised Contrastive Learning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue.
Trans. Assoc. Comput. Linguistics, 2023
PsyEval: A Comprehensive Large Language Model Evaluation Benchmark for Mental Health.
CoRR, 2023
Towards Lexical Analysis of Dog Vocalizations via Online Videos.
CoRR, 2023
Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs and Their Human Owners.
CoRR, 2023
A Large-scale Dataset for Audio-Language Representation Learning.
CoRR, 2023
Improving Audio Caption Fluency with Automatic Error Correction.
CoRR, 2023
LLM-empowered Chatbots for Psychiatrist and Patient Simulation: Application and Evaluation.
CoRR, 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
ReCLR: Reference-Enhanced Contrastive Learning of Audio Representation for Depression Detection.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Enhance Temporal Relations in Audio Captioning with Sound Event Detection.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Investigating Pooling Strategies and Loss Functions for Weakly-Supervised Text-to-Audio Grounding via Contrastive Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Diverse and Vivid Sound Generation from Text Descriptions.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Semantic Space Grounded Weighted Decoding for Multi-Attribute Controllable Dialogue Generation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Detection of Multiple Mental Disorders from Social Media with Two-Stream Psychiatric Experts.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Transcribing Vocal Communications of Domestic Shiba Inu Dogs.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
2022
DialogZoo: Large-Scale Dialog-Oriented Task Learning.
CoRR, 2022
Symptom Identification for Interpretable Detection of Multiple Mental Disorders.
CoRR, 2022
A Comprehensive Survey of Automated Audio Captioning.
CoRR, 2022
Psychiatric Scale Guided Risky Post Screening for Early Detection of Depression.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022
Can Audio Captions Be Evaluated With Image Caption Metrics?
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
Diversity-Controllable and Accurate Audio Captioning Based on Neural Condition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
Climate and Weather: Inspecting Depression Detection via Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
Audio-Text Retrieval in Context.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
Navigating Audio-Visual Event Detection Across Mismatched Modalities.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
Category-Adapted Sound Event Enhancement with Weakly Labeled Data.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
Symptom Identification for Interpretable Detection of Multiple Mental Disorders on Social Media.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
2021
Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Towards Duration Robust Weakly Supervised Sound Event Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
DEPA: Self-Supervised Audio Embedding for Depression Detection.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
Audio Caption in a Car Setting with a Sentence-Level Loss.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
A Lightweight Framework for Online Voice Activity Detection in the Wild.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021
Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021
Enriching Ontology with Temporal Commonsense for Low-Resource Audio Tagging.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021
Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021
Building Interpretable Interaction Trees for Deep NLP Models.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
Interpreting Hierarchical Linguistic Interactions in DNNs.
CoRR, 2020
GPVAD: Towards noise robust voice activity detection via weakly supervised sound event detection.
CoRR, 2020
Voice Activity Detection in the Wild via Weakly Supervised Sound Event Detection.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Multiple Sound Sources Localization from Coarse to Fine.
Proceedings of Computer Vision - ECCV 2020, 2020
A CRNN-GRU Based Reinforcement Learning Approach to Audio Captioning.
Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020
2019
What does a Car-ssette tape tell?
CoRR, 2019
Text-based Depression Detection: What Triggers An Alert.
CoRR, 2019
Audio Caption: Listen and Tell.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019
2018
Detecting and Analysing Spatial-Temporal Aggregation of Flight Turbulence with the QAR Big Data.
Proceedings of the 26th International Conference on Geoinformatics, 2018
2015
Perception of Cantonese tones by Mandarin speakers.
Proceedings of the 18th International Congress of Phonetic Sciences, 2015
1999
Massively Parallel Simulated Annealing Embedded with Downhill - A SPMD Algorithm for Cluster Computing.
Proceedings of the International Workshop on Cluster Computing (IWCC '99), 1999