Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

AudioTime: A Temporally-aligned Audio-text Benchmark Dataset.

[DOI]

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

PicoAudio: Enabling Precise Temporal Controllability in Text-to-Audio Generation.

[DOI]

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Toward Automatic Discovery of a Canine Phonetic Alphabet.

[DOI]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Tracking Life's Ups and Downs: Mining Life Events from Social Media Posts for Mental Health Analysis.

[DOI]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Is Your Image a Good Storyteller?

[DOI]

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound.

[DOI]

IEEE J. Sel. Top. Signal Process., December, 2024

Towards Weakly Supervised Text-to-Audio Grounding.

[DOI]

IEEE Trans. Multim., 2024

Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning.

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Unified Pathological Speech Analysis with Prompt Tuning.

[DOI]

CoRR, 2024

Long Term Memory: The Foundation of AI Self-Evolution.

[DOI]

CoRR, 2024

Mixed Chain-of-Psychotherapies for Emotional Support Chatbot.

[DOI]

CoRR, 2024

Depression Diagnosis Dialogue Simulation: Self-improving Psychiatrist with Tertiary Memory.

[DOI]

CoRR, 2024

PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation.

[DOI]

CoRR, 2024

FakeSound: Deepfake General Audio Detection.

[DOI]

CoRR, 2024

Evaluation of data inconsistency for multi-modal sentiment analysis.

[DOI]

Yufei Wang

CoRR, 2024

Towards Reliable and Empathetic Depression-Diagnosis-Oriented Chats.

[DOI]

CoRR, 2024

Phonetic and Lexical Discovery of a Canine Language using HuBERT.

[DOI]

CoRR, 2024

Mapping Long-term Causalities in Psychiatric Symptomatology and Life Events from Social Media.

[DOI]

Juqianqian Juqianqian

Dejiyangla Dejiyangla

Yujia Peng

Kenny Q. Zhu

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning.

[DOI]

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Investigating Passive Filter Pruning for Efficient CNN-Transformer Audio Captioning.

[DOI]

Proceedings of the 34th IEEE International Workshop on Machine Learning for Signal Processing, 2024

Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models.

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Efficient Audio Captioning with Encoder-Level Knowledge Distillation.

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

FakeSound: Deepfake General Audio Detection.

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation.

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Semantic-Enhanced Supervised Contrastive Learning.

[DOI]

Pingyue Zhang

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

A Detailed Audio-Text Data Simulation Pipeline Using Single-Event Sounds.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Enhancing Audio Generation Diversity with Visual Information.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Phonetic and Lexical Discovery of Canine Vocalization.

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Automatic Reconstruction of Ancient Chinese Pronunciations.

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Multi-Label Supervised Contrastive Learning.

[DOI]

Pingyue Zhang

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue.

[DOI]

Trans. Assoc. Comput. Linguistics, 2023

PsyEval: A Comprehensive Large Language Model Evaluation Benchmark for Mental Health.

[DOI]

CoRR, 2023

Towards Lexical Analysis of Dog Vocalizations via Online Videos.

[DOI]

CoRR, 2023

Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs and Their Human Owners.

[DOI]

CoRR, 2023

A Large-scale Dataset for Audio-Language Representation Learning.

[DOI]

CoRR, 2023

Improving Audio Caption Fluency with Automatic Error Correction.

[DOI]

CoRR, 2023

LLM-empowered Chatbots for Psychiatrist and Patient Simulation: Application and Evaluation.

[DOI]

CoRR, 2023

BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data.

[DOI]

Proceedings of the 31st ACM International Conference on Multimedia, 2023

ReCLR: Reference-Enhanced Contrastive Learning of Audio Representation for Depression Detection.

[DOI]

Pingyue Zhang

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Enhance Temporal Relations in Audio Captioning with Sound Event Detection.

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Investigating Pooling Strategies and Loss Functions for Weakly-Supervised Text-to-Audio Grounding via Contrastive Learning.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Diverse and Vivid Sound Generation from Text Descriptions.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Semantic Space Grounded Weighted Decoding for Multi-Attribute Controllable Dialogue Generation.

[DOI]

Zhiling Zhang

Kenny Q. Zhu

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Detection of Multiple Mental Disorders from Social Media with Two-Stream Psychiatric Experts.

[DOI]

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Transcribing Vocal Communications of Domestic Shiba Inu Dogs.

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

DialogZoo: Large-Scale Dialog-Oriented Task Learning.

[DOI]

CoRR, 2022

Symptom Identification for Interpretable Detection of Multiple Mental Disorders.

[DOI]

CoRR, 2022

A Comprehensive Survey of Automated Audio Captioning.

[DOI]

CoRR, 2022

Psychiatric Scale Guided Risky Post Screening for Early Detection of Depression.

[DOI]

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Can Audio Captions Be Evaluated With Image Caption Metrics?

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Diversity-Controllable and Accurate Audio Captioning Based on Neural Condition.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Climate and Weather: Inspecting Depression Detection via Emotion Recognition.

[DOI]

Wen Wu

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Audio-Text Retrieval in Context.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Navigating Audio-Visual Event Detection Across Mismatched Modalities.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Category-Adapted Sound Event Enhancement with Weakly Labeled Data.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Symptom Identification for Interpretable Detection of Multiple Mental Disorders on Social Media.

[DOI]

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat.

[DOI]

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021

Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training.

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Towards Duration Robust Weakly Supervised Sound Event Detection.

[DOI]

Heinrich Dinkel