Haohe Liu

ORCID: 0000-0003-1036-7888

According to our database, Haohe Liu authored at least 54 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2024

IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift Evaluation Dataset.
Dataset, March, 2024

IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift Development Dataset.
Dataset, February, 2024

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

AudioLDM 2: Learning Holistic Audio Generation With Self-Supervised Pretraining.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

FlowSep: Language-Queried Sound Separation with Rectified Flow Matching.
CoRR, 2024

Efficient Audio Captioning with Encoder-Level Knowledge Distillation.
CoRR, 2024

Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Review.
CoRR, 2024

Zero-Shot Audio Captioning Using Soft and Hard Prompts.
CoRR, 2024

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound.
CoRR, 2024

Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift.
CoRR, 2024

FlashSpeech: Efficient Zero-Shot Speech Synthesis.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining.
Proceedings of the 34th IEEE International Workshop on Machine Learning for Signal Processing, 2024

First-Shot Unsupervised Anomalous Sound Detection with Unknown Anomalies Estimated by Metadata-Assisted Audio Generation.
Proceedings of the IEEE International Conference on Acoustics, 2024

Retrieval-Augmented Text-to-Audio Generation.
Proceedings of the IEEE International Conference on Acoustics, 2024

AudioSR: Versatile Audio Super-Resolution at Scale.
Proceedings of the IEEE International Conference on Acoustics, 2024

MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies.
Proceedings of the IEEE International Conference on Acoustics, 2024

Text-Queried Target Sound Event Localization.
Proceedings of the 32nd European Signal Processing Conference, 2024

Learning Temporal Resolution in Spectrogram for Audio Classification.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Learning to detect an animal sound from five examples.
Ecol. Informatics, November, 2023

Balanced SNR-Aware Distillation for Guided Text-to-Audio Generation.
CoRR, 2023

Synth-AC: Enhancing Audio Captioning with Synthetic Supervision.
CoRR, 2023

Multimodal Fish Feeding Intensity Assessment in Aquaculture.
CoRR, 2023

Separate Anything You Describe.
CoRR, 2023

WavJourney: Compositional Audio Creation with Large Language Models.
CoRR, 2023

Text-Driven Foley Sound Generation With Latent Diffusion Model.
CoRR, 2023

E-PANNs: Sound Recognition Using Efficient Pre-trained Audio Neural Networks.
CoRR, 2023

Latent Diffusion Model Based Foley Sound Generation System For DCASE Challenge 2023 Task 7.
CoRR, 2023

Learning to detect an animal sound from five examples.
CoRR, 2023

Universal Source Separation with Weakly Labelled Data.
CoRR, 2023

Leveraging Pre-trained AudioLDM for Text to Sound Generation: A Benchmark Study.
CoRR, 2023

Ontology-aware Learning and Evaluation for Audio Tagging.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adapting Language-Audio Models as Few-Shot Audio Learners.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models.
Proceedings of the International Conference on Machine Learning, 2023

Simple Pooling Front-Ends for Efficient Audio Classification.
Proceedings of the IEEE International Conference on Acoustics, 2023

Leveraging Pre-Trained AudioLDM for Sound Generation: A Benchmark Study.
Proceedings of the 31st European Signal Processing Conference, 2023

2022
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech.
CoRR, 2022

Learning the Spectrogram Temporal Resolution for Audio Classification.
CoRR, 2022

Surrey System for DCASE 2022 Task 5: Few-shot Bioacoustic Event Detection with Segment-level Metric Learning.
CoRR, 2022

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Audio Visual Multi-Speaker Tracking with Improved GCF and PMBM Filter.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Separate What You Describe: Language-Queried Audio Source Separation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Neural Vocoder is All You Need for Speech Super-resolution.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Leveraging Pre-trained BERT for Audio Captioning.
Proceedings of the 30th European Signal Processing Conference, 2022

Segment-Level Metric Learning for Few-Shot Bioacoustic Event Detection.
Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events 2022, 2022

2021
CWS-PResUNet: Music Source Separation with Channel-wise Subband Phase-aware ResUNet.
CoRR, 2021

VoiceFixer: Toward General Speech Restoration With Neural Vocoder.
CoRR, 2021

Joint Echo Cancellation and Noise Suppression based on Cascaded Magnitude and Complex Mask Estimation.
CoRR, 2021

Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation.
Proceedings of the 22nd International Society for Music Information Retrieval Conference, 2021

Speech Enhancement with Weakly Labelled Data from AudioSet.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2020
Channel-Wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019
Design and Visualization of Guided GAN on MNIST dataset.
Proceedings of the 3rd International Conference on Graphics and Signal Processing, 2019

