Zhaoheng Ni

According to our database, Zhaoheng Ni authored at least 35 papers between 2017 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Scaling Speech Technology to 1,000+ Languages.
J. Mach. Learn. Res., 2024

Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition.
CoRR, 2024

High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching.
CoRR, 2024

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement.
CoRR, 2024

FoleyGen: Visually-Guided Audio Generation.
Proceedings of the 34th IEEE International Workshop on Machine Learning for Signal Processing, 2024

An Empirical Study on the Impact of Positional Encoding in Transformer-Based Monaural Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2024

Folding Attention: Memory and Power Optimization for On-Device Transformer-Based Streaming Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Stack-and-Delay: A New Codebook Pattern for Music Generation.
Proceedings of the IEEE International Conference on Acoustics, 2024

Less Peaky and More Accurate CTC Forced Alignment by Label Priors.
Proceedings of the IEEE International Conference on Acoustics, 2024

On the Open Prompt Challenge in Conditional Audio Generation.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing.
J. Open Source Softw., November, 2023

Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310).
Dataset, October, 2023

A Time-Frequency Attention Module for Neural Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch.
CoRR, 2023

Enhance audio generation controllability through representation similarity regularization.
CoRR, 2023

Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Ripple Sparse Self-Attention for Monaural Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2023

TorchAudio-Squim: Reference-Less Speech Quality and Intelligibility Measures in TorchAudio.
Proceedings of the IEEE International Conference on Acoustics, 2023

TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for PyTorch.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

2022
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Time-Frequency Attention for Monaural Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022

Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPnet-SE Submission to the L3DAS22 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
TorchAudio: Building Blocks for Audio and Speech Processing.
CoRR, 2021

WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

2020
Combining Spatial Clustering with LSTM Speech Models for Multichannel Speech Enhancement.
CoRR, 2020

Improved MVDR Beamforming Using LSTM Speech Models to Clean Spatial Clustering Masks.
CoRR, 2020

Enhancement of Spatial Clustering-Based Time-Frequency Masks using LSTM Neural Networks.
CoRR, 2020

Mask-Dependent Phase Estimation for Monaural Speaker Separation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Onssen: an open-source speech separation and enhancement library.
CoRR, 2019

2018
Sound Signal Processing with Seq2Tree Network.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Unusable Spoken Response Detection with BLSTM Neural Networks.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

2017
A Sep2Tree Model for Recognizing Synthetic Bach Chorales.
Proceedings of the 2017 International Computer Music Conference, 2017

Confused or not Confused?: Disentangling Brain Activity from EEG Data Using Bidirectional LSTM Recurrent Neural Networks.
Proceedings of the 8th ACM International Conference on Bioinformatics, 2017

