Zhaoheng Ni

According to our database, Zhaoheng Ni authored at least 37 papers between 2017 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

Scaling Speech Technology to 1,000+ Languages.

J. Mach. Learn. Res., 2024

Adapting Whisper for Code-Switching through Encoding Refining and Language-Aware Decoding.

CoRR, 2024

SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text.

CoRR, 2024

Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition.

CoRR, 2024

High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching.

CoRR, 2024

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement.

CoRR, 2024

FoleyGen: Visually-Guided Audio Generation.

Proceedings of the 34th IEEE International Workshop on Machine Learning for Signal Processing, 2024

An Empirical Study on the Impact of Positional Encoding in Transformer-Based Monaural Speech Enhancement.

Qiquan Zhang

Meng Ge

Hongxu Zhu

Eliathamby Ambikairajah

Qi Song

Zhaoheng Ni

Haizhou Li

Proceedings of the IEEE International Conference on Acoustics, 2024

Folding Attention: Memory and Power Optimization for On-Device Transformer-Based Streaming Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

Stack-and-Delay: A New Codebook Pattern for Music Generation.

Proceedings of the IEEE International Conference on Acoustics, 2024

Less Peaky and More Accurate CTC Forced Alignment by Label Priors.

Proceedings of the IEEE International Conference on Acoustics, 2024

On the Open Prompt Challenge in Conditional Audio Generation.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing.

J. Open Source Softw., November, 2023

Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310).

Dataset, October, 2023

A Time-Frequency Attention Module for Neural Speech Enhancement.

Eliathamby Ambikairajah

Haizhou Li

IEEE ACM Trans. Audio Speech Lang. Process., 2023

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch.

CoRR, 2023

Enhance audio generation controllability through representation similarity regularization.

CoRR, 2023

Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Ripple Sparse Self-Attention for Monaural Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2023

TorchAudio-Squim: Reference-Less Speech Quality and Intelligibility Measures in TorchAudio.

Proceedings of the IEEE International Conference on Acoustics, 2023

TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for PyTorch.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

2022

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Time-Frequency Attention for Monaural Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2022

TorchAudio: Building Blocks for Audio and Speech Processing.

Vincent Quenneville-Bélair

Proceedings of the IEEE International Conference on Acoustics, 2022

Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPnet-SE Submission to the L3DAS22 Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

TorchAudio: Building Blocks for Audio and Speech Processing.

CoRR, 2021

WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

2020

Combining Spatial Clustering with LSTM Speech Models for Multichannel Speech Enhancement.

CoRR, 2020

Improved MVDR Beamforming Using LSTM Speech Models to Clean Spatial Clustering Masks.

CoRR, 2020

Enhancement of Spatial Clustering-Based Time-Frequency Masks using LSTM Neural Networks.

CoRR, 2020

Mask-Dependent Phase Estimation for Monaural Speaker Separation.

Zhaoheng Ni

Michael I. Mandel

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Onssen: an open-source speech separation and enhancement library.

Zhaoheng Ni

Michael I. Mandel

CoRR, 2019

2018

Sound Signal Processing with Seq2Tree Network.

Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Unusable Spoken Response Detection with BLSTM Neural Networks.

David Suendermann-Oeft

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

2017

A Sep2Tree Model for Recognizing Synthetic Bach Chorales.

Proceedings of the 2017 International Computer Music Conference, 2017

Confused or not Confused?: Disentangling Brain Activity from EEG Data Using Bidirectional LSTM Recurrent Neural Networks.

Proceedings of the 8th ACM International Conference on Bioinformatics, 2017
