Hao Huang

Orcid: 0000-0001-6604-0951

Affiliations:
  • Xinjiang University, School of Information Science and Engineering, Xinjiang Provincial Key Laboratory of Multilingual Information Technology, Urumqi, China
  • Shanghai Jiao Tong University, Department of Electronic Engineering, Shanghai, China (PhD 2008)


According to our database, Hao Huang authored at least 51 papers between 2009 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
GLFER-Net: a polyphonic sound source localization and detection network based on global-local feature extraction and recalibration.
EURASIP J. Audio Speech Music. Process., December, 2024

IIFC-Net: A Monaural Speech Enhancement Network With High-Order Information Interaction and Feature Calibration.
IEEE Signal Process. Lett., 2024

Scene text recognition with context-aware autonomous bidirectional iterative models.
J. Intell. Fuzzy Syst., 2024

Introducing Multilingual Phonetic Information to Speaker Embedding for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2024

Fact-Aware Summarization with Contrastive Learning for Few-Shot Dialogue State Tracking.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
W2VC: WavLM representation based one-shot voice conversion with gradient reversal distillation and CTC supervision.
EURASIP J. Audio Speech Music. Process., December, 2023

GhostVec: A New Threat to Speaker Privacy of End-to-End Speech Recognition System.
Proceedings of the ACM Multimedia Asia 2023, 2023

Reprogramming Self-supervised Learning-based Speech Representations for Speaker Anonymization.
Proceedings of the ACM Multimedia Asia 2023, 2023

Self-supervised Learning Representation based Accent Recognition with Persistent Accent Memory.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

MTANet: Multi-band Time-frequency Attention Network for Singing Melody Extraction from Polyphonic Music.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Improved Keyword Recognition Based on Aho-Corasick Automaton.
Proceedings of the International Joint Conference on Neural Networks, 2023

CRA-DIFFUSE: Improved Cross-Domain Speech Enhancement Based on Diffusion Model with T-F Domain Pre-Denoising.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Speech Topic Classification Based on Pre-trained and Graph Networks.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

A Joint Network Based on Interactive Attention for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Speech-Text Based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Speakeraugment: Data Augmentation for Generalizable Source Separation via Speaker Parameter Manipulation.
Proceedings of the IEEE International Conference on Acoustics, 2023

SRTNET: Time Domain Speech Enhancement via Stochastic Refinement.
Proceedings of the IEEE International Conference on Acoustics, 2023

Hierarchical Softmax for End-To-End Low-Resource Multilingual Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Hierarchic Temporal Convolutional Network With Cross-Domain Encoder for Music Source Separation.
IEEE Signal Process. Lett., 2022

A bimodal network based on Audio-Text-Interactional-Attention with ArcFace loss for speech emotion recognition.
Speech Commun., 2022

Multi-stage music separation network with dual-branch attention and hybrid convolution.
J. Intell. Inf. Syst., 2022

Intermediate-layer output Regularization for Attention-based Speech Recognition with Shared Decoder.
CoRR, 2022

Internal Language Model Estimation based Language Model Fusion for Cross-Domain Code-Switching Speech Recognition.
CoRR, 2022

Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition.
CoRR, 2022

A Multi-grained based Attention Network for Semi-supervised Sound Event Detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Graph Isomorphism Network with Weighted Multiple Aggregators for Speech Emotion Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Investigating Effective Domain Adaptation Method for Speaker Verification Task.
Proceedings of the Neural Information Processing - 29th International Conference, 2022

GhostVec: Directly Extracting Speaker Embedding from End-to-End Speech Recognition Model Using Adversarial Examples.
Proceedings of the Neural Information Processing - 29th International Conference, 2022

Mining Hard Samples Locally And Globally For Improved Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Minimum Word Error Training For Non-Autoregressive Transformer-Based Code-Switching ASR.
Proceedings of the IEEE International Conference on Acoustics, 2022

Virtual Fully-Connected Layer for a Large-Scale Speaker Verification Dataset.
Proceedings of the Biometric Recognition - 16th Chinese Conference, 2022

2021
Dual Attention Network for Pitch Estimation of Monophonic Music.
Symmetry, 2021

Connectionist temporal classification loss for vector quantized variational autoencoder in zero-shot voice conversion.
Digit. Signal Process., 2021

Approaches to Improving Recognition of Underrepresented Named Entities in Hybrid ASR Systems.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

E2E-Based Multi-Task Learning Approach to Joint Speech and Accent Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Speech Separation Using Orthogonal Representation in Complex and Real Time-Frequency Domain.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Encoder-Decoder Based Pitch Tracking and Joint Model Training for Mandarin Tone Classification.
Proceedings of the IEEE International Conference on Acoustics, 2021

Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Enriching Under-Represented Named Entities for Improved Speech Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
Monaural Singing Voice and Accompaniment Separation Based on Gated Nested U-Net Architecture.
Symmetry, 2020

Enriching Under-Represented Named-Entities To Improve Speech Recognition Performance.
CoRR, 2020

The NTU-AISG Text-to-speech System for Blizzard Challenge 2020.
CoRR, 2020

A multilingual approach to joint Speech and Accent Recognition with DNN-HMM framework.
CoRR, 2020

A Lightweight Model Based on Separable Convolution for Speech Emotion Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Monolingual Data Selection Analysis for English-Mandarin Hybrid Code-Switching Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2017
Mandarin tone modeling using recurrent neural networks.
CoRR, 2017

2016
Semi-Supervised and Cross-Lingual Knowledge Transfer Learnings for DNN Hybrid Acoustic Models Under Low-Resource Conditions.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Monaural Singing Voice Separation by Non-negative Matrix Partial Co-Factorization with Temporal Continuity and Sparsity Criteria.
Proceedings of the Intelligent Computing Methodologies - 12th International Conference, 2016

I-vector based deep neural network acoustic model adaptation using multilingual language resource.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
Maximum F1-Score Discriminative Training Criterion for Automatic Mispronunciation Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

2009
Minimum tag error for discriminative training of conditional random fields.
Inf. Sci., 2009

