Haohan Guo

Orcid: 0000-0002-3393-9984

According to our database, Haohan Guo authored at least 23 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2024
Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation.
CoRR, 2024

FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications.
CoRR, 2024

SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis.
CoRR, 2024

SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models.
CoRR, 2024

UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner.
CoRR, 2024

Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder.
CoRR, 2024

SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models.
CoRR, 2024

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data.
CoRR, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.
Proceedings of the IEEE International Conference on Acoustics, 2024

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning.
CoRR, 2023

2022
Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations.
CoRR, 2022

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Adversarial Waveform Generation Based Singing Voice Conversion with Harmonic Signals.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Conversational End-to-End TTS for Voice Agents.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

2020
Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training.
CoRR, 2020

Conversational End-to-End TTS for Voice Agent.
CoRR, 2020

2019
Feature reinforcement with word embedding and parsing information in neural TTS.
CoRR, 2019

Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A New GAN-Based End-to-End TTS Training Algorithm.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

