Haohan Guo

ORCID: 0000-0002-3393-9984

According to our database¹, Haohan Guo authored at least 23 papers between 2019 and 2024.

Collaborative distances:

Dijkstra number² of five.
Erdős number³ of four.

Bibliography

2024

Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation.

CoRR, 2024

FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications.

CoRR, 2024

SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis.

CoRR, 2024

SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models.

CoRR, 2024

UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner.

CoRR, 2024

Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder.

CoRR, 2024

SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models.

CoRR, 2024

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data.

Co-authors include Álvaro Martín-Cortinas, Soledad López Gambino, Kayeon Yoo, Elena Sokolova, and Thomas Drugman.

CoRR, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS.

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning.

CoRR, 2023

2022

Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations.

CoRR, 2022

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Adversarial Waveform Generation Based Singing Voice Conversion with Harmonic Signals.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Conversational End-to-End TTS for Voice Agents.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

2020

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training.

CoRR, 2020

Conversational End-to-End TTS for Voice Agent.

CoRR, 2020

2019

Feature reinforcement with word embedding and parsing information in neural TTS.

CoRR, 2019

Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A New GAN-Based End-to-End TTS Training Algorithm.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
