Detai Xin

ORCID: 0009-0007-1908-1137

According to our database, Detai Xin authored at least 20 papers between 2020 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

JNV corpus: A corpus of Japanese nonverbal vocalizations with diverse phrases and emotions.

Detai Xin, Shinnosuke Takamichi, Hiroshi Saruwatari

Speech Commun., January, 2024

BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec.

CoRR, 2024

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis.

CoRR, 2024

Building speech corpus with diverse voice characteristics for its prompt-based representation.

CoRR, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.

CoRR, 2024

JVNV: A Corpus of Japanese Emotional Speech With Verbal Content and Nonverbal Expressions.

IEEE Access, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions.

Dataset, October, 2023

Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Duration-Aware Pause Insertion Using Pre-Trained Language Model for Multi-Speaker Text-To-Speech.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Improving Speech Prosody of Audiobook Text-To-Speech Synthesis with Acoustic and Textual Contexts.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

MID-Attribute Speaker Generation Using Optimal-Transport-Based Interpolation of Gaussian Mixture Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

COCO-NUT: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-Based Control.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations.

Detai Xin, Shinnosuke Takamichi, Hiroshi Saruwatari

CoRR, 2022

Speaking-Rate-Controllable HiFi-GAN Using Feature Interpolation.

CoRR, 2022

UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Disentangled Speaker and Language Representations Using Mutual Information Minimization and Domain Adaptation for Cross-Lingual TTS.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
