Takaaki Saeki

Orcid: 0000-0001-6003-768X

According to our database, Takaaki Saeki authored at least 29 papers between 2010 and 2024.

Bibliography

2024
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics.
CoRR, 2024

Diversity-Based Core-Set Selection for Text-to-Speech with Linguistic and Acoustic Features.
Proceedings of the IEEE International Conference on Acoustics, 2024

Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
SelfRemaster: Self-Supervised Speech Restoration for Historical Audio Resources.
IEEE Access, 2023

Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion.
Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Duration-Aware Pause Insertion Using Pre-Trained Language Model for Multi-Speaker Text-To-Speech.
Proceedings of the IEEE International Conference on Acoustics, 2023

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech.
Proceedings of the IEEE International Conference on Acoustics, 2023

SpeechLMScore: Evaluating Speech Generation Using Speech Language Model.
Proceedings of the IEEE International Conference on Acoustics, 2023

YODAS: YouTube-Oriented Dataset for Audio and Speech.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection.
CoRR, 2022

Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses.
CoRR, 2022

Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis.
CoRR, 2022

VTTS: Visual-Text To Speech.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Personalized Filled-pause Generation with Group-wise Prediction Models.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

SSR7000: A Synchronized Corpus of Ultrasound Tongue Imaging for End-to-End Silent Speech Recognition.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021
Incremental Text-to-Speech Synthesis Using Pseudo Lookahead With Large Pretrained Language Model.
IEEE Signal Process. Lett., 2021

Real-Time Full-Band Voice Conversion with Sub-Band Modeling and Data-Driven Phase Estimation of Spectral Differentials.
IEICE Trans. Inf. Syst., 2021

JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification.
CoRR, 2021

ESPnet2-TTS: Extending the Edge of TTS Research.
CoRR, 2021

Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Lifter Training and Sub-Band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2010
Impact and Use of the Asymmetric Property in Bi-directional Cooperative Relaying under Asymmetric Traffic Conditions.
IEICE Trans. Commun., 2010

