Takaaki Saeki

ORCID: 0000-0001-6003-768X

According to our database, Takaaki Saeki authored at least 30 papers between 2010 and 2024.

Bibliography

2024
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics.
CoRR, 2024

Diversity-Based Core-Set Selection for Text-to-Speech with Linguistic and Acoustic Features.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

NecoBERT: Self-Supervised Learning Model Trained by Masked Language Modeling on Rich Acoustic Features Derived from Neural Audio Codec.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

2023
SelfRemaster: Self-Supervised Speech Restoration for Historical Audio Resources.
IEEE Access, 2023

Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion.
Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Duration-Aware Pause Insertion Using Pre-Trained Language Model for Multi-Speaker Text-To-Speech.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

SpeechLMScore: Evaluating Speech Generation Using Speech Language Model.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Yodas: Youtube-Oriented Dataset for Audio and Speech.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection.
CoRR, 2022

Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses.
CoRR, 2022

Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis.
CoRR, 2022

VTTS: Visual-Text To Speech.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Personalized Filled-pause Generation with Group-wise Prediction Models.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

SSR7000: A Synchronized Corpus of Ultrasound Tongue Imaging for End-to-End Silent Speech Recognition.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021
Incremental Text-to-Speech Synthesis Using Pseudo Lookahead With Large Pretrained Language Model.
IEEE Signal Process. Lett., 2021

Real-Time Full-Band Voice Conversion with Sub-Band Modeling and Data-Driven Phase Estimation of Spectral Differentials.
IEICE Trans. Inf. Syst., 2021

JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification.
CoRR, 2021

ESPnet2-TTS: Extending the Edge of TTS Research.
CoRR, 2021

Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Lifter Training and Sub-Band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2010
Impact and Use of the Asymmetric Property in Bi-directional Cooperative Relaying under Asymmetric Traffic Conditions.
IEICE Trans. Commun., 2010

