Shinnosuke Takamichi

Speech Commun., January, 2024

Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

A Neural Transformer Framework for Simultaneous Tasks of Segmentation, Classification, and Caller Identification of Marmoset Vocalization.

CoRR, 2024

SpoofCeleb: Speech Deepfake Detection and SASV In The Wild.

CoRR, 2024

DNN-based ensemble singing voice synthesis with interactions between singers.

CoRR, 2024

Text-To-Speech Synthesis In The Wild.

CoRR, 2024

BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec.

CoRR, 2024

J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling.

CoRR, 2024

Textless Dependency Parsing by Labeled Sequence Prediction.

CoRR, 2024

Who Finds This Voice Attractive? A Large-Scale Experiment Using In-the-Wild Data.

Hitoshi Suda

Aya Watanabe

CoRR, 2024

Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals.

CoRR, 2024

Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment.

CoRR, 2024

SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark.

CoRR, 2024

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis.

CoRR, 2024

Building speech corpus with diverse voice characteristics for its prompt-based representation.

CoRR, 2024

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics.

CoRR, 2024

JVNV: A Corpus of Japanese Emotional Speech With Verbal Content and Nonverbal Expressions.

IEEE Access, 2024

Do Learned Speech Symbols Follow Zipf's Law?

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Diversity-Based Core-Set Selection for Text-to-Speech with Linguistic and Acoustic Features.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Environmental Sound Synthesis from Vocal Imitations and Sound Event Labels.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions.

Dataset, October, 2023

Environmental sound conversion from vocal imitations and sound event labels.

CoRR, 2023

Foley Sound Synthesis at the DCASE 2023 Challenge.

CoRR, 2023

SelfRemaster: Self-Supervised Speech Restoration for Historical Audio Resources.

IEEE Access, 2023

Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion.

Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

TimToShape: Supporting Practice of Musical Instruments by Visualizing Timbre with 2D Shapes based on Crossmodal Correspondences.

Proceedings of the 28th International Conference on Intelligent User Interfaces, 2023

Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

HumanDiffusion: diffusion model using perceptual gradients.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Improving Speech Prosody of Audiobook Text-To-Speech Synthesis with Acoustic and Textual Contexts.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

MID-Attribute Speaker Generation Using Optimal-Transport-Based Interpolation of Gaussian Mixture Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Visual Onoma-to-Wave: Environmental Sound Synthesis from Visual Onomatopoeias and Sound-Source Images.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

jaCappella Corpus: A Japanese a Cappella Vocal Ensemble Corpus.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

COCO-NUT: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-Based Control.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Yodas: Youtube-Oriented Dataset for Audio and Speech.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Lightweight and irreversible speech pseudonymization based on data-driven optimization of cascaded voice modification modules.

Comput. Speech Lang., 2022

Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection.

CoRR, 2022

Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses.

CoRR, 2022

Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis.

CoRR, 2022

How Should We Evaluate Synthesized Environmental Sounds.

CoRR, 2022

Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations.

Detai Xin

CoRR, 2022

Speaking-Rate-Controllable HiFi-GAN Using Feature Interpolation.

CoRR, 2022

VTTS: Visual-Text To Speech.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Robustness of Signal Processing-Based Pseudonymization Method Against Decryption Attack.

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Personalized Filled-pause Generation with Group-wise Prediction Models.

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Perceptual-Similarity-Aware Deep Speaker Representation Learning for Multi-Speaker Generative Modeling.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Incremental Text-to-Speech Synthesis Using Pseudo Lookahead With Large Pretrained Language Model.

Takaaki Saeki

IEEE Signal Process. Lett., 2021

Real-Time Full-Band Voice Conversion with Sub-Band Modeling and Data-Driven Phase Estimation of Spectral Differentials.

IEICE Trans. Inf. Syst., 2021

DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching.

IEICE Trans. Inf. Syst., 2021

Noise Robust Acoustic Anomaly Detection System with Nonnegative Matrix Factorization Based on Generalized Gaussian Distribution.

IEICE Trans. Inf. Syst., 2021

JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification.

CoRR, 2021

ESPnet2-TTS: Extending the Edge of TTS Research.

CoRR, 2021

Onoma-to-wave: Environmental sound synthesis from onomatopoeic words.

CoRR, 2021

Accent Modeling of Low-Resourced Dialect in Pitch Accent Language Using Variational Autoencoder.

Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Audiobook Speech Synthesis Conditioned by Cross-Sentence Context-Aware Word Embeddings.

Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Lightweight Voice Anonymization Based on Data-Driven Optimization of Cascaded Voice Modification Modules.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Digital Speech Makeup: Voice Conversion Based Altered Auditory Feedback for Transforming Self-Representation.

Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021

Disentangled Speaker and Language Representations Using Mutual Information Minimization and Domain Adaptation for Cross-Lingual TTS.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

HumanACGAN: Conditional Generative Adversarial Network with Human-Based Auxiliary Classifier and its Evaluation in Phoneme Perception.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Japanese EFL Learners' Speaking Practice Utilizing Text-to-Speech Technology Within a Team-Based Flipped Learning Framework.

Proceedings of the Learning and Collaboration Technologies: New Challenges and Learning Experiences, 2021

Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network.

Takaaki Saeki

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Emotion-Controllable Speech Synthesis Using Emotion Soft Labels and Fine-Grained Prosody Factors.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis.

Speech Commun., 2020

Phase reconstruction from amplitude spectrograms based on directional-statistics deep neural networks.

Signal Process., 2020

DNN-Based Full-Band Speech Synthesis Using GMM Approximation of Spectral Envelope.

IEICE Trans. Inf. Syst., 2020

Generative Moment Matching Network-Based Neural Double-Tracking for Synthesized and Natural Singing Voices.

IEICE Trans. Inf. Syst., 2020

JSSS: free Japanese speech corpus for summarization and simplification.

CoRR, 2020

PJS: phoneme-balanced Japanese singing voice corpus.

Junya Koguchi

CoRR, 2020

JVS-MuSiC: Japanese multispeaker singing-voice corpus.

CoRR, 2020

DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus.

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

SMASH Corpus: A Spontaneous Speech Corpus Recording Third-person Audio Commentaries on Gameplay.

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Text-to-Speech Synthesis with Unaligned Multiple Language Units Based on Attention.

Masashi Aso

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Lifter Training and Sub-Band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

HumanGAN: Generative Adversarial Network With Human-Based Discriminator And Its Evaluation In Speech Perception Modeling.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

RWCP-SSD-Onomatopoeia: Onomatopoeic Word Dataset for Environmental Sound Synthesis.

Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020

PJS: phoneme-balanced Japanese singing-voice corpus.

Junya Koguchi

Masanori Morise

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019

Independent Deeply Learned Matrix Analysis for Determined Audio Source Separation.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Prosody Correction Preserving Speaker Individuality for Chinese-Accented Japanese HMM-Based Text-to-Speech Synthesis.

Daiki Sekizawa

IEICE Trans. Inf. Syst., 2019

Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra.

Comput. Speech Lang., 2019

Overview of Tasks and Investigation of Subjective Evaluation Methods in Environmental Sound Synthesis and Conversion.

CoRR, 2019

JVS corpus: free Japanese multi-speaker voice corpus.

CoRR, 2019

TransVoice: Real-Time Voice Conversion for Augmenting Near-Field Speech Communication.

Riku Arakawa

Adjunct Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology, 2019

DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis.

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

V2S attack: building DNN-based voice conversion from automatic speaker verification.

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis.

Tomoki Koriyama

Takao Kobayashi

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Subword tokenization based on DNN-based acoustic model for end-to-end prosody generation.

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Implementation of DNN-based real-time voice conversion and its improvements by audio data augmentation and mask-shaped device.

Riku Arakawa

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Spoken Dialogue Robot for Watching Daily Life of Elderly People.

Proceedings of the Increasing Naturalness and Flexibility in Spoken Dialogue Interaction, 2019

Speech Quality Evaluation of Synthesized Japanese Speech Using EEG.

Ivan Halim Parmonangan

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Estimating confidence in voices using crowdsourcing for alleviating tension with altered auditory feedback.

Proceedings of the AsianCHI@CHI 2019: Asian CHI Symposium, 2019

2018

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks.

IEEE ACM Trans. Audio Speech Lang. Process., 2018

An end-to-end model for cross-lingual transformation of paralinguistic information.

Mach. Transl., 2018

CPJD Corpus: Crowdsourced Parallel Speech Corpus of Japanese Dialects.

Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Phase Reconstruction from Amplitude Spectrograms Based on Von-Mises-Distribution Deep Neural Network.

Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018

Text-to-Speech Synthesis Using STFT Spectra Based on Low-/Multi-Resolution Generative Adversarial Networks.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Non-Parallel Voice Conversion Using Variational Autoencoders Conditioned by Phonetic Posteriorgrams and D-Vectors.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Independent Deeply Learned Matrix Analysis for Multichannel Audio Source Separation.

Proceedings of the 26th European Signal Processing Conference, 2018

Generative approach using the noise generation models for DNN-based speech synthesis trained from noisy speech.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

A Revisit to Feature Handling for High-quality Voice Conversion Based on Gaussian Mixture Model.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Data augmentation with moment-matching networks for i-vector based speaker verification.

Sayaka Shiota

Tomoko Matsui

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Prosody-aware subword embedding considering Japanese intonation systems and its application to DNN-based multi-dialect speech synthesis.

Takanori Akiyama

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017

Voice Conversion Using Input-to-Output Highway Networks.

IEICE Trans. Inf. Syst., 2017

JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis.

Ryosuke Sonobe

CoRR, 2017

Sampling-Based Speech Parameter Generation Using Moment-Matching Networks.

Tomoki Koriyama

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Training algorithm to deceive Anti-Spoofing Verification for DNN-based speech synthesis.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Blind source separation based on independent low-rank matrix analysis with sparse regularization for time-series activity.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

The UTokyo speech synthesis system for Blizzard Challenge 2017.

Proceedings of the Blizzard Challenge 2017, Stockholm, Sweden, August 25, 2017

Modulation spectrum-based speech parameter trajectory smoothing for DNN-based speech synthesis using FFT spectra.
