Shinnosuke Takamichi

Orcid: 0000-0003-0520-7847

According to our database, Shinnosuke Takamichi authored at least 141 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
JNV corpus: A corpus of Japanese nonverbal vocalizations with diverse phrases and emotions.
Speech Commun., January, 2024

Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

SpoofCeleb: Speech Deepfake Detection and SASV In The Wild.
CoRR, 2024

DNN-based ensemble singing voice synthesis with interactions between singers.
CoRR, 2024

Text-To-Speech Synthesis In The Wild.
CoRR, 2024

BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec.
CoRR, 2024

J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling.
CoRR, 2024

Textless Dependency Parsing by Labeled Sequence Prediction.
CoRR, 2024

Who Finds This Voice Attractive? A Large-Scale Experiment Using In-the-Wild Data.
CoRR, 2024

Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals.
CoRR, 2024

Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment.
CoRR, 2024

SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark.
CoRR, 2024

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis.
CoRR, 2024

Building speech corpus with diverse voice characteristics for its prompt-based representation.
CoRR, 2024

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics.
CoRR, 2024

JVNV: A Corpus of Japanese Emotional Speech With Verbal Content and Nonverbal Expressions.
IEEE Access, 2024

Do Learned Speech Symbols Follow Zipf's Law?
Proceedings of the IEEE International Conference on Acoustics, 2024

Diversity-Based Core-Set Selection for Text-to-Speech with Linguistic and Acoustic Features.
Proceedings of the IEEE International Conference on Acoustics, 2024

Environmental Sound Synthesis from Vocal Imitations and Sound Event Labels.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions.
Dataset, October, 2023

Environmental sound conversion from vocal imitations and sound event labels.
CoRR, 2023

Foley Sound Synthesis at the DCASE 2023 Challenge.
CoRR, 2023

SelfRemaster: Self-Supervised Speech Restoration for Historical Audio Resources.
IEEE Access, 2023

Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion.
Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

TimToShape: Supporting Practice of Musical Instruments by Visualizing Timbre with 2D Shapes based on Crossmodal Correspondences.
Proceedings of the 28th International Conference on Intelligent User Interfaces, 2023

Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

HumanDiffusion: diffusion model using perceptual gradients.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Improving Speech Prosody of Audiobook Text-To-Speech Synthesis with Acoustic and Textual Contexts.
Proceedings of the IEEE International Conference on Acoustics, 2023

MID-Attribute Speaker Generation Using Optimal-Transport-Based Interpolation of Gaussian Mixture Models.
Proceedings of the IEEE International Conference on Acoustics, 2023

Visual Onoma-to-Wave: Environmental Sound Synthesis from Visual Onomatopoeias and Sound-Source Images.
Proceedings of the IEEE International Conference on Acoustics, 2023

jaCappella Corpus: A Japanese a Cappella Vocal Ensemble Corpus.
Proceedings of the IEEE International Conference on Acoustics, 2023

COCO-NUT: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-Based Control.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Yodas: Youtube-Oriented Dataset for Audio and Speech.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Lightweight and irreversible speech pseudonymization based on data-driven optimization of cascaded voice modification modules.
Comput. Speech Lang., 2022

Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection.
CoRR, 2022

Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses.
CoRR, 2022

Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis.
CoRR, 2022

How Should We Evaluate Synthesized Environmental Sounds.
CoRR, 2022

Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations.
CoRR, 2022

Speaking-Rate-Controllable HiFi-GAN Using Feature Interpolation.
CoRR, 2022

VTTS: Visual-Text To Speech.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Robustness of Signal Processing-Based Pseudonymization Method Against Decryption Attack.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Personalized Filled-pause Generation with Group-wise Prediction Models.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021
Perceptual-Similarity-Aware Deep Speaker Representation Learning for Multi-Speaker Generative Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Incremental Text-to-Speech Synthesis Using Pseudo Lookahead With Large Pretrained Language Model.
IEEE Signal Process. Lett., 2021

Real-Time Full-Band Voice Conversion with Sub-Band Modeling and Data-Driven Phase Estimation of Spectral Differentials.
IEICE Trans. Inf. Syst., 2021

DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching.
IEICE Trans. Inf. Syst., 2021

Noise Robust Acoustic Anomaly Detection System with Nonnegative Matrix Factorization Based on Generalized Gaussian Distribution.
IEICE Trans. Inf. Syst., 2021

JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification.
CoRR, 2021

ESPnet2-TTS: Extending the Edge of TTS Research.
CoRR, 2021

Onoma-to-wave: Environmental sound synthesis from onomatopoeic words.
CoRR, 2021

Accent Modeling of Low-Resourced Dialect in Pitch Accent Language Using Variational Autoencoder.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Audiobook Speech Synthesis Conditioned by Cross-Sentence Context-Aware Word Embeddings.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Lightweight Voice Anonymization Based on Data-Driven Optimization of Cascaded Voice Modification Modules.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Digital Speech Makeup: Voice Conversion Based Altered Auditory Feedback for Transforming Self-Representation.
Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021

Disentangled Speaker and Language Representations Using Mutual Information Minimization and Domain Adaptation for Cross-Lingual TTS.
Proceedings of the IEEE International Conference on Acoustics, 2021

HumanACGAN: Conditional Generative Adversarial Network with Human-Based Auxiliary Classifier and its Evaluation in Phoneme Perception.
Proceedings of the IEEE International Conference on Acoustics, 2021

Japanese EFL Learners' Speaking Practice Utilizing Text-to-Speech Technology Within a Team-Based Flipped Learning Framework.
Proceedings of the Learning and Collaboration Technologies: New Challenges and Learning Experiences, 2021

Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Emotion-Controllable Speech Synthesis Using Emotion Soft Labels and Fine-Grained Prosody Factors.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis.
Speech Commun., 2020

Phase reconstruction from amplitude spectrograms based on directional-statistics deep neural networks.
Signal Process., 2020

DNN-Based Full-Band Speech Synthesis Using GMM Approximation of Spectral Envelope.
IEICE Trans. Inf. Syst., 2020

Generative Moment Matching Network-Based Neural Double-Tracking for Synthesized and Natural Singing Voices.
IEICE Trans. Inf. Syst., 2020

JSSS: free Japanese speech corpus for summarization and simplification.
CoRR, 2020

PJS: phoneme-balanced Japanese singing voice corpus.
CoRR, 2020

JVS-MuSiC: Japanese multispeaker singing-voice corpus.
CoRR, 2020

DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

SMASH Corpus: A Spontaneous Speech Corpus Recording Third-person Audio Commentaries on Gameplay.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Text-to-Speech Synthesis with Unaligned Multiple Language Units Based on Attention.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Lifter Training and Sub-Band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

HumanGAN: Generative Adversarial Network With Human-Based Discriminator And Its Evaluation In Speech Perception Modeling.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

RWCP-SSD-Onomatopoeia: Onomatopoeic Word Dataset for Environmental Sound Synthesis.
Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020

PJS: phoneme-balanced Japanese singing-voice corpus.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
Independent Deeply Learned Matrix Analysis for Determined Audio Source Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Prosody Correction Preserving Speaker Individuality for Chinese-Accented Japanese HMM-Based Text-to-Speech Synthesis.
IEICE Trans. Inf. Syst., 2019

Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra.
Comput. Speech Lang., 2019

Overview of Tasks and Investigation of Subjective Evaluation Methods in Environmental Sound Synthesis and Conversion.
CoRR, 2019

JVS corpus: free Japanese multi-speaker voice corpus.
CoRR, 2019

TransVoice: Real-Time Voice Conversion for Augmenting Near-Field Speech Communication.
Adjunct Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology, 2019

DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

V2S attack: building DNN-based voice conversion from automatic speaker verification.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Subword tokenization based on DNN-based acoustic model for end-to-end prosody generation.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Implementation of DNN-based real-time voice conversion and its improvements by audio data augmentation and mask-shaped device.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Spoken Dialogue Robot for Watching Daily Life of Elderly People.
Proceedings of the Increasing Naturalness and Flexibility in Spoken Dialogue Interaction, 2019

Speech Quality Evaluation of Synthesized Japanese Speech Using EEG.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking.
Proceedings of the IEEE International Conference on Acoustics, 2019

Estimating confidence in voices using crowdsourcing for alleviating tension with altered auditory feedback.
Proceedings of the AsianCHI@CHI 2019: Asian CHI Symposium, 2019

2018
Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

An end-to-end model for cross-lingual transformation of paralinguistic information.
Mach. Transl., 2018

CPJD Corpus: Crowdsourced Parallel Speech Corpus of Japanese Dialects.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Phase Reconstruction from Amplitude Spectrograms Based on Von-Mises-Distribution Deep Neural Network.
Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018

Text-to-Speech Synthesis Using STFT Spectra Based on Low-/Multi-Resolution Generative Adversarial Networks.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Non-Parallel Voice Conversion Using Variational Autoencoders Conditioned by Phonetic Posteriorgrams and D-Vectors.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Independent Deeply Learned Matrix Analysis for Multichannel Audio Source Separation.
Proceedings of the 26th European Signal Processing Conference, 2018

Generative approach using the noise generation models for DNN-based speech synthesis trained from noisy speech.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

A Revisit to Feature Handling for High-quality Voice Conversion Based on Gaussian Mixture Model.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Data augmentation with moment-matching networks for i-vector based speaker verification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Prosody-aware subword embedding considering Japanese intonation systems and its application to DNN-based multi-dialect speech synthesis.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017
Voice Conversion Using Input-to-Output Highway Networks.
IEICE Trans. Inf. Syst., 2017

JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis.
CoRR, 2017

Sampling-Based Speech Parameter Generation Using Moment-Matching Networks.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Training algorithm to deceive Anti-Spoofing Verification for DNN-based speech synthesis.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Blind source separation based on independent low-rank matrix analysis with sparse regularization for time-series activity.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

The UTokyo speech synthesis system for Blizzard Challenge 2017.
Proceedings of the Blizzard Challenge 2017, Stockholm, Sweden, August 25, 2017

Modulation spectrum-based speech parameter trajectory smoothing for DNN-based speech synthesis using FFT spectra.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
Postfilters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

A Statistical Sample-Based Approach to GMM-Based Voice Conversion Using Tied-Covariance Acoustic Models.
IEICE Trans. Inf. Syst., 2016

Non-Native Text-to-Speech Preserving Speaker Individuality Based on Partial Correction of Prosodic and Phonetic Characteristics.
IEICE Trans. Inf. Syst., 2016

The NU-NAIST Voice Conversion System for the Voice Conversion Challenge 2016.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015
Modulation spectrum-constrained trajectory training algorithm for HMM-based speech synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Preserving word-level emphasis in speech-to-speech translation using linear regression HSMMs.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Modulation spectrum-constrained trajectory training algorithm for GMM-based Voice Conversion.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Parameter generation algorithm considering Modulation Spectrum for HMM-based speech synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

The NAIST Text-to-Speech System for the Blizzard Challenge 2015.
Proceedings of the Blizzard Challenge 2015, 2015

2014
Parameter Generation Methods With Rich Context Models for High-Quality and Flexible Text-To-Speech Synthesis.
IEEE J. Sel. Top. Signal Process., 2014

A hearing impairment simulation method using audiogram-based approximation of auditory characteristics.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A postfilter to modify the modulation spectrum in HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2014

Modified post-filter to recover modulation spectrum for HMM-based speech synthesis.
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

Modulation spectrum-based post-filter for GMM-based Voice Conversion.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013
Improvements to HMM-based speech synthesis based on parameter generation with rich context models.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Generalizing continuous-space translation of paralinguistic information.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

2012
A method for translation of paralinguistic information.
Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

An Evaluation of Parameter Generation Methods with Rich Context Models in HMM-Based Speech Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

