Jaime Lorenzo-Trueba

Orcid: 0000-0003-0459-1429

According to our database, Jaime Lorenzo-Trueba authored at least 51 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.






On csauthors.net:


Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations.
CoRR, 2024

Investigating Self-Supervised Features for Expressive, Multilingual Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2024

Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications.
Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Multilingual context-based pronunciation learning for Text-to-Speech.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Computer-assisted pronunciation training - Speech synthesis is almost all you need.
Speech Commun., 2022

Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Cross-Speaker Style Transfer for Text-to-Speech Using Data Augmentation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Voice Filter: Few-Shot Text-to-Speech Speaker Adaptation Using Voice Conversion as a Post-Processing Module.
Proceedings of the IEEE International Conference on Acoustics, 2022

EmoCat: Language-agnostic Emotional Voice Conversion.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Enhancing audio quality for expressive Neural Text-to-Speech.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers, 2021

Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Learned Conditional Prior for the VAE Acoustic Space of a TTS System.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Mispronunciation Detection in Non-Native (L2) English with Uncertainty Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2021

Low-Resource Expressive Text-To-Speech Using Data Augmentation.
Proceedings of the IEEE International Conference on Acoustics, 2021

Camp: A Two-Stage Approach to Modelling Prosody in Context.
Proceedings of the IEEE International Conference on Acoustics, 2021

Voice Conversion for Whispered Speech Synthesis.
IEEE Signal Process. Lett., 2020

Parallel WaveNet conditioned on VAE latent vectors.
CoRR, 2020

Dynamic Prosody Generation for Speech Synthesis Using Linguistics-Driven Acoustic Embedding Selection.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Using VAEs and Normalizing Flows for One-Shot Text-To-Speech Synthesis of Expressive Speech.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model.
CoRR, 2019

In Other News: a Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Towards Achieving Robust Universal Neural Vocoding.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Effect of Data Reduction on Sequence-to-sequence Neural TTS.
Proceedings of the IEEE International Conference on Acoustics, 2019

Investigating different representations for modeling and controlling multiple emotions in DNN-based speech synthesis.
Speech Commun., 2018

Effect of data reduction on sequence-to-sequence neural TTS.
CoRR, 2018

Robust universal neural vocoding.
CoRR, 2018

The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Expressive Speech Synthesis Using Sentiment Embeddings.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

High-Quality Nonparallel Voice Conversion Based on Cycle-Consistent Adversarial Network.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Misperceptions of the Emotional Content of Natural and Vocoded Speech in a Car.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Principles for Learning Controllable TTS from Annotated and Latent Variation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Segmenting human activities based on HMMs using smartphone inertial sensors.
Pervasive Mob. Comput., 2016

Continuous Expressive Speaking Styles Synthesis based on CVSM and MR-HMM.
Proceedings of the COLING 2016, 2016

Emotion transplantation through adaptation in HMM-based speech synthesis.
Comput. Speech Lang., 2015

Development of a genre-dependent TTS system with cross-speaker speaking-style transplantation.
Proceedings of the 2nd International Workshop on Speech, Language and Audio in Multimedia, 2014

Towards Cross-Lingual Emotion Transplantation.
Proceedings of the Advances in Speech and Language Technologies for Iberian Languages, 2014

I Feel You: The Design and Evaluation of a Domotic Affect-Sensitive Spoken Conversational Agent.
Sensors, 2013

Towards speaking style transplantation in speech synthesis.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

NEMOHIFI: an affective HiFi agent.
Proceedings of the 2013 International Conference on Multimodal Interaction, 2013

Sentence selection for improving the tuning process of a statistical machine translation system.
Proces. del Leng. Natural, 2012

Towards an Unsupervised Speaking Style Voice Building Framework: Multi-Style Speaker Diarization.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Towards Glottal Source Controllability in Expressive Speech Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
