Thomas Drugman

Sri Vishnu Kumar Karlapati

CoRR, 2024

2023

A Comparative Analysis of Pretrained Language Models for Text-to-Speech.

Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

Controllable Emphasis with zero data for text-to-speech.

Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

Computer-assisted pronunciation training - Speech synthesis is almost all you need.

Speech Commun., 2022

Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody.

CoRR, 2022

Expressive, Variable, and Controllable Duration Modelling in TTS.

CoRR, 2022

CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer.

CoRR, 2022

Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Expressive, Variable, and Controllable Duration Modelling in TTS.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Distribution Augmentation for Low-Resource Expressive Text-To-Speech.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

EmoCat: Language-agnostic Emotional Voice Conversion.

Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments.

Alejandro Mottini

Jaime Lorenzo-Trueba

Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Multi-Scale Spectrogram Modelling for Neural Text-to-Speech.

Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention.

Daniel Korzekwa

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Learned Conditional Prior for the VAE Acoustic Space of a TTS System.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Mispronunciation Detection in Non-Native (L2) English with Uncertainty Modeling.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Camp: A Two-Stage Approach to Modelling Prosody in Context.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Voice Conversion for Whispered Speech Synthesis.

IEEE Signal Process. Lett., 2020

Maximum Phase Modeling for Sparse Linear Prediction of Speech.

CoRR, 2020

Excitation-based Voice Quality Analysis and Modification.

CoRR, 2020

Dynamic Prosody Generation for Speech Synthesis Using Linguistics-Driven Acoustic Embedding Selection.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech.

Daniel Sáez-Trigueros

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Singing Synthesis: With a Little Help from my Attention.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019

In Other News: a Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data.

Nishant Prateek

Mateusz Lajszczak

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Towards Achieving Robust Universal Neural Vocoding.

Alexis Moinet

Vatsal Aggarwal

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech.

Daniel Korzekwa

Bozena Kostek

Mateusz Lajszczak

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Fine-Grained Robust Prosody Transfer for Single-Speaker Neural Text-To-Speech.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Effect of Data Reduction on Sequence-to-sequence Neural TTS.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018

Traditional Machine Learning for Pitch Detection.

IEEE Signal Process. Lett., 2018

Effect of data reduction on sequence-to-sequence neural TTS.

CoRR, 2018

Robust universal neural vocoding.

CoRR, 2018

LSTM-Based Whisper Detection.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Comprehensive Evaluation of Statistical Speech Waveform Synthesis.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Parameter Generation Algorithms for Text-To-Speech Synthesis with Recurrent Neural Networks.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

2017

Phrase Break Prediction for Long-Form Reading TTS: Exploiting Text Structure Information.

Thomas Merritt

Antonietta Maria Esposito

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

Recent Advances in Nonlinear Speech Processing: Directions and Challenges.

Anna Esposito

Marcos Faúndez-Zanuy

Gennaro Cordasco

Jordi Solé-Casals

Francesco Carlo Morabito

Proceedings of the Recent Advances in Nonlinear Speech Processing, 2016

HMM-Based Speech Segmentation: Improvements of Fully Automatic Approaches.

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Voice Activity Detection: Merging Source and Filter-based Information.

IEEE Signal Process. Lett., 2016

Optimizing Speech Recognition Evaluation Using Stratified Sampling.

Janne Pylkkönen

Max Bisani

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models.

Janne Pylkkönen

Reinhard Kneser

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

Tracheoesophageal speech: A dedicated objective acoustic assessment.

Comput. Speech Lang., 2015

Non-Linear Speech Processing (NOLISP 2013).

Comput. Speech Lang., 2015

Fast and accurate phase unwrapping.

Yannis Stylianou

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Robust excitation-based features for Automatic Speech Recognition.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

2014

Fast Inter-Harmonic Reconstruction for Spectral Envelope Estimation in High-Pitched Voices.

Yannis Stylianou

IEEE Signal Process. Lett., 2014

Maximum Voiced Frequency Estimation: Exploiting Amplitude and Phase Spectra.

Yannis Stylianou

IEEE Signal Process. Lett., 2014

Automatic Variation of the Degree of Articulation in New HMM-Based Voices.

IEEE J. Sel. Top. Signal Process., 2014

HMM-based speech synthesis with various degrees of articulation: A perceptual study.

Neurocomputing, 2014

Speech polarity determination: A comparative evaluation.

Neurocomputing, 2014

Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis.

Soheil Khorram

Hossein Sameti

Fahimeh Bahmaninezhad

Simon King

EURASIP J. Audio Speech Music. Process., 2014

Analysis and HMM-based synthesis of hypo and hyperarticulated speech.

Comput. Speech Lang., 2014

Data-driven detection and analysis of the patterns of creaky voice.

Comput. Speech Lang., 2014

Glottal source processing: From analysis to applications.

Comput. Speech Lang., 2014

Using mutual information in supervised temporal event detection: Application to cough detection.

Biomed. Signal Process. Control., 2014

Speech synthesis in various communicative situations: impact of pronunciation variations.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Excitation modeling for HMM-based speech synthesis: Breaking down the impact of periodic and aperiodic components.

Tuomo Raitio

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

COVAREP - A collaborative voice analysis repository for speech technologies.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Parametric representation for singing voice synthesis: A comparative evaluation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2013

Objective Study of Sensor Relevance for Automatic Cough Detection.

IEEE J. Biomed. Health Informatics, 2013

Residual Excitation Skewness for Automatic Speech Polarity Detection.

IEEE Signal Process. Lett., 2013

Improved automatic detection of creak.

Comput. Speech Lang., 2013

Detecting Speech Polarity with High-Order Statistics.

Cogn. Comput., 2013

HMM-based speech synthesis of live sports commentaries: integration of a two-layer prosody annotation.

Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

On the Importance of Pre-emphasis and Window Shape in Phase-Based Speech Recognition.

Proceedings of the Advances in Nonlinear Speech Processing - 6th International Conference, 2013

Analysis and Quantification of Acoustic Artefacts in Tracheoesophageal Speech.

Proceedings of the Advances in Nonlinear Speech Processing - 6th International Conference, 2013

HMM-based synthesis of creaky voice.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A new prosody annotation protocol for live sports commentaries.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A quantitative comparison of glottal closure instant estimation algorithms on a large variety of singing sounds.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A new phase-based feature representation for robust speech recognition.

Erfan Loweimi

Seyed Mohammad Ahadi

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Prediction of creaky voice from contextual factors.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

A comparative study of pitch extraction algorithms on a large variety of singing sounds.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

2012

Detection of Glottal Closure Instants From Speech Signals: A Quantitative Review.

IEEE Trans. Speech Audio Process., 2012

The Deterministic Plus Stochastic Model of the Residual Signal and Its Applications.

IEEE Trans. Speech Audio Process., 2012

A comparative study of glottal source estimation techniques.

Comput. Speech Lang., 2012

Automatic Phone Alignment - A Comparison between Speaker-Independent Models and Models Trained on the Corpus to Align.

Proceedings of the Advances in Natural Language Processing, 2012

Statistical methods for varying the degree of articulation in new HMM-based voices.

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Train&align: A new online tool for automatic phonetic alignment.

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Automatic detection and correction of syntax-based prosody annotation errors.

Richard Beaufort

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Reactive and continuous control of HMM-based speech synthesis.

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Left and right-hand guitar playing techniques detection.

Cécile Picard-Limpens

Nicolas Riche

Proceedings of the 12th International Conference on New Interfaces for Musical Expression, 2012

Audio and Contact Microphones for Cough Detection.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Resonator-based creaky voice detection.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Modeling the Creaky Excitation for Parametric Speech Synthesis.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011

Causal-anticausal decomposition of speech using complex cepstrum for glottal source estimation.

Speech Commun., 2011

Perceptual Effects of the Degree of Articulation in HMM-Based Speech Synthesis.

Proceedings of the Advances in Nonlinear Speech Processing, 2011

Oscillating Statistical Moments for Speech Polarity Detection.

Proceedings of the Advances in Nonlinear Speech Processing, 2011

Continuous Control of the Degree of Articulation in HMM-Based Speech Synthesis.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics.

Abeer Alwan

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Phase-based information for voice pathology detection.

Thomas Dubuisson

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Assessment of audio features for automatic cough detection.

Jérôme Urbain

Proceedings of the 19th European Signal Processing Conference, 2011

2010

Analysis and synthesis of hypo- and hyperarticulated speech.

Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Glottal-based analysis of the Lombard effect.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Chirp complex cepstrum-based decomposition for asynchronous glottal analysis.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

On the potential of glottal signatures for speaker recognition.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A comparative evaluation of pitch modification techniques.

Proceedings of the 18th European Signal Processing Conference, 2010

2009

Glottal Source Estimation Using an Automatic Chirp Decomposition.

Proceedings of the Advances in Nonlinear Speech Processing, 2009

On the mutual information of glottal source estimation techniques for the automatic detection of speech pathologies.

Thomas Dubuisson

Proceedings of the Sixth International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, 2009

A deterministic plus stochastic model of the residual signal for improved parametric speech synthesis.

Geoffrey Wilfart

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

On the mutual information between source and filter contributions for voice pathology detection.

Thomas Dubuisson

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Glottal closure and opening instant detection from speech signals.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Complex cepstrum-based decomposition of speech for glottal source estimation.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Using a pitch-synchronous residual codebook for hybrid HMM/frame selection speech synthesis.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

Eigenresiduals for improved parametric speech synthesis.

Geoffrey Wilfart

Proceedings of the 17th European Signal Processing Conference, 2009

2008

Glottal Source Estimation Robustness - A Comparison of Sensitivity of Voice Source Estimation Techniques.

Proceedings of the SIGMAP 2008, 2008

Dynamic modality weighting for multi-stream HMMs in audio-visual speech recognition.

Proceedings of the 10th International Conference on Multimodal Interfaces, 2008

Voice source parameters estimation by fitting the glottal formant and the inverse filtering open phase.

Proceedings of the 2008 16th European Signal Processing Conference, 2008

2007

Relevant Feature Selection for Audio-Visual Speech Recognition.
