Thomas Drugman

ORCID: 0000-0002-1491-7878

According to our database, Thomas Drugman authored at least 111 papers between 2007 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data.
CoRR, 2024

2023
A Comparative Analysis of Pretrained Language Models for Text-to-Speech.
Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

Controllable Emphasis with zero data for text-to-speech.
Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022
Computer-assisted pronunciation training - Speech synthesis is almost all you need.
Speech Commun., 2022

Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody.
CoRR, 2022

Expressive, Variable, and Controllable Duration Modelling in TTS.
CoRR, 2022

CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer.
CoRR, 2022

Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Expressive, Variable, and Controllable Duration Modelling in TTS.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Distribution Augmentation for Low-Resource Expressive Text-To-Speech.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
EmoCat: Language-agnostic Emotional Voice Conversion.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Multi-Scale Spectrogram Modelling for Neural Text-to-Speech.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Learned Conditional Prior for the VAE Acoustic Space of a TTS System.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Mispronunciation Detection in Non-Native (L2) English with Uncertainty Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2021

Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech.
Proceedings of the IEEE International Conference on Acoustics, 2021

Camp: A Two-Stage Approach to Modelling Prosody in Context.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Voice Conversion for Whispered Speech Synthesis.
IEEE Signal Process. Lett., 2020

Maximum Phase Modeling for Sparse Linear Prediction of Speech.
CoRR, 2020

Excitation-based Voice Quality Analysis and Modification.
CoRR, 2020

Dynamic Prosody Generation for Speech Synthesis Using Linguistics-Driven Acoustic Embedding Selection.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Singing Synthesis: With a Little Help from my Attention.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019
In Other News: a Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Towards Achieving Robust Universal Neural Vocoding.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Fine-Grained Robust Prosody Transfer for Single-Speaker Neural Text-To-Speech.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Effect of Data Reduction on Sequence-to-sequence Neural TTS.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
Traditional Machine Learning for Pitch Detection.
IEEE Signal Process. Lett., 2018

Effect of data reduction on sequence-to-sequence neural TTS.
CoRR, 2018

Robust universal neural vocoding.
CoRR, 2018

LSTM-Based Whisper Detection.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Comprehensive Evaluation of Statistical Speech Waveform Synthesis.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Parameter Generation Algorithms for Text-To-Speech Synthesis with Recurrent Neural Networks.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

2017
Phrase Break Prediction for Long-Form Reading TTS: Exploiting Text Structure Information.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016
Recent Advances in Nonlinear Speech Processing: Directions and Challenges.
Proceedings of the Recent Advances in Nonlinear Speech Processing, 2016

HMM-Based Speech Segmentation: Improvements of Fully Automatic Approaches.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Voice Activity Detection: Merging Source and Filter-based Information.
IEEE Signal Process. Lett., 2016

Optimizing Speech Recognition Evaluation Using Stratified Sampling.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015
Tracheoesophageal speech: A dedicated objective acoustic assessment.
Comput. Speech Lang., 2015

Non-Linear Speech Processing (NOLISP 2013).
Comput. Speech Lang., 2015

Fast and accurate phase unwrapping.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Robust excitation-based features for Automatic Speech Recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014
Fast Inter-Harmonic Reconstruction for Spectral Envelope Estimation in High-Pitched Voices.
IEEE Signal Process. Lett., 2014

Maximum Voiced Frequency Estimation: Exploiting Amplitude and Phase Spectra.
IEEE Signal Process. Lett., 2014

Automatic Variation of the Degree of Articulation in New HMM-Based Voices.
IEEE J. Sel. Top. Signal Process., 2014

HMM-based speech synthesis with various degrees of articulation: A perceptual study.
Neurocomputing, 2014

Speech polarity determination: A comparative evaluation.
Neurocomputing, 2014

Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis.
EURASIP J. Audio Speech Music. Process., 2014

Analysis and HMM-based synthesis of hypo and hyperarticulated speech.
Comput. Speech Lang., 2014

Data-driven detection and analysis of the patterns of creaky voice.
Comput. Speech Lang., 2014

Glottal source processing: From analysis to applications.
Comput. Speech Lang., 2014

Using mutual information in supervised temporal event detection: Application to cough detection.
Biomed. Signal Process. Control., 2014

Speech synthesis in various communicative situations: impact of pronunciation variations.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Excitation modeling for HMM-based speech synthesis: Breaking down the impact of periodic and aperiodic components.
Proceedings of the IEEE International Conference on Acoustics, 2014

COVAREP - A collaborative voice analysis repository for speech technologies.
Proceedings of the IEEE International Conference on Acoustics, 2014

Parametric representation for singing voice synthesis: A comparative evaluation.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
Objective Study of Sensor Relevance for Automatic Cough Detection.
IEEE J. Biomed. Health Informatics, 2013

Residual Excitation Skewness for Automatic Speech Polarity Detection.
IEEE Signal Process. Lett., 2013

Improved automatic detection of creak.
Comput. Speech Lang., 2013

Detecting Speech Polarity with High-Order Statistics.
Cogn. Comput., 2013

HMM-based speech synthesis of live sports commentaries: integration of a two-layer prosody annotation.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

On the Importance of Pre-emphasis and Window Shape in Phase-Based Speech Recognition.
Proceedings of the Advances in Nonlinear Speech Processing - 6th International Conference, 2013

Analysis and Quantification of Acoustic Artefacts in Tracheoesophageal Speech.
Proceedings of the Advances in Nonlinear Speech Processing - 6th International Conference, 2013

HMM-based synthesis of creaky voice.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A new prosody annotation protocol for live sports commentaries.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A quantitative comparison of glottal closure instant estimation algorithms on a large variety of singing sounds.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A new phase-based feature representation for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Prediction of creaky voice from contextual factors.
Proceedings of the IEEE International Conference on Acoustics, 2013

A comparative study of pitch extraction algorithms on a large variety of singing sounds.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
Detection of Glottal Closure Instants From Speech Signals: A Quantitative Review.
IEEE Trans. Speech Audio Process., 2012

The Deterministic Plus Stochastic Model of the Residual Signal and Its Applications.
IEEE Trans. Speech Audio Process., 2012

A comparative study of glottal source estimation techniques.
Comput. Speech Lang., 2012

Automatic Phone Alignment - A Comparison between Speaker-Independent Models and Models Trained on the Corpus to Align.
Proceedings of the Advances in Natural Language Processing, 2012

Statistical methods for varying the degree of articulation in new HMM-based voices.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Train&align: A new online tool for automatic phonetic alignment.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Automatic detection and correction of syntax-based prosody annotation errors.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Reactive and continuous control of HMM-based speech synthesis.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Left and right-hand guitar playing techniques detection.
Proceedings of the 12th International Conference on New Interfaces for Musical Expression, 2012

Audio and Contact Microphones for Cough Detection.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Resonator-based creaky voice detection.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Modeling the Creaky Excitation for Parametric Speech Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011
Causal-anticausal decomposition of speech using complex cepstrum for glottal source estimation.
Speech Commun., 2011

Perceptual Effects of the Degree of Articulation in HMM-Based Speech Synthesis.
Proceedings of the Advances in Nonlinear Speech Processing, 2011

Oscillating Statistical Moments for Speech Polarity Detection.
Proceedings of the Advances in Nonlinear Speech Processing, 2011

Continuous Control of the Degree of Articulation in HMM-Based Speech Synthesis.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Phase-based information for voice pathology detection.
Proceedings of the IEEE International Conference on Acoustics, 2011

Assessment of audio features for automatic cough detection.
Proceedings of the 19th European Signal Processing Conference, 2011

2010
Analysis and synthesis of hypo- and hyperarticulated speech.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Glottal-based analysis of the Lombard effect.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Chirp complex cepstrum-based decomposition for asynchronous glottal analysis.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

On the potential of glottal signatures for speaker recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A comparative evaluation of pitch modification techniques.
Proceedings of the 18th European Signal Processing Conference, 2010

2009
Glottal Source Estimation Using an Automatic Chirp Decomposition.
Proceedings of the Advances in Nonlinear Speech Processing, 2009

On the mutual information of glottal source estimation techniques for the automatic detection of speech pathologies.
Proceedings of the Sixth International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, 2009

A deterministic plus stochastic model of the residual signal for improved parametric speech synthesis.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

On the mutual information between source and filter contributions for voice pathology detection.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Glottal closure and opening instant detection from speech signals.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Complex cepstrum-based decomposition of speech for glottal source estimation.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Using a pitch-synchronous residual codebook for hybrid HMM/frame selection speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2009

Eigenresiduals for improved parametric speech synthesis.
Proceedings of the 17th European Signal Processing Conference, 2009

2008
Glottal Source Estimation Robustness - A Comparison of Sensitivity of Voice Source Estimation Techniques.
Proceedings of the SIGMAP 2008, 2008

Dynamic modality weighting for multi-stream HMMs in audio-visual speech recognition.
Proceedings of the 10th International Conference on Multimodal Interfaces, 2008

Voice source parameters estimation by fitting the glottal formant and the inverse filtering open phase.
Proceedings of the 2008 16th European Signal Processing Conference, 2008

2007
Relevant Feature Selection for Audio-Visual Speech Recognition.
Proceedings of the IEEE 9th Workshop on Multimedia Signal Processing, 2007

