Thomas Hueber

According to our database, Thomas Hueber authored at least 58 papers between 2007 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.



Cued Speech Generation Leveraging a Pre-trained Audiovisual Text-to-Speech Model.
CoRR, January, 2025

Simulating Articulatory Trajectories with Phonological Feature Interpolation.
CoRR, 2024

Fill in the Gap! Combining Self-supervised Representation Learning with Neural Audio Synthesis for Speech Inpainting.
CoRR, 2024

Mediapi-RGB: Enabling Technological Breakthroughs in French Sign Language (LSF) Research Through an Extensive Video-Text Corpus.
Proceedings of the 19th International Joint Conference on Computer Vision, 2024

Investigating the dynamics of hand and lips in French Cued Speech using attention mechanisms and CTC-based decoding.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Self-supervised speech unit discovery from articulatory and acoustic features using VQ-VAE.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multistream Neural Architectures for Cued Speech Recognition Using a Pre-Trained Visual Feature Extractor and Constrained CTC Decoding.
Proceedings of the IEEE International Conference on Acoustics, 2022

Repeat after Me: Self-Supervised Learning of Acoustic-to-Articulatory Mapping by Vocal Imitation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Make That Sound More Metallic: Towards a Perceptually Relevant Control of the Timbre of Synthesizer Sounds Using a Variational Autoencoder.
Trans. Int. Soc. Music. Inf. Retr., 2021

Dynamical Variational Autoencoders: A Comprehensive Review.
Found. Trends Mach. Learn., 2021

Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Evaluating the Extrapolation Capabilities of Neural Vocoders to Extreme Pitch Values.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Learning Robust Speech Representation with an Articulatory-Regularized Variational Autoencoder.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Benchmark of Dynamical Variational Autoencoders Applied to Speech Spectrogram Modeling.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Evaluating the Potential Gain of Auditory and Audiovisual Speech-Predictive Coding Using Deep Learning.
Neural Comput., 2020

What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Autoencoders for music sound synthesis: a comparison of linear, shallow, deep and variational models.
CoRR, 2018

Visual Recognition of Continuous Cued Speech Using a Tandem CNN-HMM Approach.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Biosignal-Based Spoken Communication: A Survey.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Introduction to the Special Issue on Biosignal-Based Spoken Communication.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Extending the Cascaded Gaussian Mixture Regression Framework for Cross-Speaker Acoustic-Articulatory Mapping.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Automatic animation of an articulatory tongue model from ultrasound images of the vocal tract.
Speech Commun., 2017

Inside Speech: Multisensory and Modality-specific Processing of Tongue and Lip Speech Actions.
J. Cogn. Neurosci., 2017

Feature extraction using multimodal convolutional neural networks for visual speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Adaptation of a Gaussian Mixture Regressor to a New Input Distribution: Extending the C-GMR Framework.
Proceedings of the Latent Variable Analysis and Signal Separation, 2017

Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces.
PLoS Comput. Biol., 2016

Statistical conversion of silent articulation into audible speech using full-covariance HMM.
Comput. Speech Lang., 2016

Adaptive Latency for Part-of-Speech Tagging in Incremental Text-to-Speech Synthesis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Speaker-Adaptive Acoustic-Articulatory Inversion Using Cascaded Gaussian Mixture Regression.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

HMM training strategy for incremental speech synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Tongue tracking in ultrasound images using eigentongue decomposition and artificial neural networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Real-time control of a DNN-based articulatory synthesizer for silent speech conversion: a pilot study.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Automatic animation of an articulatory tongue model from ultrasound images using Gaussian mixture regression.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Robust articulatory speech synthesis using deep neural networks for BCI applications.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Vizart3d - real-time system of visual articulatory feedback.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2013

Speaker adaptation of an acoustic-articulatory inversion model using cascaded Gaussian mixture regressions.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Ultraspeech-player: intuitive visualization of ultrasound articulatory data for speech therapy and pronunciation training.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Reactive Statistical Mapping: Towards the Sketching of Performative Control with Data.
Proceedings of the Innovative and Creative Developments in Multimodal Interaction Systems, 2013

Vocal tract imaging system for post-laryngectomy voice replacement.
Proceedings of the IEEE International Instrumentation and Measurement Technology Conference, 2013

The sight of your tongue: neural correlates of audio-lingual speech perception.
Proceedings of the Auditory-Visual Speech Processing, 2013

Audio-visual speaker conversion using prosody features.
Proceedings of the Auditory-Visual Speech Processing, 2013

Vizart3D : Retour Articulatoire Visuel pour l'Aide à la Prononciation (Vizart3D: Visual Articulatory Feedback for Computer-Assisted Pronunciation Training) [in French].
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, 2012

Cross-speaker Acoustic-to-Articulatory Inversion using Phone-based Trajectory HMM for Pronunciation Training.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Continuous Articulatory-to-Acoustic Mapping using Phone-based Trajectory HMM for a Silent Speech Interface.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Toward a Multi-Speaker Visual Articulatory Feedback System.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Statistical Mapping Between Articulatory and Acoustic Data for an Ultrasound-Based Silent Speech Interface.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Tests of an Interactive, Phrasebook-style Post-laryngectomy Voice-replacement System.
Proceedings of the 17th International Congress of Phonetic Sciences, 2011

A Visual Speech Recognition System for an Ultrasound-based Silent Speech Interface.
Proceedings of the 17th International Congress of Phonetic Sciences, 2011

Development of a silent speech interface driven by ultrasound and optical images of the tongue and lips.
Speech Commun., 2010

Silent speech interfaces.
Speech Commun., 2010

Silent vs vocalized articulation for a portable ultrasound-based silent speech interface.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Visuo-phonetic decoding using multi-stream and context-dependent models for an ultrasound-based silent speech interface.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Phone recognition from ultrasound and optical video sequences for a silent speech interface.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Towards a segmental vocoder driven by ultrasound and optical images of the tongue and lips.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Some Experiments in Audio-Visual Speech Processing.
Proceedings of the Advances in Nonlinear Speech Processing, 2007

Continuous-speech phone recognition from ultrasound and optical images of the tongue and lips.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Eigentongue Feature Extraction for an Ultrasound-Based Silent Speech Interface.
Proceedings of the IEEE International Conference on Acoustics, 2007
