Korin Richmond

Comput. Speech Lang., 2025

2024

ZMM-TTS: Zero-Shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-Supervised Discrete Speech Representations.

IEEE/ACM Trans. Audio Speech Lang. Process., 2024

Revisiting Acoustic Similarity in Emotional Speech and Music via Self-Supervised Representations.

CoRR, 2024

Cross-lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models.

CoRR, 2024

Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning.

Siqi Sun

CoRR, 2024

AccentBox: Towards High-Fidelity Zero-Shot Accent Generation.

CoRR, 2024

An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios.

CoRR, 2024

2023

Improving Seq2Seq TTS Frontends With Transcribed Speech Audio.

Siqi Sun

Hao Tang

IEEE/ACM Trans. Audio Speech Lang. Process., 2023

Recovering Discrete Prosody Inputs via Invert-Classify.

Nicholas Sanders

Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

A Low-Resource Pipeline for Text-to-Speech from Found Data With Application to Scottish Gaelic.

William Lamb

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Silent Speech Recognition with Articulator Positions Estimated from Tongue Ultrasound and Lip Video.

Rachel Beeson

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Invert-Classify: Recovering Discrete Prosody Inputs for Text-To-Speech.

Nicholas Sanders

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Phonetic Analysis of Self-supervised Representations of English Speech.

Hao Tang

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Voice Puppetry with FastPitch.

Emelie Van De Vreken

Cassia Valentini-Botinhao

Catherine Lai

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Requirements and Motivations of Low-Resource Speech Synthesis for Language Revitalization.

Aidan Pine

Nathan Thanyehténhas Brinklow

Patrick Littell

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors.

Speech Commun., 2021

Automatic audiovisual synchronisation for ultrasound tongue imaging.

Speech Commun., 2021

Cross-lingual Transfer of Phonological Features for Low-resource Speech Synthesis.

Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Liaison and Pronunciation Learning in End-to-End Text-to-Speech in French.

Sébastien Le Maguer

Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Tal: A Synchronised Multi-Speaker Corpus of Ultrasound Tongue Imaging, Audio, and Lip Videos.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Confidence Intervals for ASR-Based TTS Evaluation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association (Interspeech 2021, Brno, Czechia), 2021

Silent versus Modal Multi-Speaker Speech Recognition from Ultrasound and Video.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association (Interspeech 2021, Brno, Czechia), 2021

TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Enhancing Sequence-to-Sequence Text-to-Speech with Morphology.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Speaker-Independent Mel-Cepstrum Estimation from Articulator Movements Using D-Vector Input.

Kouichi Katsurada

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019

A Comparison of Letters and Phones as Input to Sequence-to-Sequence Models for Speech Synthesis.

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Analysis of Pronunciation Learning in End-to-End Speech Synthesis.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Ultrasound Tongue Imaging for Diarization and Alignment of Child Speech Therapy Sessions.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Synchronising Audio and Ultrasound by Learning Cross-Modal Embeddings.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speaker-independent Classification of Phonetic Segments from Raw Ultrasound in Child Speech.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Attentive Filtering Networks for Audio Replay Attack Detection.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018

A multilinear tongue model derived from speech related MRI data of the human vocal tract.

Comput. Speech Lang., 2018

UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Dual-modality Talking-metrics: 3D Visual-Audio Integrated Behaviometric Cues from Speakers.

Jie Zhang

Robert B. Fisher

Proceedings of the 24th International Conference on Pattern Recognition, 2018

2016

Smooth talking: Articulatory join costs for unit selection.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Initial investigation of speech synthesis based on complex-valued neural networks.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Testing the consistency assumption: Pronunciation variant forced alignment in read and spontaneous speech synthesis.

Rasmus Dall

Sandrine Brognaux

Cassia Valentini-Botinhao

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Tongue Mesh Extraction from 3D MRI Data of the Human Vocal Tract.

Proceedings of the Perspectives in Shape Analysis, 2016

2015

Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A statistical shape space model of the palate surface trained on 3D MRI scans of the vocal tract.

Proceedings of the 18th International Congress of Phonetic Sciences, 2015

Methods for applying dynamic sinusoidal models to statistical parametric speech synthesis.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

2014

Glottal Spectral Separation for Speech Synthesis.

IEEE J. Sel. Top. Signal Process., 2014

An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A fixed dimension and perceptually based dynamic sinusoidal model of speech.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2013

Articulatory Control of HMM-Based Parametric Speech Synthesis Using Feature-Space-Switched Multiple Regression.

IEEE Trans. Speech Audio Process., 2013

Recording speech articulation in dialogue: Evaluating a synchronized double electromagnetic articulography setup.

J. Phonetics, 2013

An experimental comparison of multiple vocoder types.

Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Mage - HMM-based speech synthesis reactively controlled by the articulators.

Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Mage - reactive articulatory feature control of HMM-based parametric speech synthesis.

Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

The Edinburgh Speech Production Facility DoubleTalk Corpus.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

On the evaluation of inversion mapping performance in the acoustic domain.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Speech animation using electromagnetic articulography as motion capture data.

Ingmar Steiner

Slim Ouni

Proceedings of Auditory-Visual Speech Processing, 2013

2012

Deep Architectures for Articulatory Inversion.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Ultrax: An Animated Midsagittal Vocal Tract Display for Speech Therapy.

Steve Renals

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Vowel Creation by Articulatory Control in HMM-based Parametric Speech Synthesis.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Using multimodal speech production data to evaluate articulatory animation for audiovisual speech synthesis.

Ingmar Steiner

Slim Ouni

Proceedings of Facial Analysis and Animation 2012, 2012

2011

Announcing the Electromagnetic Articulography (Day 1) Subset of the mngu0 Articulatory Corpus.

Phil Hoole

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Feature-Space Transform Tying in Unified Acoustic-Articulatory Modelling for Articulatory Control of HMM-Based Speech Synthesis.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Formant-Controlled HMM-Based Speech Synthesis.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

HMM-based speech synthesiser using the LF-model of the glottal source.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

2010

An Analysis of HMM-based prediction of articulatory movements.

Speech Commun., 2010

An HMM-based speech synthesiser using glottal post-filtering.

Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Lip synchronization by acoustic inversion.

Gregor Hofer

Michael Berger

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2010

On generating Combilex pronunciations via morphological analysis.

Susan Fitt

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

HMM-based text-to-articulatory-movement prediction and analysis of critical articulators.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Comparison of HMM and TMDN methods for lip synchronisation.

Gregor Hofer

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Relying on critical articulators to estimate vocal tract spectra in an articulatory-acoustic database.

Ricardo Gutierrez-Osuna

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

2009

Integrating Articulatory Features Into HMM-Based Parametric Speech Synthesis.

IEEE Trans. Speech Audio Process., 2009

Towards unsupervised articulatory resynthesis of German utterances using EMA data.

Ingmar Steiner

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Robust LTS rules with the Combilex speech technology lexicon.

Susan Fitt

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Preliminary inversion mapping results with a new EMA corpus.

Miguel Á. Carreira-Perpiñán

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

2008

Predicting tongue shapes from a few landmark locations.

Chao Qin

Alan Wrench

Steve Renals

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Glottal spectral separation for parametric speech synthesis.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

2007

Multisyn: Open-domain unit selection for the Festival speech synthesis system.

Speech Commun., 2007

Towards an improved modeling of the glottal source in statistical parametric speech synthesis.

Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Trajectory Mixture Density Networks with Multiple Mixtures for Acoustic-Articulatory Inversion.

Proceedings of Advances in Nonlinear Speech Processing, 2007

A multitask learning perspective on acoustic-articulatory inversion.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Festival Multisyn voices for the 2007 Blizzard Challenge.

Proceedings of the Evaluation of text-to-speech systems: Blizzard Challenge 2007, 2007

2006

A trajectory mixture density network for the acoustic-articulatory inversion mapping.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Redundancy and productivity in the speech technology lexicon - can we do better?

Susan Fitt

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Multisyn Voice for the Blizzard Challenge 2006.

Proceedings of the Blizzard Challenge 2006, Pittsburgh, PA, USA, September 16, 2006

2005

Informed blending of databases for emotional speech synthesis.

Gregor Hofer

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Multisyn voices from ARCTIC data for the blizzard challenge.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

2004

Festival 2 - build your own general purpose unit selection speech synthesiser.

Proceedings of the Fifth ISCA Tutorial and Research Workshop on Speech Synthesis, 2004

Acoustic Features for Profiling Mobile Users of Conversational Interfaces.

Dave Toney

David Feinberg

Proceedings of Mobile Human-Computer Interaction, 2004

2003

Modelling the uncertainty in recovering articulation from acoustics.

Paul Taylor

Comput. Speech Lang., 2003

2000

Continuous speech recognition using articulatory data.

Alan Wrench

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

An automatic speech recognition system using neural networks and linear dynamic models to recover and model articulatory traces.

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1999

Estimating velum height from acoustics during continuous speech.

Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

1997

Detecting Subject Boundaries Within Text: A Language Independent Statistical Approach.
