Gustav Eje Henter

ORCID: 0000-0002-1643-1054

Affiliations:
  • KTH Royal Institute of Technology, Stockholm, Sweden


According to our database, Gustav Eje Henter authored at least 91 papers between 2010 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Evaluating Gesture Generation in a Large-scale Open Challenge: The GENEA Challenge 2022.
ACM Trans. Graph., June, 2024

CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models.
CoRR, 2024

Towards a GENEA Leaderboard - an Extended, Living Benchmark for Evaluating and Advancing Conversational Motion Synthesis.
CoRR, 2024

Voice Conversion-based Privacy through Adversarial Information Hiding.
CoRR, 2024

HiFi-Glot: Neural Formant Synthesis with Differentiable Resonant Filters.
CoRR, 2024

Causality for Tabular Data Synthesis: A High-Order Structure Causal Benchmark Framework.
CoRR, 2024

Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech.
CoRR, 2024

Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis.
CoRR, 2024

Exploring Internal Numeracy in Language Models: A Case Study on ALBERT.
CoRR, 2024

GENEA Workshop 2024: The 5th Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents.
Proceedings of the 26th International Conference on Multimodal Interaction, 2024

Matcha-TTS: A Fast TTS Architecture with Conditional Flow Matching.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Unified Speech and Gesture Synthesis Using Flow Matching.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Fake it to make it: Using synthetic data to remedy the data shortage in joint multi-modal speech-and-gesture synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models.
ACM Trans. Graph., August, 2023

A Comprehensive Review of Data-Driven Co-Speech Gesture Generation.
Comput. Graph. Forum, May, 2023

Context-specific kernel-based hidden Markov model for time series analysis.
CoRR, 2023

On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis.
Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis.
Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

Stuck in the MOS pit: A critical analysis of MOS test methodology in TTS evaluation.
Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

Speaker-independent neural formant synthesis.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

OverFlow: Putting flows on top of neural transducers for better TTS.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

GENEA Workshop 2023: The 4th Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents.
Proceedings of the 25th International Conference on Multimodal Interaction, 2023

"Am I listening?", Evaluating the Quality of Generated Data-driven Listening Motion.
Proceedings of the International Conference on Multimodal Interaction, 2023

The GENEA Challenge 2023: A large-scale evaluation of gesture generation models in monadic and dyadic settings.
Proceedings of the 25th International Conference on Multimodal Interaction, 2023

A Processing Framework to Access Large Quantities of Whispered Speech Found in ASMR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Autovocoder: Fast Waveform Generation from a Learned Speech Representation Using Differentiable Digital Signal Processing.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Prosody-Controllable Spontaneous TTS with Neural HMMs.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022
Kernel-based hidden Markov conditional densities.
Comput. Stat. Data Anal., 2022

Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Speech Audio Corrector: using speech from non-target speakers for one-off correction of mispronunciations in grapheme-input text-to-speech.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

The GENEA Challenge 2022: A large evaluation of data-driven co-speech gesture generation.
Proceedings of the International Conference on Multimodal Interaction, 2022

GENEA Workshop 2022: The 3rd Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents.
Proceedings of the International Conference on Multimodal Interaction, 2022

Neural HMMs Are All You Need (For High-Quality Attention-Free TTS).
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Wavebender GAN: An Architecture for Phonetically Meaningful Speech Manipulation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Multimodal Analysis of the Predictability of Hand-gesture Properties.
Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, 2022

2021
Transflower: probabilistic autoregressive dance generation with multimodal attention.
ACM Trans. Graph., 2021

Moving Fast and Slow: Analysis of Representations and Post-Processing in Speech-Driven Automatic Gesture Generation.
Int. J. Hum. Comput. Interact., 2021

Multimodal Capture of Patient Behaviour for Improved Detection of Early Dementia: Clinical Feasibility and Preliminary Results.
Frontiers Comput. Sci., 2021

Normalizing Flow based Hidden Markov Models for Classification of Speech Phones with Explainability.
CoRR, 2021

Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech.
Proceedings of the IVA '21: ACM International Conference on Intelligent Virtual Agents, 2021

A Large, Crowdsourced Evaluation of Gesture Generation Systems on Common Data: The GENEA Challenge 2020.
Proceedings of the IUI '21: 26th International Conference on Intelligent User Interfaces, 2021

Integrated Speech and Gesture Synthesis.
Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021

GENEA Workshop 2021: The 2nd Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents.
Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021

HEMVIP: Human Evaluation of Multiple Videos in Parallel.
Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021

Full-Glow: Fully Conditional Glow for More Realistic Image Generation.
Proceedings of the Pattern Recognition - 43rd DAGM German Conference, DAGM GCPR 2021, Bonn, Germany, September 28, 2021

The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
MoGlow: probabilistic and controllable motion synthesis using normalising flows.
ACM Trans. Graph., 2020

Robust model training and generalisation with Studentising flows.
CoRR, 2020

Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows.
Comput. Graph. Forum, 2020

Robust Classification Using Hidden Markov Models and Mixtures of Normalizing Flows.
Proceedings of the 30th IEEE International Workshop on Machine Learning for Signal Processing, 2020

Let's Face It: Probabilistic Multi-modal Interlocutor-aware Generation of Facial Gestures in Dyadic Settings.
Proceedings of the IVA '20: ACM International Conference on Intelligent Virtual Agents, 2020

Generating coherent spontaneous speech and gesture from text.
Proceedings of the IVA '20: ACM International Conference on Intelligent Virtual Agents, 2020

Gesticulator: A framework for semantically-aware speech-driven gesture generation.
Proceedings of the ICMI '20: International Conference on Multimodal Interaction, 2020

Breathing and Speech Planning in Spontaneous Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019
Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model.
CoRR, 2019

Where do the improvements come from in sequence-to-sequence neural TTS?
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Speech Synthesis Evaluation - State-of-the-Art Assessment and Suggestion for a Novel Research Program.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

How to train your fillers: uh and um in spontaneous speech synthesis.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Analyzing Input and Output Representations for Speech-Driven Gesture Generation.
Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents, 2019

Spontaneous Conversational Speech Synthesis from Found Data.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Off the Cuff: Exploring Extemporaneous Speech Delivery with TTS.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Casting to Corpus: Segmenting and Selecting Spontaneous Dialogue for TTS with a CNN-LSTM Speaker-dependent Breath Detector.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

On the Importance of Representations for Speech-Driven Gesture Generation.
Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019

2018
Investigating different representations for modeling and controlling multiple emotions in DNN-based speech synthesis.
Speech Commun., 2018

Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis.
CoRR, 2018

Kernel Density Estimation-Based Markov Models with Hidden State.
CoRR, 2018

Analysing Shortcomings of Statistical Parametric Speech Synthesis.
CoRR, 2018

Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017
Consensus-based Sequence Training for Video Captioning.
CoRR, 2017

Misperceptions of the Emotional Content of Natural and Vocoded Speech in a Car.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Principles for Learning Controllable TTS from Annotated and Latent Variation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Adapting and controlling DNN-based speech synthesis using input codes.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

2016
Bayesian Analysis of Phoneme Confusion Matrices.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Minimum Entropy Rate Simplification of Stochastic Processes.
IEEE Trans. Pattern Anal. Mach. Intell., 2016

Median-based generation of synthetic speech durations using a non-parametric approach.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A Template-Based Approach for Speech Synthesis Intonation Generation Using LSTMs.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

From HMMs to DNNs: Where do the improvements come from?
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Robust TTS duration modelling using DNNs.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Testing the consistency assumption: Pronunciation variant forced alignment in read and spontaneous speech synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

2015
Are we using enough listeners? No! - an empirically-supported critique of Interspeech 2014 TTS evaluations.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014
Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A flexible front-end for HTS.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013
Probabilistic Sequence Models with Speech and Language Applications.
PhD thesis, 2013

Maximizing Phoneme Recognition Accuracy for Enhanced Speech Intelligibility in Noise.
IEEE Trans. Speech Audio Process., 2013

Picking up the pieces: Causal states in noisy data, and how to recover them.
Pattern Recognit. Lett., 2013

2012
Enhancing Subjective Speech Intelligibility Using a Statistical Model of Speech.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Gaussian process dynamical models for nonparametric speech representation and synthesis.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

2011
Intermediate-State HMMs to Capture Continuously-Changing Signal Features.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

2010
Simplified probability models for generative tasks: A rate-distortion approach.
Proceedings of the 18th European Signal Processing Conference, 2010

