Jonas Beskow

ORCID: 0000-0003-1399-6604

According to our database, Jonas Beskow authored at least 127 papers between 1995 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
MM-Conv: A Multi-modal Conversational Dataset for Virtual Humans.
CoRR, 2024

Incorporating Spatial Awareness in Data-Driven Gesture Generation for Virtual Agents.
CoRR, 2024

Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech.
CoRR, 2024

Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis.
CoRR, 2024

Gesture Evaluation in Virtual Reality.
Companion Proceedings of the 26th International Conference on Multimodal Interaction, 2024

Matcha-TTS: A Fast TTS Architecture with Conditional Flow Matching.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Unified Speech and Gesture Synthesis Using Flow Matching.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Fake it to make it: Using synthetic data to remedy the data shortage in joint multi-modal speech-and-gesture synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Learning to generate pointing gestures in situated embodied conversational agents.
Frontiers Robotics AI, October, 2023

Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models.
ACM Trans. Graph., August, 2023

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis.
Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

Hi robot, it's not what you say, it's how you say it.
Proceedings of the 32nd IEEE International Conference on Robot and Human Interactive Communication, 2023

Generation of speech and facial animation with controllable articulatory effort for amusing conversational characters.
Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents, 2023

OverFlow: Putting flows on top of neural transducers for better TTS.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation.
Proceedings of the 25th International Conference on Multimodal Interaction, 2023

Casual chatter or speaking up? Adjusting articulatory effort in generation of speech and animation for conversational characters.
Proceedings of the 17th IEEE International Conference on Automatic Face and Gesture Recognition, 2023

2022
Neural HMMs Are All You Need (For High-Quality Attention-Free TTS).
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021
Transflower: probabilistic autoregressive dance generation with multimodal attention.
ACM Trans. Graph., 2021

Multimodal Capture of Patient Behaviour for Improved Detection of Early Dementia: Clinical Feasibility and Preliminary Results.
Frontiers Comput. Sci., 2021

Mechanical Chameleons: Evaluating the effects of a social robot's non-verbal behavior on social influence.
CoRR, 2021

Personality in the mix - investigating the contribution of fillers and speaking style to the perception of spontaneous speech synthesis.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Expressive Robot Performance Based on Facial Motion Capture.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Integrated Speech and Gesture Synthesis.
Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021

2020
MoGlow: probabilistic and controllable motion synthesis using normalising flows.
ACM Trans. Graph., 2020

Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially Aware Language Acquisition.
IEEE Trans. Cogn. Dev. Syst., 2020

Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows.
Comput. Graph. Forum, 2020

Let's Face It: Probabilistic Multi-modal Interlocutor-aware Generation of Facial Gestures in Dyadic Settings.
Proceedings of the IVA '20: ACM International Conference on Intelligent Virtual Agents, 2020

Can we trust online crowdworkers?: Comparing online and offline participants in a preference test of virtual agents.
Proceedings of the IVA '20: ACM International Conference on Intelligent Virtual Agents, 2020

Generating coherent spontaneous speech and gesture from text.
Proceedings of the IVA '20: ACM International Conference on Intelligent Virtual Agents, 2020

Breathing and Speech Planning in Spontaneous Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Embodiment and gender interact in alignment to TTS voices.
Proceedings of the 42nd Annual Meeting of the Cognitive Science Society, 2020

2019
Modeling of Human Visual Attention in Multiparty Open-World Dialogues.
ACM Trans. Hum. Robot Interact., 2019

The effect of a physical robot on vocabulary learning.
CoRR, 2019

Speech Synthesis Evaluation - State-of-the-Art Assessment and Suggestion for a Novel Research Program.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

How to train your fillers: uh and um in spontaneous speech synthesis.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

PROMIS: a statistical-parametric speech synthesis system with prominence control via a prominence network.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Spontaneous Conversational Speech Synthesis from Found Data.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Off the Cuff: Exploring Extemporaneous Speech Delivery with TTS.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Equipping social robots with culturally-sensitive facial expressions of emotion using data-driven methods.
Proceedings of the 14th IEEE International Conference on Automatic Face & Gesture Recognition, 2019

Multimodal conversational interaction with robots.
The Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions, 2019

2018
A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Crowdsourced Multimodal Corpora Collection Tool.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Emotion-Awareness for Intelligent Vehicle Assistants: A Research Agenda.
Proceedings of the 1st IEEE/ACM International Workshop on Software Engineering for AI in Autonomous Systems, 2018

Using Constrained Optimization for Real-Time Synchronization of Verbal and Nonverbal Robot Behavior.
Proceedings of the 2018 IEEE International Conference on Robotics and Automation, 2018

Reverse Engineering Psychologically Valid Facial Expressions of Emotion into Social Robots.
Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition, 2018

2017
Mimebot - Investigating the Expressibility of Non-Verbal Communication Across Agent Embodiments.
ACM Trans. Appl. Percept., 2017

Self-Supervised Vision-Based Detection of the Active Speaker as a Prerequisite for Socially-Aware Language Acquisition.
CoRR, 2017

Machine Learning and Social Robotics for Detecting Early Signs of Dementia.
CoRR, 2017

Real-time labeling of non-rigid motion capture marker sets.
Comput. Graph., 2017

Look but Don't Stare: Mutual Gaze Interaction in Social Robots.
Proceedings of the Social Robotics - 9th International Conference, 2017

Moveable Facial Features in a Social Mediator.
Proceedings of the Intelligent Virtual Agents - 17th International Conference, 2017

Crowd-Powered Design of Virtual Attentive Listeners.
Proceedings of the Intelligent Virtual Agents - 17th International Conference, 2017

Crowd-Sourced Design of Artificial Attentive Listeners.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Controlling Prominence Realisation in Parametric DNN-Based Speech Synthesis.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016
A hybrid harmonics-and-bursts modelling approach to speech synthesis.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

WikiSpeech - enabling open source text-to-speech for Wikipedia.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Robust online motion capture labeling of finger markers.
Proceedings of the 9th International Conference on Motion in Games, 2016

A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Look who's talking: visual identification of the active speaker in multi-party human-robot interaction.
Proceedings of the 2nd Workshop on Advancements in Social Signal Processing for Multimodal Interaction, 2016

Automatic annotation of gestural units in spontaneous face-to-face interaction.
Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction, 2016

2015
Towards Fully Automated Motion Capture of Signs - Development and Evaluation of a Key Word Signing Avatar.
ACM Trans. Access. Comput., 2015

Talking Heads, Signing Avatars and Social Robots.
Proceedings of the 6th Workshop on Speech and Language Processing for Assistive Technologies, 2015

A Collaborative Human-Robot Game as a Test-bed for Modelling Multi-party, Situated Interaction.
Proceedings of the Intelligent Virtual Agents - 15th International Conference, 2015

Exploring Turn-taking Cues in Multi-party Human-robot Discussions about Objects.
Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA, November 09, 2015

2014
Animated Lombard speech: Motion capture, facial animation and visual intelligibility of speech produced in adverse conditions.
Comput. Speech Lang., 2014

Spontaneous spoken dialogues with the Furhat human-like robot head.
Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, 2014

Human-robot collaborative tutoring using multiparty multimodal spoken dialogue.
Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, 2014

2013
The Furhat Back-Projected Humanoid Head - Lip Reading, Gaze and Multi-Party Interaction.
Int. J. Humanoid Robotics, 2013

Face-to-Face with a Robot: What do we actually Talk about?
Int. J. Humanoid Robotics, 2013

Non-linear Pitch Modification in Voice Conversion Using Artificial Neural Networks.
Proceedings of the Advances in Nonlinear Speech Processing - 6th International Conference, 2013

The Furhat social companion talking head.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Tutoring Robots - Multiparty Multimodal Social Dialogue with an Embodied Tutor.
Proceedings of the Innovative and Creative Developments in Multimodal Interaction Systems, 2013

Aspects of co-occurring syllables and head nods in spontaneous dialogue.
Proceedings of the Auditory-Visual Speech Processing, 2013

Co-present or Not?
Proceedings of the Eye Gaze in Intelligent User Interfaces, 2013

2012
Taming Mona Lisa: Communicating gaze faithfully in 2D and 3D facial projections.
ACM Trans. Interact. Intell. Syst., 2012

Visual Recognition of Isolated Swedish Sign Language Signs.
CoRR, 2012

Children and adults in dialogue with the robot head Furhat - corpus collection and initial analysis.
Proceedings of the Third Workshop on Child, Computer and Interaction, 2012

3rd party observer gaze as a continuous measure of dialogue flow.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Lip-Reading: Furhat Audio Visual Intelligibility of a Back Projected Animated Face.
Proceedings of the Intelligent Virtual Agents - 12th International Conference, 2012

Multimodal multiparty social interaction with the Furhat head.
Proceedings of the International Conference on Multimodal Interaction, 2012

2011
The Mona Lisa Gaze Effect as an Objective Metric for Perceived Cospatiality.
Proceedings of the Intelligent Virtual Agents - 11th International Conference, 2011

Furhat: A Back-Projected Human-Like Robot Head for Multiparty Human-Machine Interaction.
Proceedings of the Cognitive Behavioural Systems, 2011

A robotic head using projected animated faces.
Proceedings of the Auditory-Visual Speech Processing, 2011

Kinetic data for large-scale analysis and modeling of face-to-face conversation.
Proceedings of the Auditory-Visual Speech Processing, 2011

2010
Spontal: A Swedish Spontaneous Dialogue Corpus of Audio, Video and Motion Capture.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

Prominence detection in Swedish using syllable correlates.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Perception of nonverbal gestures of prominence in visual speech animation.
Proceedings of the ACM / SSPNET 2nd International Symposium on Facial Analysis and Animation, 2010

Perception of gaze direction in 2D and 3D facial projections.
Proceedings of the ACM / SSPNET 2nd International Symposium on Facial Analysis and Animation, 2010

Audio-Visual Prosody: Perception, Detection, and Synthesis of Prominence.
Proceedings of the Toward Autonomous, Adaptive, and Context-Aware Multimodal Interfaces. Theoretical and Practical Issues, 2010

Animated Faces for Robotic Heads: Gaze and Beyond.
Proceedings of the Analysis of Verbal and Nonverbal Communication and Enactment. The Processing Issues, 2010

2009
Multimodal Interaction Control.
Proceedings of the Computers in the Human Interaction Loop, 2009

Auditory visual prominence.
J. Multimodal User Interfaces, 2009

SynFace - Speech-Driven Facial Animation for Virtual Speech-Reading Support.
EURASIP J. Audio Speech Music. Process., 2009

Virtual speech reading support for hard of hearing in a domestic multi-media setting.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

The MonAMI reminder: a spoken dialogue system for face-to-face interaction.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Face-to-Face Interaction and the KTH Cooking Show.
Proceedings of the Development of Multimodal Interfaces: Active Listening and Synchrony, 2009

Effects of visual prominence cues on speech intelligibility.
Proceedings of the Auditory-Visual Speech Processing, 2009

Synface - verbal and non-verbal face animation from audio.
Proceedings of the Auditory-Visual Speech Processing, 2009

2008
Innovative Interfaces in MonAMI: The Reminder.
Proceedings of the Perception in Multimodal Dialogue Systems, 2008

Hearing at home - communication support in home environments for hearing impaired persons.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Recognizing and modelling regional varieties of Swedish.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Innovative interfaces in MonAMI: the reminder.
Proceedings of the 10th International Conference on Multimodal Interfaces, 2008

2007
Pushy versus meek - using avatars to influence turn-taking behaviour.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Analysis and Synthesis of Multimodal Verbal and Non-verbal Interaction for Animated Interface Agents.
Proceedings of the Verbal and Nonverbal Communication Behaviours, 2007

2006
Visual correlates to prominence in several expressive modes.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

User Evaluation of the SYNFACE Talking Head Telephone.
Proceedings of the Computers Helping People with Special Needs, 2006

2005
Data-driven synthesis of expressive visual speech using an MPEG-4 talking head.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

2004
Trainable Articulatory Control Models for Visual Speech Synthesis.
Int. J. Speech Technol., 2004

Design strategies for a virtual language tutor.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

SYNFACE - A Talking Head Telephone for the Hearing-Impaired.
Proceedings of the Computers Helping People with Special Needs, 2004

Expressive Animated Agents for Affective Dialogue Systems.
Proceedings of the Affective Dialogue Systems, Tutorial and Research Workshop, 2004

Preliminary Cross-Cultural Evaluation of Expressiveness in Synthetic Faces.
Proceedings of the Affective Dialogue Systems, Tutorial and Research Workshop, 2004

2003
Resynthesis of 3d tongue movements from facial data.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002
Specification and realisation of multimodal output in dialogue systems.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP 2002, 2002

2001
Timing and interaction of visual cues for prominence in audiovisual speech perception.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

2000
Wavesurfer - an open source speech tool.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Adapt - a multimodal conversational dialogue system in an apartment domain.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1999
Picture my voice: Audio to visual speech synthesis using artificial neural networks.
Proceedings of the Auditory-Visual Speech Processing, 1999

Developing a 3D-agent for the August dialogue system.
Proceedings of the Auditory-Visual Speech Processing, 1999

Synthetic visual speech driven from auditory speech.
Proceedings of the Auditory-Visual Speech Processing, 1999

1998
Web-based educational tools for speech technology.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Synthetic faces as a lipreading support.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Recent Developments In Facial Animation: An Inside View.
Proceedings of the Auditory-Visual Speech Processing, 1998

1997
OLGA - a dialogue system with an animated talking agent.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

The Teleface project - multi-modal speech-communication for the hearing impaired.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Animation of talking agents.
Proceedings of the ESCA Workshop on Audio-Visual Speech Processing, 1997

1995
Rule-based visual speech synthesis.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995
