Stéphane Dupont

Orcid: 0000-0003-3674-6747

According to our database, Stéphane Dupont authored at least 120 papers between 1996 and 2024.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.







Gesture retrieval and its application to the study of multimodal communication.
Int. J. Digit. Libr., December, 2024

Influence of image encoders and image features transformations in emergent communication.
Proceedings of the 32nd European Symposium on Artificial Neural Networks, 2024

Deep learning in medical image registration: introduction and survey.
CoRR, 2023

A Recipe for Efficient SBIR Models: Combining Relative Triplet Loss with Batch Normalization and Knowledge Distillation.
CoRR, 2023

Multimodal Attentive Fusion Network for audio-visual event recognition.
Inf. Fusion, 2022

Transformers and CNNs both Beat Humans on SBIR.
CoRR, 2022

Analysis of Co-Laughter Gesture Relationship on RGB videos in Dyadic Conversation Contex.
CoRR, 2022

Deep soccer captioning with transformer: dataset, semantics-related losses, and multi-level evaluation.
CoRR, 2022

Soccer captioning: dataset, transformer-based model, and triple-level evaluation.
Proceedings of the 13th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN 2022) / The 12th International Conference on Current and Future Trends of Information and Communication Technologies in Healthcare (ICTH-2022), 2022

The prediction of residential building consumption using profiling and time encoding.
Proceedings of the 13th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN 2022) / The 12th International Conference on Current and Future Trends of Information and Communication Technologies in Healthcare (ICTH-2022), 2022

Towards Human Performance on Sketch-Based Image Retrieval.
Proceedings of the CBMI 2022: International Conference on Content-based Multimedia Indexing, Graz, Austria, September 14, 2022

Multi-level Attention Fusion Network for Audio-visual Event Recognition.
CoRR, 2021

Gesture of Interest: Gesture Search for Multi-Person, Multi-Perspective TV Footage.
Proceedings of the 18th International Conference on Content-Based Multimedia Indexing, 2021

Hybrid-task learning for robust automatic speech recognition.
Comput. Speech Lang., 2020

AVECL-UMONS database for audio-visual event classification and localization.
CoRR, 2020

Modulated Fusion using Transformer for Linguistic-Acoustic Emotion Recognition.
CoRR, 2020

A Transformer-based joint-encoding for Emotion Recognition and Sentiment Analysis.
CoRR, 2020

Intra and Inter-modality Interactions for Audio-visual Event Detection.
Proceedings of the 1st International Workshop on Human-centric Multimedia Analysis (HuMA'20), 2020

Are You Watching Closely? Content-based Retrieval of Hand Gestures.
Proceedings of the 2020 on International Conference on Multimedia Retrieval, 2020

SECL-UMons Database for Sound Event Classification and Localization.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Improved Soccer Action Spotting using both Audio and Video Streams.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Can adversarial training learn image captioning ?
CoRR, 2019

Modulated Self-attention Convolutional Network for VQA.
CoRR, 2019

Adversarial reconstruction for Multi-modal Machine Translation.
CoRR, 2019

Audio-Visual Fusion And Conditioning With Neural Networks For Event Recognition.
Proceedings of the 29th IEEE International Workshop on Machine Learning for Signal Processing, 2019

A Multimodal Approach for the Safeguarding and Transmission of Intangible Cultural Heritage: The Case of i-Treasures.
IEEE Intell. Syst., 2018

Object-oriented Targets for Visual Navigation using Rich Semantic Representations.
CoRR, 2018

Bringing back simplicity and lightliness into neural image captioning.
CoRR, 2018

UMONS Submission for WMT18 Multimodal Translation Task.
CoRR, 2018

Investigating a Hybrid Learning Approach for Robust Automatic Speech Recognition.
Proceedings of the Statistical Language and Speech Processing, 2018

A Dyadic Conversation Dataset on Moral Emotions.
Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition, 2018

Multifaceted Engagement in Social Interaction with a Machine: The JOKER Project.
Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition, 2018

Blind Speech Separation and Enhancement With GCC-NMF.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

DeepSketch 3 - Analyzing deep neural networks features for better sketch recognition and sketch-based image retrieval.
Multim. Tools Appl., 2017

Modulating and attending the source image during encoding improves Multimodal Translation.
CoRR, 2017

Visually Grounded Word Embeddings and Richer Visual Features for Improving Multimodal Neural Machine Translation.
CoRR, 2017

Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation.
CoRR, 2017

Amused speech components analysis and classification: Towards an amusement arousal level assessment system.
Comput. Electr. Eng., 2017

Noise and Speech Estimation as Auxiliary Tasks for Robust Speech Recognition.
Proceedings of the Statistical Language and Speech Processing, 2017

Introducing AmuS: The Amused Speech Database.
Proceedings of the Statistical Language and Speech Processing, 2017

Enhanced Retrieval and Browsing in the IMOTION System.
Proceedings of the MultiMedia Modeling - 23rd International Conference, 2017

Quadruplet Networks for Sketch-Based Image Retrieval.
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, 2017

UMONS @ MediaEval 2017: Diverse Social Images Retrieval.
Working Notes Proceedings of the MediaEval 2017 Workshop co-located with the Conference and Labs of the Evaluation Forum (CLEF 2017), 2017

Investigating the impact of the training data volume for robust speech recognition using multi-task learning.
Proceedings of the 2017 IEEE International Symposium on Signal Processing and Information Technology, 2017

A corpus for experimental study of affect bursts in human-robot interaction.
Proceedings of the 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents, 2017

Triplet Networks Feature Masking for Sketch-Based Image Retrieval.
Proceedings of the Image Analysis and Recognition - 14th International Conference, 2017

Towards Good Practices for Image Retrieval Based on CNN Features.
Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

An empirical study on the effectiveness of images in Multimodal Neural Machine Translation.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Laughter Research: A Review of the ILHAIRE Project.
Proceedings of the Toward Robotic Socially Believable Behaving Systems - Volume I, 2016

The IMOTION System at TRECVID 2016: The Ad-Hoc Video Search Task.
Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

I-Vector estimation as auxiliary task for Multi-Task Learning based acoustic modeling for automatic speech recognition.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

iAutoMotion - an Autonomous Content-Based Video Retrieval Engine.
Proceedings of the MultiMedia Modeling - 22nd International Conference, 2016

IMOTION - Searching for Video Sequences Using Multi-Shot Sketch Queries.
Proceedings of the MultiMedia Modeling - 22nd International Conference, 2016

DeepSketch2Image: Deep Convolutional Neural Networks for Partial Sketch Recognition and Image Retrieval.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

AVAB-DBS: an Audio-Visual Affect Bursts Database for Synthesis.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Semantic Sketch-Based Video Retrieval with Autocompletion.
Proceedings of the Companion Publication of the 21st International Conference on Intelligent User Interfaces, 2016

Speaker-aware Multi-Task Learning for automatic speech recognition.
Proceedings of the 23rd International Conference on Pattern Recognition, 2016

Towards a listening agent: a system generating audiovisual laughs and smiles to show interest.
Proceedings of the 18th ACM International Conference on Multimodal Interaction, 2016

Speaker-aware long short-term memory multi-task learning for speech recognition.
Proceedings of the 24th European Signal Processing Conference, 2016

Audio affect burst synthesis: A multilevel synthesis system for emotional expressions.
Proceedings of the 24th European Signal Processing Conference, 2016

Multi-task learning for speech recognition: an overview.
Proceedings of the 24th European Symposium on Artificial Neural Networks, 2016

DeepSketch 2: Deep convolutional neural networks for partial sketch recognition.
Proceedings of the 14th International Workshop on Content-Based Multimedia Indexing, 2016


A Novel Human Interaction Game-like Application to Learn, Perform and Evaluate Modern Contemporary Singing - "Human Beat Box".
Proceedings of the VISAPP 2015, 2015

IMOTION - A Content-Based Video Retrieval Engine.
Proceedings of the MultiMedia Modeling - 21st International Conference, 2015

UMons at MediaEval 2015 Affective Impact of Movies Task including Violent Scenes Detection.
Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

An HMM approach for synthesizing amused speech with a controllable intensity of smile.
Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, 2015

Towards a level assessment system of amusement in speech signals: Amused speech components classification.
Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, 2015

Analysis and automatic recognition of Human BeatBox sounds: A comparative study.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Speech-laughs: An HMM-based approach for amused speech synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Shaking and speech-smile vowels classification: An attempt at amusement arousal estimation from speech signals.
Proceedings of the 2015 IEEE Global Conference on Signal and Information Processing, 2015

An HMM-based speech-smile synthesis system: An approach for amusement synthesis.
Proceedings of the 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, 2015

Breath and repeat: An attempt at enhancing speech-laugh synthesis quality.
Proceedings of the 23rd European Signal Processing Conference, 2015

DeepSketch: Deep convolutional neural networks for sketch recognition and similarity search.
Proceedings of the 13th International Workshop on Content-Based Multimedia Indexing, 2015

Investigating sparse deep neural networks for speech recognition.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Multimodal data collection of human-robot humorous interactions in the Joker project.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

Arousal-Driven Synthesis of Laughter.
IEEE J. Sel. Top. Signal Process., 2014

Tangible needle, digital haystack: tangible interfaces for reusing media content organized by similarity.
Proceedings of the Eighth International Conference on Tangible, 2014

Scenarizing CADastre Exquisse: A Crossover between Snoezeling in Hospitals/Domes, and Authoring/Experiencing Soundful Comic Strips.
Proceedings of the MultiMedia Modeling - 20th Anniversary International Conference, 2014

A Proximity Grid Optimization Method to Improve Audio Search for Sound Design.
Proceedings of the 15th International Society for Music Information Retrieval Conference, 2014

AudioMetro: directing search for sound designers through content-based cues.
Proceedings of the Audio Mostly 2014, AM '14, 2014

VideoCycle: User-Friendly Navigation by Similarity in Video Databases.
Proceedings of the Advances in Multimedia Modeling, 19th International Conference, 2013

Improved Audio Classification Using a Novel Non-Linear Dimensionality Reduction Ensemble Approach.
Proceedings of the 14th International Society for Music Information Retrieval Conference, 2013

EGT: Enriched Guitar Transcription.
Proceedings of the Intelligent Technologies for Interactive Entertainment, 2013

MashtaCycle: On-Stage Improvised Audio Collage by Content-Based Similarity and Gesture Recognition.
Proceedings of the Intelligent Technologies for Interactive Entertainment, 2013

Nonlinear dimensionality reduction approaches applied to music and textural sounds.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo, 2013

Laugh-aware virtual agent and its impact on user amusement.
Proceedings of the International conference on Autonomous Agents and Multi-Agent Systems, 2013

Left and right-hand guitar playing techniques detection.
Proceedings of the 12th International Conference on New Interfaces for Musical Expression, 2012

LoopJam: turning the dance floor into a collaborative instrumental map.
Proceedings of the 12th International Conference on New Interfaces for Musical Expression, 2012

Browsing a dance video collection: dance analysis and interface design.
J. Multimodal User Interfaces, 2010

DeviceCycle: Rapid and Reusable Prototyping of Gestural Interfaces, Applied to Audio Browsing by Similarity.
Proceedings of the 10th International Conference on New Interfaces for Musical Expression, 2010

An interactive installation for browsing a dance video database.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

AudioCycle: A similarity-based visualization of musical libraries.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

AudioCycle: Browsing Musical Loop Libraries.
Proceedings of the Seventh International Workshop on Content-Based Multimedia Indexing, 2009

Introduction to the Special Issue on Intrinsic Speech Variations.
Speech Commun., 2007

Automatic speech recognition and speech variability: A review.
Speech Commun., 2007

Automatic Speech Recognition and Intrinsic Speech Variation.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

A study of implicit and explicit modeling of coarticulation and pronunciation variation.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Bimodal combination of speech and handwriting for improved word recognition.
Proceedings of the 13th European Signal Processing Conference, 2005

Robust feature extraction and acoustic modeling at multitel: experiments on the Aurora databases.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Qualcomm-ICSI-OGI features for ASR.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

VTS residual noise compensation.
Proceedings of the IEEE International Conference on Acoustics, 2002

Assessing local noise level estimation methods: Application to noise robust ASR.
Speech Commun., 2001

Robust ASR front-end using spectral-based and discriminant features: experiments on the Aurora tasks.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Audio-Visual Speech Modeling for Continuous Speech Recognition.
IEEE Trans. Multim., 2000

Fast speaker adaptation of artificial neural networks for automatic speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2000

Context dependent hybrid HMM/ANN systems for large vocabulary continuous speech recognition system.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Using the multi-stream approach for continuous audio-visual speech recognition: experiments on the M2VTS database.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Missing data reconstruction for robust automatic speech recognition in the framework of hybrid HMM/ANN systems.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Continuous Audio-Visual Speech Recognition.
Proceedings of the Computer Vision, 1998

Context independent and context dependent hybrid HMM/ANN systems for vocabulary independent tasks.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Using multiple time scales in a multi-stream speech recognition system.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Hybrid HMM/ANN systems for training independent tasks: experiments on Phonebook and related improvements.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

Subband-based speech recognition.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

A new ASR approach based on independent processing and recombination of partial frequency bands.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

Towards subband-based speech recognition.
Proceedings of the 8th European Signal Processing Conference, 1996
