2025
The mutual exclusivity bias of bilingual visually grounded speech models.
CoRR, June, 2025
Seeing What Tastes Good: Revisiting Multimodal Distributional Semantics in the Billion Parameter Era.
CoRR, June, 2025
DeCLIP: Decoding CLIP Representations for Deepfake Localization.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025
Easy, Interpretable, Effective: openSMILE for voice deepfake detection.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025
2024
Visually Grounded Few-Shot Word Learning in Low-Resource Settings.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Visually Grounded Speech Models Have a Mutual Exclusivity Bias.
Trans. Assoc. Comput. Linguistics, 2024
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning.
CoRR, 2024
Improved Visually Prompted Keyword Localisation in Real Low-Resource Settings.
CoRR, 2024
Weakly-supervised deepfake localization in diffusion-generated images.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024
Towards generalisable and calibrated audio deepfake detection with self-supervised representations.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Translating speech with just images.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
2023
Towards generalisable and calibrated synthetic speech detection with self-supervised representations.
CoRR, 2023
The SpeeD-ZevoTech submission at DISPLACE 2023.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
2022
FlexLip: A Controllable Text-to-Lip System.
Sensors, 2022
Keyword Localisation in Untranscribed Speech Using Visually Grounded Speech Models.
IEEE J. Sel. Top. Signal Process., 2022
YFACC: A Yorùbá Speech-Image Dataset for Cross-Lingual Keyword Localisation Through Visual Grounding.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Multilingual Multimodal Learning with Machine Translated Text.
Findings of the Association for Computational Linguistics: EMNLP 2022, 2022
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022
2021
Multimodal speech recognition for unmanned aerial vehicles.
Comput. Electr. Eng., 2021
An Evaluation of Word-Level Confidence Estimation for End-to-End Automatic Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Data-Filtering Methods for Self-Training of Automatic Speech Recognition Systems.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Speaker disentanglement in video-to-speech conversion.
Proceedings of the 29th European Signal Processing Conference, 2021
2020
Revisiting SincNet: An Evaluation of Feature and Network Hyperparameters for Speaker Recognition.
Proceedings of the 28th European Signal Processing Conference, 2020
2019
The Quo Vadis submission at Traffic4cast 2019.
CoRR, 2019
Kite: Automatic Speech Recognition for Unmanned Aerial Vehicles.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
2016
A Robust and Efficient Video Representation for Action Recognition.
Int. J. Comput. Vis., 2016
2015
Robust and efficient models for action recognition and localization. (Modèles robustes et efficaces pour la reconnaissance d'action et leur localisation).
PhD thesis, 2015
2014
The INRIA-LIM-VocR and AXES submissions to TrecVid 2014 Multimedia Event Detection.
Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014
Spatio-temporal Object Detection Proposals.
Proceedings of the European Conference on Computer Vision (ECCV 2014), 2014
Efficient Action Localization with Approximately Normalized Fisher Vectors.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014
2013
The AXES submissions at TRECVID 2013.
Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013
Action and Event Recognition with Fisher Vectors on a Compact Feature Set.
Proceedings of the IEEE International Conference on Computer Vision, 2013
2012
AXES at TRECVID 2012: KIS, INS, and MED.
Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012