2025

The mutual exclusivity bias of bilingual visually grounded speech models.

Yevgen Matusevych

CoRR, June, 2025

Seeing What Tastes Good: Revisiting Multimodal Distributional Semantics in the Billion Parameter Era.

Desmond Elliott

CoRR, June, 2025

DeCLIP: Decoding CLIP Representations for Deepfake Localization.

Elisabeta Oneata

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Easy, Interpretable, Effective: openSMILE for voice deepfake detection.

Nicolas M. Müller

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

2024

Visually Grounded Few-Shot Word Learning in Low-Resource Settings.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Visually Grounded Speech Models Have a Mutual Exclusivity Bias.

Yevgen Matusevych

Trans. Assoc. Comput. Linguistics, 2024

Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning.

Dragos-Alexandru Boldisor, Elisabeta Oneata

CoRR, 2024

Improved Visually Prompted Keyword Localisation in Real Low-Resource Settings.

CoRR, 2024

Weakly-supervised deepfake localization in diffusion-generated images.

Dragos-Constantin Tântaru, Elisabeta Oneata

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Towards generalisable and calibrated audio deepfake detection with self-supervised representations.

Elisabeta Oneata

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Translating speech with just images.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

2023

Towards generalisable and calibrated synthetic speech detection with self-supervised representations.

Elisabeta Oneata

CoRR, 2023

The SpeeD-ZevoTech submission at DISPLACE 2023.

Gabriel Pirlogeanu, Alexandru-Lucian Georgescu

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

FlexLip: A Controllable Text-to-Lip System.

Sensors, 2022

Keyword Localisation in Untranscribed Speech Using Visually Grounded Speech Models.

IEEE J. Sel. Top. Signal Process., 2022

YFACC: A Yorùbá Speech-Image Dataset for Cross-Lingual Keyword Localisation Through Visual Grounding.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Multilingual Multimodal Learning with Machine Translated Text.

Emanuele Bugliarello, Desmond Elliott

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

2021

Multimodal speech recognition for unmanned aerial vehicles.

Comput. Electr. Eng., 2021

An Evaluation of Word-Level Confidence Estimation for End-to-End Automatic Speech Recognition.

Alexandru Caranica

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Data-Filtering Methods for Self-Training of Automatic Speech Recognition Systems.

Alexandru-Lucian Georgescu, Cristian Manolache, Corneliu Burileanu

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Speaker disentanglement in video-to-speech conversion.

Proceedings of the 29th European Signal Processing Conference, 2021

2020

Revisiting SincNet: An Evaluation of Feature and Network Hyperparameters for Speaker Recognition.

Lucian Georgescu, Dragos Burileanu, Corneliu Burileanu

Proceedings of the 28th European Signal Processing Conference, 2020

2019

The Quo Vadis submission at Traffic4cast 2019.

Cosmin George Alexandru, Marius Stanescu, Alexandru Magan, Adrian Postelnicu

CoRR, 2019

Kite: Automatic Speech Recognition for Unmanned Aerial Vehicles.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2016

A Robust and Efficient Video Representation for Action Recognition.

Cordelia Schmid

Int. J. Comput. Vis., 2016

2015

Robust and efficient models for action recognition and localization. (Modèles robustes et efficaces pour la reconnaissance d'action et leur localisation).

PhD thesis, 2015

2014

The INRIA-LIM-VocR and AXES submissions to TrecVid 2014 Multimedia Event Detection.

Nicolas Chesneau, Karteek Alahari, Zaïd Harchaoui, Jean-Luc Gauvain, Christoph Schmidt, Cordelia Schmid

Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014

Spatio-temporal Object Detection Proposals.

Jérôme Revaud, Cordelia Schmid

Proceedings of the Computer Vision - ECCV 2014, 2014

Efficient Action Localization with Approximately Normalized Fisher Vectors.

Cordelia Schmid

Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013

The AXES submissions at TRECVID 2013.

Relja Arandjelovic, Basura Fernando, Zaïd Harchaoui, Kevin McGuinness, Noel E. O'Connor, Omkar M. Parkhi, Jérôme Revaud, Cordelia Schmid, Jochen Schwenninger, Tinne Tuytelaars, Andrew Zisserman

Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013

Action and Event Recognition with Fisher Vectors on a Compact Feature Set.

Cordelia Schmid

Proceedings of the IEEE International Conference on Computer Vision, 2013

2012

AXES at TRECVID 2012: KIS, INS, and MED.

Kevin McGuinness, Noel E. O'Connor, Omkar M. Parkhi, Relja Arandjelovic, Andrew Zisserman, Basura Fernando, Tinne Tuytelaars, Jérôme Revaud, Jochen Schwenninger, Zaïd Harchaoui, Cordelia Schmid

Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012