Andrew Rouditchenko

ORCID: 0000-0002-0063-3612

According to our database, Andrew Rouditchenko authored at least 21 papers between 2018 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation.

CoRR, 2024

What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition.

Andrew Rouditchenko

Ronan Collobert

Tatiana Likhomanenko

CoRR, 2023

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Contrastive Audio-Visual Masked Autoencoder.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022

UAVM: Towards Unifying Audio and Visual Models.

IEEE Signal Process. Lett., 2022

UAVM: A Unified Model for Audio-Visual Learning.

CoRR, 2022

CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification.

CoRR, 2022

Everything at Once - Multi-modal Fusion Transformer for Video Retrieval.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Cross-Modal Discrete Representation Learning.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

Routing with Self-Attention for Multimodal Capsule Networks.

CoRR, 2021

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.

Rogério Schmidt Feris

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Cascaded Multilingual Audio-Visual Learning from Videos.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.

CoRR, 2020

2019

Label-efficient audio classification through multitask learning and self-supervision.

CoRR, 2019

Self-supervised Audio-visual Co-segmentation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Self-Supervised Segmentation and Source Separation on Videos.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

2018

The Sound of Pixels.

Proceedings of the Computer Vision - ECCV 2018, 2018
