Andrew Rouditchenko

Orcid: 0000-0002-0063-3612

According to our database, Andrew Rouditchenko authored at least 21 papers between 2018 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation.
CoRR, 2024

What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition.
CoRR, 2023

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Contrastive Audio-Visual Masked Autoencoder.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
UAVM: Towards Unifying Audio and Visual Models.
IEEE Signal Process. Lett., 2022

UAVM: A Unified Model for Audio-Visual Learning.
CoRR, 2022

CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification.
CoRR, 2022

Everything at Once - Multi-modal Fusion Transformer for Video Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Cross-Modal Discrete Representation Learning.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Routing with Self-Attention for Multimodal Capsule Networks.
CoRR, 2021

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Cascaded Multilingual Audio-Visual Learning from Videos.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
CoRR, 2020

2019
Label-efficient audio classification through multitask learning and self-supervision.
CoRR, 2019

Self-supervised Audio-visual Co-segmentation.
Proceedings of the IEEE International Conference on Acoustics, 2019

Self-Supervised Segmentation and Source Separation on Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

2018
The Sound of Pixels.
Proceedings of the Computer Vision - ECCV 2018, 2018

