Marcus Rohrbach

Orcid: 0000-0001-5908-7751

Affiliations:
  • TU Darmstadt, Germany


According to our database, Marcus Rohrbach authored at least 88 papers between 2009 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Simple Token-Level Confidence Improves Caption Correctness.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Efficient Pre-training for Localized Instruction Generation of Procedural Videos.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
Improving Selective Visual Question Answering by Learning from Your Peers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly.
Proceedings of the Computer Vision - ECCV 2022, 2022

CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition.
Proceedings of the Computer Vision - ECCV 2022, 2022

Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition.
Proceedings of the Computer Vision - ECCV 2022, 2022

FLAVA: A Foundational Language And Vision Alignment Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning To Recognize Procedural Activities with Distant Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting.
Proceedings of the 9th International Conference on Learning Representations, 2021

A New Split for Evaluating True Zero-Shot Action Recognition.
Proceedings of the Pattern Recognition - 43rd DAGM German Conference, DAGM GCPR 2021, Bonn, Germany, September 28, 2021

KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

SMART Frame Selection for Action Recognition.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Decoupling Representation and Classifier for Long-Tailed Recognition.
Proceedings of the 8th International Conference on Learning Representations, 2020

Uncertainty-guided Continual Learning with Bayesian Neural Networks.
Proceedings of the 8th International Conference on Learning Representations, 2020

TextCaps: A Dataset for Image Captioning with Reading Comprehension.
Proceedings of the Computer Vision - ECCV 2020, 2020

Learning to Generate Grounded Visual Captions Without Localization Supervision.
Proceedings of the Computer Vision - ECCV 2020, 2020

Adversarial Continual Learning.
Proceedings of the Computer Vision - ECCV 2020, 2020

12-in-1: Multi-Task Vision and Language Representation Learning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

In Defense of Grid Features for Visual Question Answering.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Learning to Generate Grounded Image Captions without Localization Supervision.
CoRR, 2019

Continual Learning with Tiny Episodic Memories.
CoRR, 2019

CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering.
Proceedings of the 36th International Conference on Machine Learning, 2019

Efficient Lifelong Learning with A-GEM.
Proceedings of the 7th International Conference on Learning Representations, 2019

Selfless Sequential Learning.
Proceedings of the 7th International Conference on Learning Representations, 2019

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Grounded Video Description.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Towards VQA Models That Can Read.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Cycle-Consistency for Robust Visual Question Answering.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Adversarial Inference for Multi-Sentence Video Description.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Uncertainty-Guided Continual Learning in Bayesian Neural Networks - Extended Abstract.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Graph-Based Global Reasoning Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Large-Scale Visual Relationship Understanding.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Pythia v0.1: the Winning Entry to the VQA Challenge 2018.
CoRR, 2018

A Dataset for Telling the Stories of Social Media Videos.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Visual Coreference Resolution in Visual Dialog Using Neural Module Networks.
Proceedings of the Computer Vision - ECCV 2018, 2018

Memory Aware Synapses: Learning What (not) to Forget.
Proceedings of the Computer Vision - ECCV 2018, 2018

Multimodal Explanations: Justifying Decisions and Pointing to the Evidence.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Exploring the Challenges Towards Lifelong Fact Learning.
Proceedings of the Computer Vision - ACCV 2018, 2018

2017
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description.
IEEE Trans. Pattern Anal. Mach. Intell., 2017

Movie Description.
Int. J. Comput. Vis., 2017

Ask Your Neurons: A Deep Learning Approach to Visual Question Answering.
Int. J. Comput. Vis., 2017

Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract).
CoRR, 2017

Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Learning to Reason: End-to-End Module Networks for Visual Question Answering.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Captioning Images with Diverse Objects.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Generating Descriptions with Grounded and Co-referenced People.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Modeling Relationships in Referential Expressions with Compositional Modular Networks.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Recognizing Fine-Grained and Composite Activities Using Hand-Centric Features and Script Data.
Int. J. Comput. Vis., 2016

Attributes as Semantic Units between Natural Language and Visual Recognition.
CoRR, 2016

Attentive Explanations: Justifying Decisions and Pointing to the Evidence.
CoRR, 2016

Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions.
CoRR, 2016

Learning to Compose Neural Networks for Question Answering.
Proceedings of the NAACL HLT 2016, 2016

Multimodal Video Description.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

Grounding of Textual Phrases in Images by Reconstruction.
Proceedings of the Computer Vision - ECCV 2016, 2016

Segmentation from Natural Language Expressions.
Proceedings of the Computer Vision - ECCV 2016, 2016

Generating Visual Explanations.
Proceedings of the Computer Vision - ECCV 2016, 2016

Natural Language Object Retrieval.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Neural Module Networks.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
A Multi-scale Multiple Instance Video Description Network.
CoRR, 2015

Deep Compositional Question Answering with Neural Module Networks.
CoRR, 2015

Translating Videos to Natural Language Using Deep Recurrent Neural Networks.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

Sequence to Sequence - Video to Text.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Spatial Semantic Regularisation for Large Scale Object Detection.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

The Long-Short Story of Movie Description.
Proceedings of the Pattern Recognition - 37th German Conference, 2015

A dataset for Movie Description.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014
Combining visual recognition and computational linguistics: linguistic knowledge for visual recognition and natural language descriptions of visual content.
PhD thesis, 2014

Coherent Multi-Sentence Video Description with Variable Level of Detail.
CoRR, 2014

Coherent Multi-sentence Video Description with Variable Level of Detail.
Proceedings of the Pattern Recognition - 36th German Conference, 2014

2013
Grounding Action Descriptions in Videos.
Trans. Assoc. Comput. Linguistics, 2013

Transfer Learning in a Transductive Setting.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Translating Video Content to Natural Language Descriptions.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Multi-view Pictorial Structures for 3D Human Pose Estimation.
Proceedings of the British Machine Vision Conference, 2013

2012
3D Object Detection with Multiple Kinects.
Proceedings of the Computer Vision - ECCV 2012. Workshops and Demonstrations, 2012

Script Data for Attribute-Based Recognition of Composite Activities.
Proceedings of the Computer Vision - ECCV 2012, 2012

A database for fine grained activity detection of cooking activities.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

2011
The Benefits of Dense Stereo for Pedestrian Detection.
IEEE Trans. Intell. Transp. Syst., 2011

Evaluating knowledge transfer and zero-shot learning in a large-scale setting.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011

2010
Combining Language Sources and Robust Semantic Relatedness for Attribute-Based Knowledge Transfer.
Proceedings of the Trends and Topics in Computer Vision, 2010

What helps where - and why? Semantic relatedness for knowledge transfer.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

2009
High-Level Fusion of Depth and Intensity for Pedestrian Classification.
Proceedings of the Pattern Recognition, 2009
