Qi Wu

ORCID: 0000-0003-3631-256X

Affiliations:
  • University of Adelaide, School of Computer Science, Australian Centre for Robotic Vision, Adelaide, Australia
  • University of Bath, UK (PhD 2015)


According to our database, Qi Wu authored at least 197 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
UniMiSS+: Universal Medical Self-Supervised Learning From Cross-Dimensional Unpaired Data.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

An Adaptive Correlation Filtering Method for Text-Based Person Search.
Int. J. Comput. Vis., October, 2024

Room-Object Entity Prompting and Reasoning for Embodied Referring Expression.
IEEE Trans. Pattern Anal. Mach. Intell., February, 2024

Image Captioning With Controllable and Adaptive Length Levels.
IEEE Trans. Pattern Anal. Mach. Intell., February, 2024

Transformer-Based Relational Inference Network for Complex Visual Relational Reasoning.
ACM Trans. Multim. Comput. Commun. Appl., January, 2024

Test-Time Model Adaptation for Visual Question Answering With Debiased Self-Supervisions.
IEEE Trans. Multim., 2024

Rethinking masked image modelling for medical image representation.
Medical Image Anal., 2024

Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval.
CoRR, 2024

Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs.
CoRR, 2024

XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training.
CoRR, 2024

Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models.
CoRR, 2024

Streaming Video Diffusion: Online Video Editing with Diffusion Models.
CoRR, 2024

VL-Mamba: Exploring State Space Models for Multimodal Learning.
CoRR, 2024

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation.
CoRR, 2024

Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Navigating Beyond Instructions: Vision-and-Language Navigation in Obstructed Environments.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Dataset, Challenge, and Evaluation for Tumor Segmentation Variability.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Spot the Difference: Difference Visual Question Answering with Residual Alignment.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2024, 2024

Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

LLM as Copilot for Coarse-Grained Vision-and-Language Navigation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Continual Self-Supervised Learning: Towards Universal Multi-Modal Medical Data Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

PairAug: What Can Augmented Image-Text Pairs Do for Radiology?
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ModaVerse: Efficiently Transforming Modalities with LLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-Training Framework.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Augmented Commonsense Knowledge for Remote Object Grounding.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

WebVLN: Vision-and-Language Navigation on Websites.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Data Hiding With Deep Learning: A Survey Unifying Digital Watermarking and Steganography.
IEEE Trans. Comput. Soc. Syst., December, 2023

Medical visual question answering: A survey.
Artif. Intell. Medicine, September, 2023

HOP+: History-Enhanced and Order-Aware Pre-Training for Vision-and-Language Navigation.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

Multi-Granularity Aggregation Transformer for Joint Video-Audio-Text Representation Learning.
IEEE Trans. Circuits Syst. Video Technol., June, 2023

A Proposal-Free One-Stage Framework for Referring Expression Comprehension and Generation via Dense Cross-Attention.
IEEE Trans. Multim., 2023

Rethinking and Improving Feature Pyramids for One-Stage Referring Expression Comprehension.
IEEE Trans. Image Process., 2023

Weakly-Supervised 3D Spatial Reasoning for Text-Based Visual Question Answering.
IEEE Trans. Image Process., 2023

Subject-Oriented Video Captioning.
CoRR, 2023

Watermarking Vision-Language Pre-trained Models for Multi-modal Embedding as a Service.
CoRR, 2023

Improving Online Source-free Domain Adaptation for Object Detection by Unsupervised Data Acquisition.
CoRR, 2023

Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search.
CoRR, 2023

SwitchGPT: Adapting Large Language Models for Non-Text Outputs.
CoRR, 2023

S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning.
CoRR, 2023

Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment.
CoRR, 2023

AerialVLN: Vision-and-Language Navigation for UAVs.
CoRR, 2023

Attention Mechanisms in Medical Image Segmentation: A Survey.
CoRR, 2023

S4M: Generating Radiology Reports by A Single Model for Multiple Body Parts.
CoRR, 2023

LoRA: A Logical Reasoning Augmented Dataset for Visual Question Answering.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Mind the Gap: Improving Success Rate of Vision-and-Language Navigation by Revisiting Oracle Success Routes.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Multi-modal Adapter for Medical Vision-and-Language Learning.
Proceedings of the Machine Learning in Medical Imaging - 14th International Workshop, 2023

BHSD: A 3D Multi-class Brain Hemorrhage Segmentation Dataset.
Proceedings of the Machine Learning in Medical Imaging - 14th International Workshop, 2023

PLMVQA: Applying Pseudo Labels for Medical Visual Question Answering with Limited Data.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023 Workshops, 2023

MedIM: Boost Medical Image Representation via Radiology Report-Guided Masking.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023, 2023

Unpaired Cross-Modal Interaction Learning for COVID-19 Segmentation on Limited CT Images.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023, 2023

Scaling Data Generation in Vision-and-Language Navigation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ShapeScaffolder: Structure-Aware 3D Shape Generation from Text.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

March in Chat: Interactive Prompting for Remote Embodied Referring Expression.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

AerialVLN: Vision-and-Language Navigation for UAVs.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Identity-Consistent Aggregation for Video Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Memory-efficient Temporal Moment Localization in Long Videos.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

S³C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning to Dub Movies via Hierarchical Prosody Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Digging out Discrimination Information from Generated Samples for Robust Visual Question Answering.
Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Visual Question Answering - From Theory to Application.
Advances in Computer Vision and Pattern Recognition, Springer, ISBN: 978-981-19-0963-4, 2022

Robust Learning From Noisy Web Images Via Data Purification for Fine-Grained Recognition.
IEEE Trans. Multim., 2022

Co-LDL: A Co-Training-Based Label Distribution Learning Method for Tackling Label Noise.
IEEE Trans. Multim., 2022

Show, Price and Negotiate: A Negotiator With Online Value Look-Ahead.
IEEE Trans. Multim., 2022

Structured Multimodal Attentions for TextVQA.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Visual Grounding Via Accumulated Attention.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering.
CoRR, 2022

ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers.
CoRR, 2022

Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information.
CoRR, 2022

HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation.
CoRR, 2022

ForeSI: Success-Aware Visual Navigation Agent.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

Learning Distinct and Representative Modes for Image Captioning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Diagnosing Vision-and-Language Navigation: What Really Matters.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

UniMiSS: Universal Medical Self-supervised Learning via Breaking Dimensionality Barrier.
Proceedings of the Computer Vision - ECCV 2022, 2022

A Simple and Robust Correlation Filtering Method for Text-Based Person Search.
Proceedings of the Computer Vision - ECCV 2022, 2022

HOP: History-and-Order Aware Pretraining for Vision-and-Language Navigation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Maintaining Reasoning Consistency in Compositional Visual Question Answering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

V2C: Visual Voice Cloning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Enhancing Person Synthesis in Complex Scenes via Intrinsic and Contextual Structure Modeling.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

Program Generation from Diverse Video Demonstrations.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Learning the Dynamics of Visual Relational Reasoning via Reinforced Path Routing.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Referring Expression Comprehension: A Survey of Methods and Datasets.
IEEE Trans. Multim., 2021

Learning Dual Encoding Model for Adaptive Visual Understanding in Visual Dialogue.
IEEE Trans. Image Process., 2021

Language-Guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning.
IEEE Trans. Circuits Syst. Video Technol., 2021

Image editing with varying intensities of processing.
Comput. Vis. Image Underst., 2021

LocFormer: Enabling Transformers to Perform Temporal Moment Localization on Long Untrimmed Videos With a Feature Sampling Approach.
CoRR, 2021

Unified 2D and 3D Pre-training for Medical Image classification and Segmentation.
CoRR, 2021

Memory Regulation and Alignment toward Generalizer RGB-Infrared Person.
CoRR, 2021

Data Hiding with Deep Learning: A Survey Unifying Digital Watermarking and Steganography.
CoRR, 2021

Know What and Know Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation.
CoRR, 2021

Learning for Visual Navigation by Imagining the Success.
CoRR, 2021

Multi-intersection Traffic Optimisation: A Benchmark Dataset and a Strong Baseline.
CoRR, 2021

Optimistic Agent: Accurate Graph-Based Value Estimation for More Successful Visual Navigation.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Debiased Visual Question Answering from Feature and Sample Perspectives.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

R-GAN: Exploring Human-like Way for Reasonable Text-to-Image Synthesis via Generative Adversarial Networks.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Neighbor-view Enhanced Model for Vision and Language Navigation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

CogTree: Cognition Tree Loss for Unbiased Scene Graph Generation.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Jo-SRC: A Contrastive Approach for Combating Noisy Labels.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Towards Accurate Text-Based Image Captioning With Content Diversity Exploration.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

VLN BERT: A Recurrent Vision-and-Language BERT for Navigation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Sketch, Ground, and Refine: Top-Down Dense Video Captioning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Confidence-aware Non-repetitive Multimodal Transformers for TextCaps.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

How to Train Your Agent to Read and Write.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Reasoning on the Relation: Enhancing Visual Representation for Visual Question Answering and Cross-Modal Retrieval.
IEEE Trans. Multim., 2020

Scripted Video Generation With a Bottom-Up Generative Adversarial Network.
IEEE Trans. Image Process., 2020

Image and Sentence Matching via Semantic Concepts and Order Learning.
IEEE Trans. Pattern Anal. Mach. Intell., 2020

Semantics for Robotic Mapping, Perception and Interaction: A Survey.
Found. Trends Robotics, 2020

A Recurrent Vision-and-Language BERT for Navigation.
CoRR, 2020

CogTree: Cognition Tree Loss for Unbiased Scene Graph Generation.
CoRR, 2020

Data-driven Meta-set Based Fine-Grained Visual Classification.
CoRR, 2020

Utilising Prior Knowledge for Visual Navigation: Distil and Adapt.
CoRR, 2020

Language and Visual Entity Relationship Graph for Agent Navigation.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Data-driven Meta-set Based Fine-Grained Visual Recognition.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Cascade Reasoning Network for Text-based Visual Question Answering.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Visual-Semantic Graph Matching for Visual Grounding.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Medical Data Inquiry Using a Question Answering Model.
Proceedings of the 17th IEEE International Symposium on Biomedical Imaging, 2020

Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Sub-Instruction Aware Vision-and-Language Navigation.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Soft Expert Reward Learning for Vision-and-Language Navigation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering.
Proceedings of the Computer Vision - ECCV 2020, 2020

Object-and-Action Aware Model for Visual Language Navigation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Length-Controllable Image Captioning.
Proceedings of the Computer Vision - ECCV 2020, 2020

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Intelligent Home 3D: Automatic 3D-House Design From Linguistic Descriptions Only.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Gold Seeker: Information Gain From Policy Distributions for Goal-Oriented Vision-and-Language Reasoning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

AIML at VQA-Med 2020: Knowledge Inference via a Skeleton-based Sentence Mapping Approach for Medical Domain Visual Question Answering.
Proceedings of the Working Notes of CLEF 2020, 2020

Modular Graph Attention Network for Complex Visual Relational Reasoning.
Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

Overcoming Language Priors in VQA via Decomposed Linguistic Representations.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Attend and Imagine: Multi-Label Image Classification With Visual Attention and Recurrent Neural Networks.
IEEE Trans. Multim., 2019

Heritage image annotation via collective knowledge.
Pattern Recognit., 2019

Medical image classification using synergic deep learning.
Medical Image Anal., 2019

Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019.
CoRR, 2019

Show, Price and Negotiate: A Hierarchical Attention Recurrent Visual Negotiator.
CoRR, 2019

RERERE: Remote Embodied Referring Expressions in Real indoor Environments.
CoRR, 2019

An Attribute-Based High-Level Image Representation for Scene Classification.
IEEE Access, 2019

Watch, Reason and Code: Learning to Represent Videos Using Program.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Mind Your Neighbours: Image Annotation With Metadata Neighbourhood Graph Co-Attention Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

What's to Know? Uncertainty as a Guide to Asking Goal-Oriented Questions.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Neighbourhood Watch: Referring Expression Comprehension via Language-Guided Graph Attention Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Multilabel Image Classification With Regional Latent Semantic Dependencies.
IEEE Trans. Multim., 2018

Image Captioning and Visual Question Answering Based on Attributes and External Knowledge.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

FVQA: Fact-Based Visual Question Answering.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

An Active Information Seeking Model for Goal-oriented Vision-and-Language Tasks.
CoRR, 2018

Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks.
CoRR, 2018

Skin Lesion Classification in Dermoscopy Images Using Synergic Deep Learning.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2018, 2018

Goal-Oriented Visual Question Generation via Intermediate Rewards.
Proceedings of the Computer Vision - ECCV 2018, 2018

Parallel Attention: A Unified Framework for Visual Object Discovery Through Dialogs and Queries.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Are You Talking to Me? Reasoned Visual Dialog Generation Through Adversarial Learning.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Learning Semantic Concepts and Order for Image and Sentence Matching.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Visual Question Answering With Memory-Augmented Networks.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Connecting Language and Vision to Actions.
Proceedings of ACL 2018, Melbourne, Australia, July 15-20, 2018, Tutorial Abstracts, 2018

HCVRD: A Benchmark for Large-Scale Human-Centered Visual Relationship Detection.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Kill Two Birds With One Stone: Weakly-Supervised Neural Network for Image Annotation and Tag Refinement.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Visual Question Answering: A Tutorial.
IEEE Signal Process. Mag., 2017

Visual question answering: A survey of methods and datasets.
Comput. Vis. Image Underst., 2017

Learning Semantic Concepts and Order for Image and Sentence Matching.
CoRR, 2017

Asking the Difficult Questions: Goal-Oriented Visual Question Generation via Intermediate Rewards.
CoRR, 2017

Care about you: towards large-scale human-centric visual relationship detection.
CoRR, 2017

Classification of Medical Images and Illustrations in the Biomedical Literature Using Synergic Deep Learning.
CoRR, 2017

Explicit Knowledge-based Reasoning for Visual Question Answering.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Historical Image Annotation by Exploring the Tag Relevance.
Proceedings of the 4th IAPR Asian Conference on Pattern Recognition, 2017

2016
Multi-Label Image Classification with Regional Latent Semantic Dependencies.
CoRR, 2016

Image Captioning and Visual Question Answering Based on Attributes and Their Related External Knowledge.
CoRR, 2016

Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge from External Sources.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

What Value Do Explicit High Level Concepts Have in Vision to Language Problems?
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
Modelling visual objects regardless of depictive style.
PhD thesis, 2015

Cross-depiction problem: Recognition and synthesis of photographs and artwork.
Comput. Vis. Media, 2015

Image Captioning with an Intermediate Attributes Layer.
CoRR, 2015

The Cross-Depiction Problem: Computer Vision Algorithms for Recognising Objects in Artwork and in Photographs.
CoRR, 2015

Beyond Photo-Domain Object Recognition: Benchmarks for the Cross-Depiction Problem.
Proceedings of the 2015 IEEE International Conference on Computer Vision Workshop, 2015

2014
Learning Graphs to Model Visual Objects across Different Depictive Styles.
Proceedings of the Computer Vision - ECCV 2014, 2014

2013
Modelling Visual Objects Invariant to Depictive Style.
Proceedings of the British Machine Vision Conference, 2013

2012
Prime Shapes in Natural Images.
Proceedings of the British Machine Vision Conference, 2012

