Xin Wang

Orcid: 0000-0003-2605-5504

Affiliations:
  • University of California, Santa Cruz, CA, USA
  • University of California, Santa Barbara, CA, USA (Ph.D.)


According to our database, Xin Wang authored at least 94 papers between 2013 and 2024.

Bibliography

2024
Agent S: An Open Agentic Framework that Uses Computers Like a Human.
CoRR, 2024

Multimodal Situational Safety.
CoRR, 2024

Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding.
CoRR, 2024

VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing.
CoRR, 2024

Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation.
CoRR, 2024

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos.
CoRR, 2024

Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA.
CoRR, 2024

FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation.
CoRR, 2024

SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing.
CoRR, 2024

Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA.
CoRR, 2024

Multimodal Procedural Planning via Dual Text-Image Prompting.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Evaluating Multi-Agent Coordination Abilities in Large Language Models.
CoRR, 2023

MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens.
CoRR, 2023

T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation.
CoRR, 2023

R2H: Building Multimodal Navigation Helpers that Respond to Help.
CoRR, 2023

Discriminative Diffusion Models as Few-shot Vision and Language Learners.
CoRR, 2023

CUDA-GHR: Controllable Unsupervised Domain Adaptation for Gaze and Head Redirection.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PHOTOSWAP: Personalized Subject Swapping in Images.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation.
Proceedings of the International Conference on Machine Learning, 2023

Neuro-Symbolic Procedural Planning with Commonsense Prompting.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Parameter-Efficient Cross-lingual Transfer of Vision and Language Models via Translation-based Alignment.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Visualize Before You Write: Imagination-Guided Open-Ended Text Generation.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

Multimodal Graph Transformer for Multimodal Question Answering.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Parameter-Efficient Model Adaptation for Vision Transformers.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
ComCLIP: Training-Free Compositional Image and Text Matching.
CoRR, 2022

Anticipating the Unseen Discrepancy for Vision and Language Navigation.
CoRR, 2022

JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents.
CoRR, 2022

Neuro-Symbolic Causal Language Planning with Commonsense Prompting.
CoRR, 2022

Aerial Vision-and-Dialog Navigation.
CoRR, 2022

Parameter-efficient Fine-tuning for Vision Transformers.
CoRR, 2022

VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Diagnosing Vision-and-Language Navigation: What Really Matters.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Imagination-Augmented Natural Language Understanding.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Understanding Instance-Level Impact of Fairness Constraints.
Proceedings of the International Conference on Machine Learning, 2022

CPL: Counterfactual Prompt Learning for Vision and Language Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

FedVLN: Privacy-Preserving Federated Vision-and-Language Navigation.
Proceedings of the Computer Vision - ECCV 2022, 2022

Language-Driven Artistic Style Transfer.
Proceedings of the Computer Vision - ECCV 2022, 2022

M³L: Language-based Video Editing via Multi-Modal Multi-Level Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Assessing Multilingual Fairness in Pre-trained Multimodal Representations.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Interpretable Research Replication Prediction via Variational Contextual Consistency Sentence Masking.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Vision-Language Navigation Policy Learning and Adaptation.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

CUDA-GR: Controllable Unsupervised Domain Adaptation for Gaze Redirection.
CoRR, 2021

Language-Driven Image Style Transfer.
CoRR, 2021

Language-based Video Editing via Multi-Modal Multi-Level Transformer.
CoRR, 2021

Visual Question Rewriting for Increasing Response Rate.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

L2C: Describing Visual Differences Needs Semantic Understanding of Individuals.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

2020
Closing the Loop Between Language and Vision for Embodied Agents.
PhD thesis, 2020

Relational Graph Learning for Grounded Video Description Generation.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Environment-Agnostic Multitask Learning for Natural Language Grounded Navigation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler.
Proceedings of the Computer Vision - ECCV 2020, 2020

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling.
CoRR, 2019

Cross-Lingual Vision-Language Navigation.
CoRR, 2019

Not All Actions Are Equal: Learning to Stop in Language-Grounded Urban Navigation.
Proceedings of the Visually Grounded Interaction and Language (ViGIL), 2019

Natural Language Grounded Multitask Navigation.
Proceedings of the Visually Grounded Interaction and Language (ViGIL), 2019

Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

TIGEr: Text-to-Image Grounding for Image Caption Evaluation.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Self-Supervised Dialogue Learning.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Self-Supervised Learning for Contextualized Extractive Summarization.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Towards Generating Long and Coherent Text with Multi-Level Latent Variable Models.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Virtual dictionary based kernel sparse representation for face recognition.
Pattern Recognit., 2018

Enhancing the Robustness of Prior Network in Out-of-Distribution Detection.
CoRR, 2018

Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

XL-NBT: A Cross-lingual Neural Belief Tracking Framework.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation.
Proceedings of the Computer Vision - ECCV 2018, 2018

Video Captioning via Hierarchical Reinforcement Learning.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Networks.
Proceedings of the British Machine Vision Conference 2018, 2018

No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Deep Reinforcement Learning for Visual Object Tracking in Videos.
CoRR, 2017

Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2013
A Novel Information Search and Recommendation Services Platform Based on an Indexing Network (Short Paper).
Proceedings of the 2013 IEEE 6th International Conference on Service-Oriented Computing and Applications, 2013

