Haoxuan You

Orcid: 0000-0002-7912-4035

According to our database¹, Haoxuan You authored at least 34 papers between 2017 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2024

DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models.

CoRR, 2024

MM-Ego: Towards Building Egocentric Multimodal LLMs.

CoRR, 2024

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning.

CoRR, 2024

JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images.

CoRR, 2024

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models.

CoRR, 2024

LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices.

CoRR, 2024

Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Ferret: Refer and Ground Anything Anywhere at Any Granularity.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

CoBIT: A Contrastive Bi-directional Image-Text Generation Model.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks.

CoRR, 2022

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks.

CoRR, 2022

SHREC'22 track: Open-Set 3D Object Retrieval.

Comput. Graph., 2022

Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training.

Proceedings of the Computer Vision - ECCV 2022, 2022

SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Graph-MLP: Node Classification without Message Passing in Graph.

CoRR, 2021

Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

2020

PointHop: An Explainable Machine Learning Method for Point Cloud Classification.

IEEE Trans. Multim., 2020

Weakly-supervised VisualBERT: Pre-training without Parallel Images and Captions.

CoRR, 2020

Learning Visual Commonsense for Robust Scene Graph Generation.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

PointDAN: A Multi-Scale 3D Domain Adaption Network for Point Cloud Representation.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Decoding EEG by Visual-guided Deep Neural Networks.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Multi-Modality Latent Interaction Network for Visual Question Answering.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

PVRNet: Point-View Relation Neural Network for 3D Shape Recognition.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Hypergraph Neural Networks.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

MeshNet: Mesh Neural Network for 3D Shape Representation.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

PVNet: A Joint Convolutional Network of Point Cloud and Multi-View for 3D Shape Recognition.

Proceedings of the 2018 ACM Multimedia Conference, 2018

2017

Restricting Greed in Training of Generative Adversarial Network.

CoRR, 2017
