Haoxuan You

Orcid: 0000-0002-7912-4035

According to our database, Haoxuan You authored at least 34 papers between 2017 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Timeline

[Chart: number of publications per year, 2017 to 2024, broken down by publication type (book, in proceedings, article, PhD thesis, dataset, other).]

Bibliography

2024
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models.
CoRR, 2024

MM-Ego: Towards Building Egocentric Multimodal LLMs.
CoRR, 2024

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning.
CoRR, 2024

JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images.
CoRR, 2024

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models.
CoRR, 2024

LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices.
CoRR, 2024

Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Ferret: Refer and Ground Anything Anywhere at Any Granularity.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

CoBIT: A Contrastive Bi-directional Image-Text Generation Model.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks.
CoRR, 2022

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks.
CoRR, 2022

SHREC'22 track: Open-Set 3D Object Retrieval.
Comput. Graph., 2022

Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training.
Proceedings of the Computer Vision - ECCV 2022, 2022

SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Graph-MLP: Node Classification without Message Passing in Graph.
CoRR, 2021

Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

2020
PointHop: An Explainable Machine Learning Method for Point Cloud Classification.
IEEE Trans. Multim., 2020

Weakly-supervised VisualBERT: Pre-training without Parallel Images and Captions.
CoRR, 2020

Learning Visual Commonsense for Robust Scene Graph Generation.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
PointDAN: A Multi-Scale 3D Domain Adaption Network for Point Cloud Representation.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Decoding EEG by Visual-guided Deep Neural Networks.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Multi-Modality Latent Interaction Network for Visual Question Answering.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

PVRNet: Point-View Relation Neural Network for 3D Shape Recognition.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Hypergraph Neural Networks.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

MeshNet: Mesh Neural Network for 3D Shape Representation.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
PVNet: A Joint Convolutional Network of Point Cloud and Multi-View for 3D Shape Recognition.
Proceedings of the 2018 ACM Multimedia Conference, 2018

2017
Restricting Greed in Training of Generative Adversarial Network.
CoRR, 2017

