Yixiao Ge

Orcid: 0000-0002-5351-5329

According to our database, Yixiao Ge authored at least 99 papers between 2018 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Structured Domain Adaptation With Online Relation Regularization for Unsupervised Person Re-ID.
IEEE Trans. Neural Networks Learn. Syst., January, 2024

Vision-Language Instruction Tuning: A Review and Analysis.
Trans. Mach. Learn. Res., 2024

A Geometric Perspective on Fusing Gaussian Distributions on Lie Groups.
IEEE Control. Syst. Lett., 2024

Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation.
CoRR, 2024

Geometric Data Fusion for Collaborative Attitude Estimation.
CoRR, 2024

SEED-Story: Multimodal Long Story Generation with Large Language Model.
CoRR, 2024

VoCo-LLaMA: Towards Vision Compression with Large Language Models.
CoRR, 2024

GrootVL: Tree Topology is All You Need in State Space Model.
CoRR, 2024

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots.
CoRR, 2024

SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing.
CoRR, 2024

SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension.
CoRR, 2024

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation.
CoRR, 2024

Supervised Fine-tuning in turn Improves Visual Foundation Models.
CoRR, 2024

Towards A Better Metric for Text-to-Video Generation.
CoRR, 2024

An Equivariant Approach to Robust State Estimation for the ArduPilot Autopilot System.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Making LLaMA SEE and Draw with SEED Tokenizer.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ST-LLM: Large Language Models Are Effective Temporal Learners.
Proceedings of the Computer Vision - ECCV 2024, 2024

DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment.
Proceedings of the Computer Vision - ECCV 2024, 2024

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SEED-Bench: Benchmarking Multimodal Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ViT-Lens: Towards Omni-modal Representations.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SmartEdit: Exploring Complex Instruction-Based Image Editing with Multimodal Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

YOLO-World: Real-Time Open-Vocabulary Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

LLaMA Pro: Progressive LLaMA with Block Expansion.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Cached Transformers: Improving Transformers with Differentiable Memory Cache.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Cached Transformers: Improving Transformers with Differentiable Memory Cache.
CoRR, 2023

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation.
CoRR, 2023

EgoPlan-Bench: Benchmarking Egocentric Embodied Planning with Multimodal Large Language Models.
CoRR, 2023

SEED-Bench-2: Benchmarking Multimodal Large Language Models.
CoRR, 2023

ViT-Lens-2: Gateway to Omni-modal Intelligence.
CoRR, 2023

One For All: Video Conversation is Feasible Without Video Instruction Tuning.
CoRR, 2023

Equivariant Symmetries for Inertial Navigation Systems.
CoRR, 2023

ViT-Lens: Towards Omni-modal Representations.
CoRR, 2023

SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension.
CoRR, 2023

Planting a SEED of Vision in Large Language Model.
CoRR, 2023

DreamDiffusion: Generating High-Quality Images from Brain EEG Signals.
CoRR, 2023

PTVD: A Large-Scale Plot-Oriented Multimodal Dataset Based on Television Dramas.
CoRR, 2023

TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter.
CoRR, 2023

Sticker820K: Empowering Interactive Retrieval with Stickers.
CoRR, 2023

TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale.
CoRR, 2023

What Makes for Good Visual Tokenizers for Large Language Models?
CoRR, 2023

Attack is Good Augmentation: Towards Skeleton-Contrastive Representation Learning.
CoRR, 2023

TagGPT: Large Language Models are Zero-shot Multimodal Taggers.
CoRR, 2023

Masked Visual Reconstruction in Language Semantic Space.
CoRR, 2023

Modeling Uncertain Feature Representation for Domain Generalization.
CoRR, 2023

GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Meta-Adapter: An Online Few-shot Learner for Vision-Language Model.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Binary Embedding-based Retrieval at Tencent.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation.
Proceedings of the International Conference on Machine Learning, 2023

Masked Image Modeling with Denoising Contrast.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

BoxSnake: Polygonal Instance Segmentation with Box Supervision.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Exploring Model Transferability through the Lens of Potential Energy.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

RILS: Masked Visual Reconstruction in Language Semantic Space.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Accelerating Vision-Language Pretraining with Free Language Modeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

All in One: Exploring Unified Video-Language Pre-Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

A Note on the Extended Kalman Filter on a Manifold.
Proceedings of the 62nd IEEE Conference on Decision and Control, 2023

Darwinian Model Upgrades: Model Evolving with Selective Compatibility.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Video-Text Pre-training with Learned Regions for Retrieval.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation.
CoRR, 2022

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis.
CoRR, 2022

Privacy-Preserving Model Upgrades with Bidirectional Compatible Training in Image Retrieval.
CoRR, 2022

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval.
CoRR, 2022

Revitalize Region Feature for Democratizing Video-Language Pre-training.
CoRR, 2022

All in One: Exploring Unified Video-Language Pre-training.
CoRR, 2022

Hot-Refresh Model Upgrades with Regression-Alleviating Compatible Training in Image Retrieval.
CoRR, 2022

BridgeFormer: Bridging Video-text Retrieval with Multiple Choice Questions.
CoRR, 2022

Towards Universal Backward-Compatible Representation Learning.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Hot-Refresh Model Upgrades with Regression-Free Compatible Training in Image Retrieval.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Dynamic Token Normalization improves Vision Transformers.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Uncertainty Modeling for Out-of-Distribution Generalization.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space.
Proceedings of the Computer Vision - ECCV 2022, 2022

mc-BEiT: Multi-choice Discretization for Image BERT Pre-training.
Proceedings of the Computer Vision - ECCV 2022, 2022

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval.
Proceedings of the Computer Vision - ECCV 2022, 2022

Object-aware Video-language Pre-training for Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Bridging Video-text Retrieval with Multiple Choice Questions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Equivariant Filter Design for Discrete-time Systems.
Proceedings of the 61st IEEE Conference on Decision and Control, 2022

2021
Dynamic Token Normalization Improves Vision Transformer.
CoRR, 2021

Video-Text Pre-training with Learned Regions.
CoRR, 2021

Self-distillation with Batch Knowledge Ensembling Improves ImageNet Classification.
CoRR, 2021

Consensus-Guided Correspondence Denoising.
CoRR, 2021

Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-identification.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Progressive Correspondence Pruning by Consensus Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Refining Pseudo Labels With Clustering Consensus Over Generations for Unsupervised Object Re-Identification.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Mutual CRF-GNN for Few-Shot Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Improved Mutual Mean-Teaching for Unsupervised Domain Adaptive Re-ID.
CoRR, 2020

Structured Domain Adaptation for Unsupervised Person Re-identification.
CoRR, 2020

Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification.
Proceedings of the 8th International Conference on Learning Representations, 2020

Self-supervising Fine-Grained Region Similarities for Large-Scale Image Localization.
Proceedings of the Computer Vision - ECCV 2020, 2020

2018
FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

