Yuying Ge

Orcid: 0000-0001-5818-2589

According to our database¹, Yuying Ge authored at least 41 papers between 2019 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2024

EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios.

CoRR, 2024

DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models.

CoRR, 2024

Moto: Latent Motion Token as the Bridging Language for Robot Manipulation.

CoRR, 2024

Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation.

CoRR, 2024

SEED-Story: Multimodal Long Story Generation with Large Language Model.

CoRR, 2024

SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing.

CoRR, 2024

SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension.

CoRR, 2024

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation.

CoRR, 2024

Supervised Fine-tuning in turn Improves Visual Foundation Models.

CoRR, 2024

Making LLaMA SEE and Draw with SEED Tokenizer.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Align, Adapt and Inject: Audio-Guided Image Generation, Editing and Stylization.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

SEED-Bench: Benchmarking Multimodal Large Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VIT-LENS: Towards Omni-modal Representations.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation.

CoRR, 2023

EgoPlan-Bench: Benchmarking Egocentric Embodied Planning with Multimodal Large Language Models.

CoRR, 2023

SEED-Bench-2: Benchmarking Multimodal Large Language Models.

CoRR, 2023

ViT-Lens-2: Gateway to Omni-modal Intelligence.

CoRR, 2023

SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension.

CoRR, 2023

Planting a SEED of Vision in Large Language Model.

CoRR, 2023

JourneyDB: A Benchmark for Generative Image Understanding.

CoRR, 2023

Align, Adapt and Inject: Sound-guided Unified Image Generation.

CoRR, 2023

JourneyDB: A Benchmark for Generative Image Understanding.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

All in One: Exploring Unified Video-Language Pre-Training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Policy Adaptation from Foundation Model Feedback.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields.

Proceedings of the Conference on Robot Learning, 2023

2022

MetaCloth: Learning Unseen Tasks of Dense Fashion Landmark Detection From a Few Samples.

Yuying Ge, Ruimao Zhang, Ping Luo

IEEE Trans. Image Process., 2022

Self-Play and Self-Describe: Policy Adaptation with Vision-Language Foundation Models.

CoRR, 2022

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval.

CoRR, 2022

All in One: Exploring Unified Video-Language Pre-training.

CoRR, 2022

MetaDance: Few-shot Dancing Video Retargeting via Temporal-aware Meta-learning.

CoRR, 2022

BridgeFormer: Bridging Video-text Retrieval with Multiple Choice Questions.

CoRR, 2022

Unsupervised Medical Image Registration Based on Multi-scale Cascade Network.

Proceedings of the Pattern Recognition and Computer Vision - 5th Chinese Conference, 2022

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval.

Proceedings of the Computer Vision - ECCV 2022, 2022

Bridging Video-text Retrieval with Multiple Choice Questions.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Parser-Free Virtual Try-On via Distilling Appearance Flows.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Disentangled Cycle Consistency for Highly-Realistic Virtual Try-On.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2019

SCAN: Self-and-Collaborative Attention Network for Video Person Re-Identification.

IEEE Trans. Image Process., 2019

DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images.

CoRR, 2019

DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
