Zehuan Yuan

Orcid: 0000-0002-0349-9367

According to our database, Zehuan Yuan authored at least 74 papers between 2018 and 2024.

Bibliography

2024
HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling.
CoRR, 2024

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation.
CoRR, 2024

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation.
CoRR, 2024

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction.
CoRR, 2024

View Crafting For Instance-Level Representation from Scene Images.
Proceedings of the IEEE International Conference on Acoustics, 2024

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

General Object Foundation Model for Images and Videos at Scale.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Generative Region-Language Pretraining for Open-Ended Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Sparse R-CNN: An End-to-End Framework for Object Detection.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

DMRNet++: Learning Discriminative Features With Decoupled Networks and Enriched Pairs for One-Step Person Search.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

MCIBI++: Soft Mining Contextual Information Beyond Image for Semantic Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2023

Trimap-guided feature mining and fusion network for natural image matting.
Comput. Vis. Image Underst., April, 2023

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces.
CoRR, 2023

Recognize Any Regions.
CoRR, 2023

ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst.
CoRR, 2023

Meta Compositional Referring Expression Segmentation.
CoRR, 2023

EGC: Image Generation and Classification via a Diffusion Energy-Based Model.
CoRR, 2023

Multi-Level Contrastive Learning for Dense Prediction Task.
CoRR, 2023

MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

CoDet: Co-occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Transformer-based Open-world Instance Segmentation with Cross-task Consistency Regularization.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Learning Object-Language Alignments for Open-Vocabulary Object Detection.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

The First Visual Object Tracking Segmentation VOTS2023 Challenge Results.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Exploring Transformers for Open-world Instance Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Segment Every Reference Object in Spatial and Temporal Spaces.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EGC: Image Generation and Classification via a Diffusion Energy-Based Model.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Meta Compositional Referring Expression Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Token Boosting for Robust Self-Supervised Visual Transformer Pre-training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-Commerce.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Universal Instance Perception as Object Discovery and Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Birds of a Feather Flock Together: Category-Divergence Guidance for Domain Adaptive Segmentation.
IEEE Trans. Image Process., 2022

Conditional Hyper-Network for Blind Super-Resolution With Multiple Degradations.
IEEE Trans. Image Process., 2022

MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning.
CoRR, 2022

Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders.
CoRR, 2022

ManiCLIP: Multi-Attribute Face Manipulation from Text.
CoRR, 2022

Single-Stage Open-world Instance Segmentation with Cross-task Consistency Regularization.
CoRR, 2022

MetaFormer: A Unified Meta Framework for Fine-Grained Recognition.
CoRR, 2022

QueryPose: Sparse Multi-Person Pose Regression via Spatial-Aware Part-Level Query.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Rethinking Resolution in the Context of Efficient Video Recognition.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Objects in Semantic Topology.
Proceedings of the Tenth International Conference on Learning Representations, 2022

ByteTrack: Multi-object Tracking by Associating Every Detection Box.
Proceedings of the Computer Vision - ECCV 2022, 2022

Masked Generative Distillation.
Proceedings of the Computer Vision - ECCV 2022, 2022

Towards Grand Unification of Object Tracking.
Proceedings of the Computer Vision - ECCV 2022, 2022

Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation.
Proceedings of the Computer Vision - ECCV 2022, 2022

You Should Look at All Objects.
Proceedings of the Computer Vision - ECCV 2022, 2022

Focal and Global Knowledge Distillation for Detectors.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Language as Queries for Referring Video Object Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Content-Variant Reference Image Quality Assessment via Knowledge Distillation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
ByteTrack: Multi-Object Tracking by Associating Every Detection Box.
CoRR, 2021

Memory Based Video Scene Parsing.
CoRR, 2021

Center Prediction Loss for Re-identification.
CoRR, 2021

Conditional Meta-Network for Blind Super-Resolution with Multiple Degradations.
CoRR, 2021

Disentangled Contrastive Learning on Graphs.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Multimodal Video Summarization via Time-Aware Transformers.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

What Makes for End-to-End Object Detection?
Proceedings of the 38th International Conference on Machine Learning, 2021

Exploring Balanced Feature Spaces for Representation Learning.
Proceedings of the 9th International Conference on Learning Representations, 2021

Unsupervised Real-World Super-Resolution: A Domain Adaptation Perspective.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Domain-Invariant Disentangled Network for Generalizable Object Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Weakly Supervised Person Search with Region Siamese Networks.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Sparse R-CNN: End-to-End Object Detection With Learnable Proposals.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Slimmable Generative Adversarial Networks.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
TransTrack: Multiple-Object Tracking with Transformer.
CoRR, 2020

OneNet: Towards End-to-End One-Stage Object Detection.
CoRR, 2020

Moflowgan: Video Generation With Flow Guidance.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2020

Controllable Orthogonalization in Training DNNs.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Non-Local Neural Networks With Grouped Bilinear Attentional Transforms.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Towards Good Practices for Instance Segmentation.
CoRR, 2019

Deformable Tube Network for Action Detection in Videos.
CoRR, 2019

2018
Towards Good Practices for Multi-modal Fusion in Large-Scale Video Classification.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

Knowing Where to Look? Analysis on Attention of Visual Question Answering System.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

