Xiyang Dai

Orcid: 0000-0003-1761-8715

According to our database, Xiyang Dai authored at least 59 papers between 2016 and 2024.

Bibliography

2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs.
CoRR, 2024

Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search.
CoRR, 2024

Data-Augmentation Based CBAM-ResNet-GCN Method for Unbalance Fault Diagnosis of Rotating Machinery.
IEEE Access, 2024

Efficient Modulation for Vision Networks.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Rewrite the Stars.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
On the Hidden Waves of Image.
CoRR, 2023

Image is First-order Norm+Linear Autoregressive.
CoRR, 2023

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System.
CoRR, 2023

OmniTracker: Unifying Object Tracking by Tracking-with-Detection.
CoRR, 2023

Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Generalized Decoding for Pixel, Image, and Language.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Look Before You Match: Instance Understanding Matters in Video Object Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Self-Supervised Learning based on Heat Equation.
CoRR, 2022

Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling.
CoRR, 2022

Should All Proposals be Treated Equally in Object Detection?
CoRR, 2022

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks.
CoRR, 2022

Residual Mixture of Experts.
CoRR, 2022

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks.
CoRR, 2022

GLIPv2: Unifying Localization and Vision-Language Understanding.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Focal Modulation Networks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Efficient Self-supervised Vision Transformers for Representation Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Should All Proposals Be Treated Equally in Object Detection?
Proceedings of the Computer Vision - ECCV 2022, 2022

RegionCLIP: Region-based Language-Image Pretraining.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

BEVT: BERT Pretraining of Video Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Reduce Information Loss in Transformers for Pluralistic Image Inpainting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Mobile-Former: Bridging MobileNet and Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Florence: A New Foundation Model for Computer Vision.
CoRR, 2021

UFO: A UniFied TransfOrmer for Vision-Language Representation Learning.
CoRR, 2021

Focal Self-attention for Local-Global Interactions in Vision Transformers.
CoRR, 2021

Weak NAS Predictors Are All You Need.
CoRR, 2021

Focal Attention for Long-Range Interactions in Vision Transformers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Stronger NAS with Weaker Predictors.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Revisiting Dynamic Convolution via Matrix Decomposition.
Proceedings of the 9th International Conference on Learning Representations, 2021

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

CvT: Introducing Convolutions to Vision Transformers.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

MicroNet: Improving Image Recognition with Extremely Low FLOPs.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Dynamic DETR: End-to-End Object Detection with Dynamic Attention.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Dynamic Head: Unifying Object Detection Heads With Attentions.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
MicroNet: Towards Image Recognition with Extremely Low FLOPs.
CoRR, 2020

DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search.
Proceedings of the Computer Vision - ECCV 2020, 2020

Dynamic ReLU.
Proceedings of the Computer Vision - ECCV 2020, 2020

METAL: Minimum Effort Temporal Activity Localization in Untrimmed Videos.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Dynamic Convolution: Attention Over Convolution Kernels.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
TAN: Temporal Aggregation Network for Dense Multi-Label Action Recognition.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2019

MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Modeling Deep Context in Spatial and Temporal Domain.
PhD thesis, 2018

Deep Motion Boundary Detection.
CoRR, 2018

S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Networks.
Proceedings of the British Machine Vision Conference 2018, 2018

Dynamic Temporal Pyramid Network: A Closer Look at Multi-scale Modeling for Activity Detection.
Proceedings of the Computer Vision - ACCV 2018, 2018

2017
Efficient Fine-Grained Classification and Part Localization Using One Compact Network.
Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

Temporal Context Network for Activity Localization in Videos.
Proceedings of the IEEE International Conference on Computer Vision, 2017

FASON: First and Second Order Information Fusion Network for Texture Recognition.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Parameterizing Region Covariance: An Efficient Way To Apply Sparse Codes On Second Order Statistics.
CoRR, 2016

