Wenqi Shao

Orcid: 0000-0003-3781-4086

According to our database, Wenqi Shao authored at least 72 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents.
CoRR, 2024

TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts.
CoRR, 2024

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping.
CoRR, 2024

ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression.
CoRR, 2024

DCP: Learning Accelerator Dataflow for Neural Network via Propagation.
CoRR, 2024

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation.
CoRR, 2024

PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs.
CoRR, 2024

HRVMamba: High-Resolution Visual State Space Model for Dense Prediction.
CoRR, 2024

Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing.
CoRR, 2024

HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model.
CoRR, 2024

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.
CoRR, 2024

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model.
CoRR, 2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models.
CoRR, 2024

PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models.
CoRR, 2024

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality.
CoRR, 2024

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices.
CoRR, 2024

Needle In A Multimodal Haystack.
CoRR, 2024

UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge.
CoRR, 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.
CoRR, 2024

Adapting LLaMA Decoder to Vision Transformer.
CoRR, 2024

ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models.
CoRR, 2024

AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions.
CoRR, 2024

Towards Implicit Prompt For Text-To-Image Models.
CoRR, 2024

RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation.
CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
CoRR, 2024

Two Trades is not Baffled: Condensing Graph via Crafting Rational Gradient Matching.
CoRR, 2024

ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.
CoRR, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Position: Towards Implicit Prompt For Text-To-Image Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Tree-Planner: Efficient Close-loop Task Planning with Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Align, Adapt and Inject: Audio-Guided Image Generation, Editing and Stylization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Cached Transformers: Improving Transformers with Differentiable Memory Cache.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Cached Transformers: Improving Transformers with Differentiable Memory Cache.
CoRR, 2023

MLLMs-Augmented Visual-Language Representation Learning.
CoRR, 2023

DiffusionMat: Alpha Matting as Sequential Refinement Learning.
CoRR, 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.
CoRR, 2023

Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face.
CoRR, 2023

SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving.
CoRR, 2023

ImageBind-LLM: Multi-modality Instruction Tuning.
CoRR, 2023

Tiny LVLM-eHub: Early Multimodal Experiments with Bard.
CoRR, 2023

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest.
CoRR, 2023

Align, Adapt and Inject: Sound-guided Unified Image Generation.
CoRR, 2023

LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models.
CoRR, 2023

Foundation Model is Efficient Multimodal Multitask Model Selector.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Beyond One-to-One: Rethinking the Referring Image Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Real-Time Controllable Denoising for Image and Video.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
CO^3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving.
CoRR, 2022

Dynamic Token Normalization improves Vision Transformers.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
Dynamic Token Normalization Improves Vision Transformer.
CoRR, 2021

BWCP: Probabilistic Learning-to-Prune Channels for ConvNets via Batch Whitening.
CoRR, 2021

Rethinking the Pruning Criteria for Convolutional Neural Network.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution.
Proceedings of the 38th International Conference on Machine Learning, 2021

What Makes for End-to-End Object Detection?
Proceedings of the 38th International Conference on Machine Learning, 2021

2020
SSN: Learning Sparse Switchable Normalization via SparsestMax.
Int. J. Comput. Vis., 2020

Channel Equilibrium Networks for Learning Deep Representation.
Proceedings of the 37th International Conference on Machine Learning, 2020

2019
Learning Efficient Detector with Semi-supervised Adaptive Distillation.
CoRR, 2019

Differentiable Dynamic Normalization for Learning Deep Representation.
Proceedings of the 36th International Conference on Machine Learning, 2019

Towards Understanding Regularization in Batch Normalization.
Proceedings of the 7th International Conference on Learning Representations, 2019

Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

SSN: Learning Sparse Switchable Normalization via SparsestMax.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Learning Efficient Detector with Semi-supervised Adaptive Distillation.
Proceedings of the 30th British Machine Vision Conference 2019, 2019

