2025
LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition?
CoRR, March, 2025
FlowAgent: Achieving Compliance and Flexibility for Workflow Agents.
CoRR, February, 2025
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy.
CoRR, February, 2025
LUCY: Linguistic Understanding and Control Yielding Early Stage of Her.
CoRR, January, 2025
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction.
CoRR, January, 2025
Probability-Density-aware Semi-supervised Learning.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Feature Denoising Diffusion Model for Blind Image Quality Assessment.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
Training-Free Transformer Architecture Search With Zero-Cost Proxy Guided Evolution.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2024
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression.
CoRR, 2024
Scale Contrastive Learning with Selective Attentions for Blind Image Quality Assessment.
CoRR, 2024
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM.
CoRR, 2024
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models.
CoRR, 2024
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models.
CoRR, 2024
Local Manifold Learning for No-Reference Image Quality Assessment.
CoRR, 2024
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.
CoRR, 2024
Multi-Modal Prompt Learning on Blind Image Quality Assessment.
CoRR, 2024
RESTORE: Towards Feature Shift for Vision-Language Prompt Learning.
CoRR, 2024
Feature Denoising Diffusion Model for Blind Image Quality Assessment.
CoRR, 2024
Woodpecker: hallucination correction for multimodal large language models.
Sci. China Inf. Sci., 2024
A³R: Vision Language Pre-training by Attentive Alignment and Attentive Reconstruction.
Proceedings of the Pattern Recognition and Computer Vision - 7th Chinese Conference, 2024
Multimodal Inplace Prompt Tuning for Open-set Object Detection.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Integrating Global Context Contrast and Local Sensitivity for Blind Image Quality Assessment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Aligning and Prompting Everything All at Once for Universal Visual Perception.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
A General and Efficient Training for Transformer via Token Expansion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Solving the Catastrophic Forgetting Problem in Generalized Category Discovery.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Sinkhorn Distance Minimization for Knowledge Distillation.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
Semi-Supervised Blind Image Quality Assessment through Knowledge Distillation and Incremental Learning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
Weakly Supervised Open-Vocabulary Object Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
SPD-DDPM: Denoising Diffusion Probabilistic Models in the Symmetric Positive Definite Space.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
SoftCLIP: Softer Cross-Modal Alignment Makes CLIP Stronger.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
MISSU: 3D Medical Image Segmentation via Self-Distilling TransUNet.
IEEE Trans. Medical Imaging, September, 2023
Reciprocal normalization for domain adaptation.
Pattern Recognit., August, 2023
A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise.
CoRR, 2023
MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples.
CoRR, 2023
Adaptive Feature Selection for No-Reference Image Quality Assessment using Contrastive Mitigating Semantic Noise Sensitivity.
CoRR, 2023
Less is More: Learning Reference Knowledge Using No-Reference Image Quality Assessment.
CoRR, 2023
Towards Robust Text Retrieval with Progressive Learning.
CoRR, 2023
A Survey on Multimodal Large Language Models.
CoRR, 2023
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models.
CoRR, 2023
SoftCLIP: Softer Cross-modal Alignment Makes CLIP Stronger.
CoRR, 2023
Data-Free Low-Bit Quantization via Dynamic Multi-teacher Knowledge Distillation.
Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023
Classifier Decoupled Training for Black-Box Unsupervised Domain Adaptation.
Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023
CAPro: Webly Supervised Learning with Cross-modality Aligned Prototypes.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Multi-modal Queried Object Detection in the Wild.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
LocLoc: Low-level Cues and Local-area Guides for Weakly Supervised Object Localization.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Adaptive Hierarchy-Branch Fusion for Online Knowledge Distillation.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
CF-ViT: A General Coarse-to-Fine Method for Vision Transformer.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
LAB-Net: LAB Color-Space Oriented Lightweight Network for Shadow Removal.
CoRR, 2022
Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization.
CoRR, 2022
Super Vision Transformer.
CoRR, 2022
Shadow-Aware Dynamic Convolution for Shadow Removal.
CoRR, 2022
PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining.
CoRR, 2022
Coarse-to-Fine Vision Transformer.
CoRR, 2022
Optimizing Gradient-driven Criteria in Network Sparsity: Gradient is All You Need.
CoRR, 2022
PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Learning Best Combination for Efficient N:M Sparsity.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Self-supervised Models are Good Teaching Assistants for Vision Transformers.
Proceedings of the International Conference on Machine Learning, 2022
Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks.
Proceedings of the Computer Vision - ECCV 2022, 2022
Fine-grained Data Distribution Alignment for Post-Training Quantization.
Proceedings of the Computer Vision - ECCV 2022, 2022
DisCo: Remedying Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022
Efficient Decoder-Free Object Detection with Transformers.
Proceedings of the Computer Vision - ECCV 2022, 2022
ARM: Any-Time Super-Resolution Method.
Proceedings of the Computer Vision - ECCV 2022, 2022
Training-free Transformer Architecture Search.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
RMNet: Equivalently Removing Residual Connection from Networks.
CoRR, 2021
Fine-grained Data Distribution Alignment for Post-Training Quantization.
CoRR, 2021
ISTR: End-to-End Instance Segmentation with Transformers.
CoRR, 2021
DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning.
CoRR, 2021
On Evolving Attention Towards Domain Adaptation.
CoRR, 2021
On The Consistency Training for Open-Set Semi-Supervised Learning.
CoRR, 2021
Architecture Disentanglement for Deep Neural Networks.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Removing the Background by Adding the Background: Towards Background Robust Self-Supervised Video Representation Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
One for More: Selecting Generalizable Samples for Generalizable ReID Model.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
Semi-Supervised Adversarial Monocular Depth Estimation.
IEEE Trans. Pattern Anal. Mach. Intell., 2020
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning.
CoRR, 2020
Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion.
CoRR, 2020
DGD: Densifying the Knowledge of Neural Networks with Filter Grafting and Knowledge Distillation.
CoRR, 2020
Architecture Disentanglement for Deep Neural Networks.
CoRR, 2020
Pruning Filter in Filter.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Filter Grafting for Deep Neural Networks.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Asymmetric Co-Teaching for Unsupervised Cross-Domain Person Re-Identification.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2019
Vocal Melody Extraction via DNN-based Pitch Estimation and Salience-based Pitch Refinement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019
2017
Data-Driven Synthesis of Cartoon Faces Using Different Styles.
IEEE Trans. Image Process., 2017
Fusing transcription results from polyphonic and monophonic audio for singing melody transcription in polyphonic music.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017
2014
Data-driven face cartoon stylization.
Proceedings of the SIGGRAPH Asia 2014 Technical Briefs, 2014