2025

LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition?

Bangyan Li

Wenxuan Huang

CoRR, March, 2025

FlowAgent: Achieving Compliance and Flexibility for Workflow Agents.

CoRR, February, 2025

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy.

CoRR, February, 2025

LUCY: Linguistic Understanding and Control Yielding Early Stage of Her.

CoRR, January, 2025

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction.

CoRR, January, 2025

Probability-Density-aware Semi-supervised Learning.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Feature Denoising Diffusion Model for Blind Image Quality Assessment.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Training-Free Transformer Architecture Search With Zero-Cost Proxy Guided Evolution.

IEEE Trans. Pattern Anal. Mach. Intell., October, 2024

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression.

CoRR, 2024

Scale Contrastive Learning with Selective Attentions for Blind Image Quality Assessment.

CoRR, 2024

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM.

CoRR, 2024

Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models.

CoRR, 2024

Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models.

CoRR, 2024

Local Manifold Learning for No-Reference Image Quality Assessment.

CoRR, 2024

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.

CoRR, 2024

Multi-Modal Prompt Learning on Blind Image Quality Assessment.

CoRR, 2024

RESTORE: Towards Feature Shift for Vision-Language Prompt Learning.

CoRR, 2024

Feature Denoising Diffusion Model for Blind Image Quality Assessment.

CoRR, 2024

Woodpecker: hallucination correction for multimodal large language models.

Sci. China Inf. Sci., 2024

A³R: Vision Language Pre-training by Attentive Alignment and Attentive Reconstruction.

Proceedings of the Pattern Recognition and Computer Vision - 7th Chinese Conference, 2024

Multimodal Inplace Prompt Tuning for Open-set Object Detection.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Integrating Global Context Contrast and Local Sensitivity for Blind Image Quality Assessment.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Aligning and Prompting Everything All at Once for Universal Visual Perception.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

A General and Efficient Training for Transformer via Token Expansion.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Solving the Catastrophic Forgetting Problem in Generalized Category Discovery.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Sinkhorn Distance Minimization for Knowledge Distillation.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024

Semi-Supervised Blind Image Quality Assessment through Knowledge Distillation and Incremental Learning.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Weakly Supervised Open-Vocabulary Object Detection.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

SPD-DDPM: Denoising Diffusion Probabilistic Models in the Symmetric Positive Definite Space.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

SoftCLIP: Softer Cross-Modal Alignment Makes CLIP Stronger.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

MISSU: 3D Medical Image Segmentation via Self-Distilling TransUNet.

IEEE Trans. Medical Imaging, September, 2023

Reciprocal normalization for domain adaptation.

Pattern Recognit., August, 2023

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise.

CoRR, 2023

MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples.

CoRR, 2023

Adaptive Feature Selection for No-Reference Image Quality Assessment using Contrastive Mitigating Semantic Noise Sensitivity.

CoRR, 2023

Less is More: Learning Reference Knowledge Using No-Reference Image Quality Assessment.

CoRR, 2023

Towards Robust Text Retrieval with Progressive Learning.

CoRR, 2023

A Survey on Multimodal Large Language Models.

CoRR, 2023

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models.

CoRR, 2023

SoftCLIP: Softer Cross-modal Alignment Makes CLIP Stronger.

CoRR, 2023

Data-Free Low-Bit Quantization via Dynamic Multi-teacher Knowledge Distillation.

Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

Classifier Decoupled Training for Black-Box Unsupervised Domain Adaptation.

Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

CAPro: Webly Supervised Learning with Cross-modality Aligned Prototypes.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Multi-modal Queried Object Detection in the Wild.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

LocLoc: Low-level Cues and Local-area Guides for Weakly Supervised Object Localization.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Adaptive Hierarchy-Branch Fusion for Online Knowledge Distillation.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

CF-ViT: A General Coarse-to-Fine Method for Vision Transformer.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

LAB-Net: LAB Color-Space Oriented Lightweight Network for Shadow Removal.

CoRR, 2022

Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization.

CoRR, 2022

Super Vision Transformer.

CoRR, 2022

Shadow-Aware Dynamic Convolution for Shadow Removal.

CoRR, 2022

PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining.

CoRR, 2022

Coarse-to-Fine Vision Transformer.

CoRR, 2022

Optimizing Gradient-driven Criteria in Network Sparsity: Gradient is All You Need.

CoRR, 2022

PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning Best Combination for Efficient N: M Sparsity.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Self-supervised Models are Good Teaching Assistants for Vision Transformers.

Proceedings of the International Conference on Machine Learning, 2022

Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks.

Proceedings of the Computer Vision - ECCV 2022, 2022

Fine-grained Data Distribution Alignment for Post-Training Quantization.

Proceedings of the Computer Vision - ECCV 2022, 2022

DisCo: Remedying Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning.

Proceedings of the Computer Vision - ECCV 2022, 2022

Efficient Decoder-Free Object Detection with Transformers.

Proceedings of the Computer Vision - ECCV 2022, 2022

ARM: Any-Time Super-Resolution Method.

Proceedings of the Computer Vision - ECCV 2022, 2022

Training-free Transformer Architecture Search.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

RMNet: Equivalently Removing Residual Connection from Networks.

CoRR, 2021

Fine-grained Data Distribution Alignment for Post-Training Quantization.

CoRR, 2021

ISTR: End-to-End Instance Segmentation with Transformers.

CoRR, 2021

DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning.

CoRR, 2021

On Evolving Attention Towards Domain Adaptation.

CoRR, 2021

On The Consistency Training for Open-Set Semi-Supervised Learning.

CoRR, 2021

Architecture Disentanglement for Deep Neural Networks.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Removing the Background by Adding the Background: Towards Background Robust Self-Supervised Video Representation Learning.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

One for More: Selecting Generalizable Samples for Generalizable ReID Model.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Semi-Supervised Adversarial Monocular Depth Estimation.

IEEE Trans. Pattern Anal. Mach. Intell., 2020

Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning.

CoRR, 2020

Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion.

CoRR, 2020

DGD: Densifying the Knowledge of Neural Networks with Filter Grafting and Knowledge Distillation.

CoRR, 2020

Architecture Disentanglement for Deep Neural Networks.

CoRR, 2020

Pruning Filter in Filter.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Filter Grafting for Deep Neural Networks.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Asymmetric Co-Teaching for Unsupervised Cross-Domain Person Re-Identification.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Vocal Melody Extraction via DNN-based Pitch Estimation and Salience-based Pitch Refinement.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2017

Data-Driven Synthesis of Cartoon Faces Using Different Styles.

IEEE Trans. Image Process., 2017

Fusing transcription results from polyphonic and monophonic audio for singing melody transcription in polyphonic music.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

2014

Data-driven face cartoon stylization.

Proceedings of the SIGGRAPH Asia 2014 Technical Briefs, 2014