2025
Mixed Attention and Channel Shift Transformer for Efficient Action Recognition.
ACM Trans. Multim. Comput. Commun. Appl., March, 2025
SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models.
CoRR, March, 2025
Accelerating Diffusion Transformer via Gradient-Optimized Cache.
CoRR, March, 2025
Prior Preserved Text-to-Image Personalization Without Image Regularization.
IEEE Trans. Circuits Syst. Video Technol., February, 2025
Accelerating Diffusion Transformer via Error-Optimized Cache.
CoRR, January, 2025
CookingDiffusion: Cooking Procedural Image Generation with Stable Diffusion.
CoRR, January, 2025
Cross-Modal Hashing via Diverse Instances Matching.
IEEE Trans. Image Process., 2025
Mixture of Multimodal Adapters for Sentiment Analysis.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective.
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, 2025
Hand1000: Generating Realistic Hands from Text with Only 1,000 Images.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
RAGG: Retrieval-Augmented Grasp Generation Model.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition.
Int. J. Comput. Vis., December, 2024
When I Fall in Love: Capturing Video-Oriented Social Relationship Evolution via Attentive GNN.
IEEE Trans. Circuits Syst. Video Technol., June, 2024
FTCM: Frequency-Temporal Collaborative Module for Efficient 3D Human Pose Estimation in Video.
IEEE Trans. Circuits Syst. Video Technol., February, 2024
Two-Step Discrete Hashing for Cross-Modal Retrieval.
IEEE Trans. Multim., 2024
Efficient Unsupervised Video Hashing With Contextual Modeling and Structural Controlling.
IEEE Trans. Multim., 2024
Feature Mixture on Pre-Trained Model for Few-Shot Learning.
IEEE Trans. Image Process., 2024
Iterative Semantic Transformer by Greedy Distillation for Community Question Answering.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters.
CoRR, 2024
Hand1000: Generating Realistic Hands from Text with Only 1,000 Images.
CoRR, 2024
Rethinking Visual Content Refinement in Low-Shot CLIP Adaptation.
CoRR, 2024
Model Inversion Attacks Through Target-Specific Conditional Diffusion Models.
CoRR, 2024
A Sanity Check for AI-generated Image Detection.
CoRR, 2024
Hierarchical Space-Time Attention for Micro-Expression Recognition.
CoRR, 2024
A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming.
CoRR, 2024
Noise-NeRF: Hide Information in Neural Radiance Fields using Trainable Noise.
CoRR, 2024
Masked Collaborative Contrast for Weakly Supervised Semantic Segmentation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024
Space-View Decoupled 3D Gaussians for Novel-View Synthesis of Mirror Reflections.
Proceedings of the PRICAI 2024: Trends in Artificial Intelligence, 2024
JPA: A Joint-Part Attention for Mitigating Overfocusing on 3D Human Pose Estimation.
Proceedings of the Pattern Recognition and Computer Vision - 7th Chinese Conference, 2024
Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Hierarchical Supervised Contrastive Learning for Multimodal Sentiment Analysis.
Proceedings of the MultiMedia Modeling - 30th International Conference, 2024
Selective Vision-Language Subspace Projection for Few-shot CLIP.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
PointTFA: Training-Free Clustering Adaption for Large 3D Point Cloud Models.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024
Noise-NeRF: Hide Information in Neural Radiance Field Using Trainable Noise.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2024, 2024
Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective.
Proceedings of the Computer Vision - ECCV 2024, 2024
3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing.
Proceedings of the Computer Vision - ECCV 2024, 2024
Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
GLCM-Adapter: Global-Local Content Matching for Few-shot CLIP Adaptation.
Proceedings of the 35th British Machine Vision Conference, 2024
Boosting Few-Shot Learning via Attentive Feature Regularization.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
Boosting Hyperspectral Image Classification with Dual Hierarchical Learning.
ACM Trans. Multim. Comput. Commun. Appl., January, 2023
MLP-JCG: Multi-Layer Perceptron With Joint-Coordinate Gating for Efficient 3D Human Pose Estimation.
IEEE Trans. Multim., 2023
Question-aware dynamic scene graph of local semantic representation learning for visual question answering.
Pattern Recognit. Lett., 2023
CAR: Consolidation, Augmentation and Regulation for Recipe Retrieval.
CoRR, 2023
3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing.
CoRR, 2023
Selective Volume Mixup for Video Action Recognition.
CoRR, 2023
TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction.
CoRR, 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Semantic-based Selection, Synthesis, and Supervision for Few-shot Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Bi-Directional Distribution Alignment for Transductive Zero-Shot Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
3D Human Pose Estimation with Spatio-Temporal Criss-Cross Attention.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
How Can Contrastive Pre-training Benefit Audio-Visual Segmentation? A Study from Supervised and Zero-shot Perspectives.
Proceedings of the 34th British Machine Vision Conference 2023, 2023
2022
Social Context-aware Person Search in Videos via Multi-modal Cues.
ACM Trans. Inf. Syst., 2022
Spatio-Temporal Collaborative Module for Efficient Action Recognition.
IEEE Trans. Image Process., 2022
Attention in Attention: Modeling Context Correlation for Efficient Video Classification.
IEEE Trans. Circuits Syst. Video Technol., 2022
MF-GAN: Multi-conditional Fusion Generative Adversarial Network for Text-to-Image Synthesis.
Proceedings of the MultiMedia Modeling - 28th International Conference, 2022
Long-term Leap Attention, Short-term Periodic Shift for Video Classification.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Parameterization of Cross-token Relations with Relative Positional Encoding for Vision MLP.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Hierarchical Hourglass Convolutional Network for Efficient Video Classification.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Unified QA-aware Knowledge Graph Generation Based on Multi-modal Modeling.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Unsupervised Video Hashing with Multi-granularity Contextualization and Multi-structure Preservation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Multi-directional Knowledge Transfer for Few-Shot Learning.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Group Contextualization for Video Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
2021
Learning to Match Anchor-Target Video Pairs With Dual Attentional Holographic Networks.
IEEE Trans. Image Process., 2021
Quantitative Analysis of the Research Trends and Areas in Grassland Remote Sensing: A Scientometrics Analysis of Web of Science from 1980 to 2020.
Remote. Sens., 2021
Auxiliary Diagnosis for COVID-19 with Deep Transfer Learning.
J. Digit. Imaging, 2021
Token Shift Transformer for Video Classification.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
Selective Dependency Aggregation for Action Classification.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
NASTER: Non-local Attentional Scene Text Recognizer.
Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021
Motion Prediction using Trajectory Cues.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Aggregated Multi-GANs for Controlled 3D Human Motion Prediction.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
Neighbourhood Structure Preserving Cross-Modal Embedding for Video Hyperlinking.
IEEE Trans. Multim., 2020
Cross-Domain Sentiment Encoding through Stochastic Word Embedding.
IEEE Trans. Knowl. Data Eng., 2020
Advance on large scale near-duplicate video retrieval.
Frontiers Comput. Sci., 2020
Compact Bilinear Augmented Query Structured Attention for Sport Highlights Classification.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020
Person-level Action Recognition in Complex Events via TSD-TSM Networks.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020
Cross-sentence Pre-trained Model for Interactive QA matching.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020
2019
Quantitative Assessment of the Impact of Physical and Anthropogenic Factors on Vegetation Spatial-Temporal Variation in Northern Tibet.
Remote. Sens., 2019
3D human pose estimation via human structure-aware fully connected network.
Pattern Recognit. Lett., 2019
R2GAN: Cross-Modal Recipe Retrieval With Generative Adversarial Network.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
2017
Stochastic Multiview Hashing for Large-Scale Near-Duplicate Video Retrieval.
IEEE Trans. Multim., 2017
Unsupervised t-Distributed Video Hashing and Its Deep Hashing Extension.
IEEE Trans. Image Process., 2017
2016
Variability and Changes in Climate, Phenology, and Gross Primary Production of an Alpine Wetland Ecosystem.
Remote. Sens., 2016
Data Compression with Attribute Homomorphism in Information Systems.
Computer Science (计算机科学), 2016
2014
On improving behavior subtraction.
Proceedings of the 2014 IEEE International Conference on Systems, Man, and Cybernetics, 2014
2012
Verification of a threshold concept of ecologically effective precipitation pulse: From plant individuals to ecosystem.
Ecol. Informatics, 2012
2010
The sensitivity of temperate steppe CO₂ exchange to the quantity and timing of natural interannual rainfall.
Ecol. Informatics, 2010
2006
TV Program Recommendation for Multiple Viewers Based on User Profile Merging.
User Model. User Adapt. Interact., 2006