2025

Mixed Attention and Channel Shift Transformer for Efficient Action Recognition.

ACM Trans. Multim. Comput. Commun. Appl., March, 2025

SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models.

CoRR, March, 2025

Accelerating Diffusion Transformer via Gradient-Optimized Cache.

CoRR, March, 2025

Prior Preserved Text-to-Image Personalization Without Image Regularization.

IEEE Trans. Circuits Syst. Video Technol., February, 2025

Accelerating Diffusion Transformer via Error-Optimized Cache.

CoRR, January, 2025

CookingDiffusion: Cooking Procedural Image Generation with Stable Diffusion.

CoRR, January, 2025

Cross-Modal Hashing via Diverse Instances Matching.

IEEE Trans. Image Process., 2025

Mixture of Multimodal Adapters for Sentiment Analysis.

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective.

Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, 2025

Hand1000: Generating Realistic Hands from Text with Only 1,000 Images.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

RAGG: Retrieval-Augmented Grasp Generation Model.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition.

Int. J. Comput. Vis., December, 2024

When I Fall in Love: Capturing Video-Oriented Social Relationship Evolution via Attentive GNN.

IEEE Trans. Circuits Syst. Video Technol., June, 2024

FTCM: Frequency-Temporal Collaborative Module for Efficient 3D Human Pose Estimation in Video.

IEEE Trans. Circuits Syst. Video Technol., February, 2024

Two-Step Discrete Hashing for Cross-Modal Retrieval.

IEEE Trans. Multim., 2024

Efficient Unsupervised Video Hashing With Contextual Modeling and Structural Controlling.

IEEE Trans. Multim., 2024

Feature Mixture on Pre-Trained Model for Few-Shot Learning.

IEEE Trans. Image Process., 2024

Iterative Semantic Transformer by Greedy Distillation for Community Question Answering.

Jeyarajan Thiyagalingam

John Yannis Goulermas

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters.

CoRR, 2024

Hand1000: Generating Realistic Hands from Text with Only 1,000 Images.

CoRR, 2024

Rethinking Visual Content Refinement in Low-Shot CLIP Adaptation.

CoRR, 2024

Model Inversion Attacks Through Target-Specific Conditional Diffusion Models.

CoRR, 2024

A Sanity Check for AI-generated Image Detection.

CoRR, 2024

Hierarchical Space-Time Attention for Micro-Expression Recognition.

CoRR, 2024

A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming.

Jussi Kangasharju

CoRR, 2024

Noise-NeRF: Hide Information in Neural Radiance Fields using Trainable Noise.

CoRR, 2024

Masked Collaborative Contrast for Weakly Supervised Semantic Segmentation.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Space-View Decoupled 3D Gaussians for Novel-View Synthesis of Mirror Reflections.

Proceedings of the PRICAI 2024: Trends in Artificial Intelligence, 2024

JPA: A Joint-Part Attention for Mitigating Overfocusing on 3D Human Pose Estimation.

Proceedings of the Pattern Recognition and Computer Vision - 7th Chinese Conference, 2024

Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Hierarchical Supervised Contrastive Learning for Multimodal Sentiment Analysis.

Proceedings of the MultiMedia Modeling - 30th International Conference, 2024

Selective Vision-Language Subspace Projection for Few-shot CLIP.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

PointTFA: Training-Free Clustering Adaption for Large 3D Point Cloud Models.

Basura Fernando

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Noise-NeRF: Hide Information in Neural Radiance Field Using Trainable Noise.

Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2024, 2024

Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective.

Proceedings of the Computer Vision - ECCV 2024, 2024

3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing.

Proceedings of the Computer Vision - ECCV 2024, 2024

Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

GLCM-Adapter: Global-Local Content Matching for Few-shot CLIP Adaptation.

Proceedings of the 35th British Machine Vision Conference, 2024

Boosting Few-Shot Learning via Attentive Feature Regularization.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Boosting Hyperspectral Image Classification with Dual Hierarchical Learning.

ACM Trans. Multim. Comput. Commun. Appl., January, 2023

MLP-JCG: Multi-Layer Perceptron With Joint-Coordinate Gating for Efficient 3D Human Pose Estimation.

IEEE Trans. Multim., 2023

Question-aware dynamic scene graph of local semantic representation learning for visual question answering.

Pattern Recognit. Lett., 2023

CAR: Consolidation, Augmentation and Regulation for Recipe Retrieval.

CoRR, 2023

3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing.

CoRR, 2023

Selective Volume Mixup for Video Action Recognition.

CoRR, 2023

TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction.

CoRR, 2023

CgT-GAN: CLIP-guided Text GAN for Image Captioning.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Semantic-based Selection, Synthesis, and Supervision for Few-shot Learning.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Bi-Directional Distribution Alignment for Transductive Zero-Shot Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

3D Human Pose Estimation with Spatio-Temporal Criss-Cross Attention.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

How Can Contrastive Pre-training Benefit Audio-Visual Segmentation? A Study from Supervised and Zero-shot Perspectives.

Proceedings of the 34th British Machine Vision Conference 2023, 2023

2022

Social Context-aware Person Search in Videos via Multi-modal Cues.

ACM Trans. Inf. Syst., 2022

Spatio-Temporal Collaborative Module for Efficient Action Recognition.

IEEE Trans. Image Process., 2022

Attention in Attention: Modeling Context Correlation for Efficient Video Classification.

IEEE Trans. Circuits Syst. Video Technol., 2022

MF-GAN: Multi-conditional Fusion Generative Adversarial Network for Text-to-Image Synthesis.

Proceedings of the MultiMedia Modeling - 28th International Conference, 2022

Long-term Leap Attention, Short-term Periodic Shift for Video Classification.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Parameterization of Cross-token Relations with Relative Positional Encoding for Vision MLP.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Hierarchical Hourglass Convolutional Network for Efficient Video Classification.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Unified QA-aware Knowledge Graph Generation Based on Multi-modal Modeling.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Unsupervised Video Hashing with Multi-granularity Contextualization and Multi-structure Preservation.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Multi-directional Knowledge Transfer for Few-Shot Learning.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Group Contextualization for Video Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Learning to Match Anchor-Target Video Pairs With Dual Attentional Holographic Networks.

IEEE Trans. Image Process., 2021

Quantitative Analysis of the Research Trends and Areas in Grassland Remote Sensing: A Scientometrics Analysis of Web of Science from 1980 to 2020.

Remote. Sens., 2021

Auxiliary Diagnosis for COVID-19 with Deep Transfer Learning.

J. Digit. Imaging, 2021

Token Shift Transformer for Video Classification.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Selective Dependency Aggregation for Action Classification.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

NASTER: Non-local Attentional Scene Text Recognizer.

Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021

Motion Prediction using Trajectory Cues.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Aggregated Multi-GANs for Controlled 3D Human Motion Prediction.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Neighbourhood Structure Preserving Cross-Modal Embedding for Video Hyperlinking.

IEEE Trans. Multim., 2020

Cross-Domain Sentiment Encoding through Stochastic Word Embedding.

John Yannis Goulermas

IEEE Trans. Knowl. Data Eng., 2020

Advance on large scale near-duplicate video retrieval.

Frontiers Comput. Sci., 2020

Compact Bilinear Augmented Query Structured Attention for Sport Highlights Classification.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Person-level Action Recognition in Complex Events via TSD-TSM Networks.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Cross-sentence Pre-trained Model for Interactive QA matching.

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

2019

Quantitative Assessment of the Impact of Physical and Anthropogenic Factors on Vegetation Spatial-Temporal Variation in Northern Tibet.

Remote. Sens., 2019

3D human pose estimation via human structure-aware fully connected network.

Pattern Recognit. Lett., 2019

R2GAN: Cross-Modal Recipe Retrieval With Generative Adversarial Network.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2017

Stochastic Multiview Hashing for Large-Scale Near-Duplicate Video Retrieval.

John Yannis Goulermas

IEEE Trans. Multim., 2017

Unsupervised t-Distributed Video Hashing and Its Deep Hashing Extension.

John Yannis Goulermas

IEEE Trans. Image Process., 2017

2016

Variability and Changes in Climate, Phenology, and Gross Primary Production of an Alpine Wetland Ecosystem.

Remote. Sens., 2016

Data Compression with Attribute Homomorphism in Information Systems.

Computer Science (计算机科学), 2016

2014

On improving behavior subtraction.

Proceedings of the 2014 IEEE International Conference on Systems, Man, and Cybernetics, 2014

2012

Verification of a threshold concept of ecologically effective precipitation pulse: From plant individuals to ecosystem.

Ecol. Informatics, 2012

2010

The sensitivity of temperate steppe CO2 exchange to the quantity and timing of natural interannual rainfall.

Xiangzhong Huang

Ecol. Informatics, 2010

2006

TV Program Recommendation for Multiple Viewers Based on user Profile Merging.

User Model. User Adapt. Interact., 2006