Jingjing Chen

ORCID: 0000-0003-3148-264X

Affiliations:
  • Fudan University, School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China
  • National University of Singapore, School of Computing, Singapore
  • City University of Hong Kong, Department of Computer Science, Hong Kong (PhD 2018)


According to our database, Jingjing Chen authored at least 119 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Adaptive Cross-Modal Transferable Adversarial Attacks From Images to Videos.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

BiC-Net: Learning Efficient Spatio-temporal Relation for Text-Video Retrieval.
ACM Trans. Multim. Comput. Commun. Appl., March, 2024

HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition.
ACM Trans. Multim. Comput. Commun. Appl., February, 2024

Locate Before Answering: Answer Guided Question Localization for Video Question Answering.
IEEE Trans. Multim., 2024

EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models.
CoRR, 2024

EventHallusion: Diagnosing Event Hallucinations in Video LLMs.
CoRR, 2024

RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models.
CoRR, 2024

From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios.
CoRR, 2024

Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models.
CoRR, 2024

Identity-Driven Multimedia Forgery Detection via Reference Assistance.
CoRR, 2024

Identity-Driven Multimedia Forgery Detection via Reference Assistance.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Highly Transferable Diffusion-based Unrestricted Adversarial Attack on Pre-trained Vision-Language Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Navigating Weight Prediction with Diet Diary.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Cross-Point Adversarial Attack Based on Feature Neighborhood Disruption Against Segment Anything Model.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

Doubly Abductive Counterfactual Inference for Text-Based Image Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Open-Vocabulary Video Relation Extraction.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving Scenario.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Instance-Aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Knowledge driven weights estimation for large-scale few-shot image recognition.
Pattern Recognit., October, 2023

Cross-Domain Contrastive Learning for Unsupervised Domain Adaptation.
IEEE Trans. Multim., 2023

Scene Graph Refinement Network for Visual Question Answering.
IEEE Trans. Multim., 2023

Self-Supervised Learning for Semi-Supervised Temporal Language Grounding.
IEEE Trans. Multim., 2023

Dynamic Mixup for Multi-Label Long-Tailed Food Ingredient Recognition.
IEEE Trans. Multim., 2023

Towards Transferable Adversarial Attacks on Image and Video Transformers.
IEEE Trans. Image Process., 2023

Multimodal Pre-training Method for Vision-language Understanding and Generation.
Int. J. Softw. Informatics, 2023

FoodLMM: A Versatile Food Assistant using Large Multi-modal Model.
CoRR, 2023

Cross-domain Food Image-to-Recipe Retrieval by Weighted Adversarial Learning.
CoRR, 2023

On the Importance of Spatial Relations for Few-shot Action Recognition.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Generalizing Face Forgery Detection via Uncertainty Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Relation Triplet Construction for Cross-modal Text-to-Video Retrieval.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Suspected Objects Matter: Rethinking Model's Prediction for One-stage Visual Grounding.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

GCMA: Generative Cross-Modal Transferable Adversarial Attacks from Images to Videos.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Adaptive Split-Fusion Transformer.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Downstream Task-agnostic Transferable Attacks on Language-Image Pre-training Models.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

GC-GAN: Photo Cartoonization Using Guided Cartoon Generative Adversarial Network.
Proceedings of the Artificial Neural Networks and Machine Learning, 2023

SVFormer: Semi-supervised Video Transformer for Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Enhancing the Self-Universality for Transferable Targeted Attacks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Unifying Cross-Lingual and Cross-Modal Modeling Towards Weakly Supervised Multilingual Vision-Language Pre-training.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Adversarial Multi-Grained Embedding Network for Cross-Modal Text-Video Retrieval.
ACM Trans. Multim. Comput. Commun. Appl., 2022

Spatial-Temporal Graphs for Cross-Modal Text2Video Retrieval.
IEEE Trans. Multim., 2022

Mixed Dish Recognition With Contextual Relation and Domain Alignment.
IEEE Trans. Multim., 2022

Generalized Meta-FDMixup: Cross-Domain Few-Shot Learning Guided by Labeled Target Data.
IEEE Trans. Image Process., 2022

Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection.
CoRR, 2022

Transferability Estimation Based On Principal Gradient Expectation.
CoRR, 2022

Text-driven Video Prediction.
CoRR, 2022

Incorporating Locality of Images to Generate Targeted Transferable Adversarial Examples.
CoRR, 2022

MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection.
CoRR, 2022

A Unified Continuous Learning Framework for Multi-modal Knowledge Discovery and Pre-training.
CoRR, 2022

Adaptive Split-Fusion Transformer.
CoRR, 2022

Wave-SAN: Wavelet based Style Augmentation Network for Cross-Domain Few-Shot Learning.
CoRR, 2022

Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding.
CoRR, 2022

Video Moment Retrieval from Text Queries via Single Frame Annotation.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

TGDM: Target Guided Dynamic Mixup for Cross-Domain Few-Shot Learning.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Mix-DANN and Dynamic-Modal-Distillation for Video Domain Adaptation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

CEA++'22: 1st International Workshop on Multimedia for Cooking, Eating, and related APPlications.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

From Abstract to Details: A Generative Multimodal Fusion Framework for Recommendation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Few-shot Food Recognition with Pre-trained Model.
Proceedings of the CEA++@MM 2022: Proceedings of the 1st International Workshop on Multimedia for Cooking, 2022

MCFR'22: 1st Workshop on Multimedia Computing towards Fashion Recommendation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

MVPTR: Multi-Level Semantic Alignment for Vision-Language Pre-Training via Multi-Stage Learning.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

ME-D2N: Multi-Expert Domain Decompositional Network for Cross-Domain Few-Shot Learning.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Cross-lingual Adaptation for Recipe Retrieval with Mixup.
Proceedings of the ICMR '22: International Conference on Multimedia Retrieval, Newark, NJ, USA, June 27, 2022

Ingredient-enriched Recipe Generation from Cooking Videos.
Proceedings of the ICMR '22: International Conference on Multimedia Retrieval, Newark, NJ, USA, June 27, 2022

Adaptive Temporal Grouping for Black-box Adversarial Attacks on Videos.
Proceedings of the ICMR '22: International Conference on Multimedia Retrieval, Newark, NJ, USA, June 27, 2022

M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection.
Proceedings of the ICMR '22: International Conference on Multimedia Retrieval, Newark, NJ, USA, June 27, 2022

DiGAN: Directional Generative Adversarial Network for Object Transfiguration.
Proceedings of the ICMR '22: International Conference on Multimedia Retrieval, Newark, NJ, USA, June 27, 2022

Data-Free Network Debiasing for Long-Tailed Visual Recognition.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes.
Proceedings of the Computer Vision - ECCV 2022, 2022

Balanced Contrastive Learning for Long-Tailed Visual Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Cross-Modal Transferable Adversarial Attacks from Images to Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

ObjectFormer for Image Manipulation Detection and Localization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Boosting the Transferability of Video Adversarial Examples via Temporal Translation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Towards Transferable Adversarial Attacks on Vision Transformers.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Attacking Video Recognition Models with Bullet-Screen Comments.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
A Hybrid Approach for Detecting Prerequisite Relations in Multi-Modal Food Recipes.
IEEE Trans. Multim., 2021

A Study of Multi-Task and Region-Wise Deep Learning for Food Ingredient Recognition.
IEEE Trans. Image Process., 2021

Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation.
CoRR, 2021

Visual Spatio-temporal Relation-enhanced Network for Cross-modal Text-Video Retrieval.
CoRR, 2021

M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection.
CoRR, 2021

Reproducibility Companion Paper: Visual Relation of Interest Detection.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Visual Co-Occurrence Alignment Learning for Weakly-Supervised Video Moment Retrieval.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Reproducibility Companion Paper: Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Two-stage Visual Cues Enhancement Network for Referring Image Segmentation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Fine-grained Cross-modal Alignment Network for Text-Video Retrieval.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

VideoLT: Large-scale Long-tailed Video Recognition.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Colonoscopy Polyp Detection: Domain Adaptation From Medical Report Images to Real-time Videos.
CoRR, 2020

WildDeepfake: A Challenging Real-World Dataset for Deepfake Detection.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Cross-domain Cross-modal Food Transfer.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Reproducibility Companion Paper: Instance of Interest Detection.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Video Relation Detection via Multiple Hypothesis Association.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Multi-modal Cooking Workflow Construction for Food Recipes.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Person-level Action Recognition in Complex Events via TSD-TSM Networks.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Visual Relations Augmented Cross-modal Retrieval.
Proceedings of the 2020 on International Conference on Multimedia Retrieval, 2020

Clean-Label Backdoor Attacks on Video Recognition Models.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Hyperbolic Visual Embedding Learning for Zero-Shot Recognition.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Heuristic Black-Box Adversarial Attacks on Video Recognition Models.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Zero-Shot Ingredient Recognition by Multi-Relational Graph Convolutional Network.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Mixed-dish Recognition with Contextual Relation Networks.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

DietLens-Eout: Large Scale Restaurant Food Photo Recognition.
Proceedings of the 2019 on International Conference on Multimedia Retrieval, 2019

R2GAN: Cross-Modal Recipe Retrieval With Generative Adversarial Network.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Mixed Dish Recognition through Multi-Label Learning.
Proceedings of the 11th Workshop on Multimedia for Cooking and Eating Activities, 2019

2018
Cross-modal recipe retrieval with stacked attention model.
Multim. Tools Appl., 2018

Food Photo Recognition for Dietary Tracking: System and Experiment.
Proceedings of the MultiMedia Modeling - 24th International Conference, 2018

Deep Understanding of Cooking Procedure for Cross-modal Recipe Retrieval.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

2017
Cross-Modal Recipe Retrieval: How to Cook this Dish?
Proceedings of the MultiMedia Modeling - 23rd International Conference, 2017

Cross-modal Recipe Retrieval with Rich Food Attributes.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

PIC2DISH: A Customized Cooking Assistant System.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

2016
Deep-based Ingredient Recognition for Cooking Recipe Retrieval.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

2015
Image aesthetics enhancement using composition-based saliency detection.
Multim. Syst., 2015

2014
Feature selection with spatial path coding for multimedia analysis.
Inf. Sci., 2014

VIREO @ TRECVID 2014: Instance Search and Semantic Indexing.
Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014

Human Skin Detection via Semantic Constraint.
Proceedings of the International Conference on Internet Multimedia Computing and Service, 2014

2013
Object coding on the semantic graph for scene classification.
Proceedings of the ACM Multimedia Conference, 2013

Visual saliency detection based on photographic composition.
Proceedings of the International Conference on Internet Multimedia Computing and Service, 2013

2012
Object clique representation for scene classification.
Proceedings of the 21st International Conference on Pattern Recognition, 2012

