Long Chen

Orcid: 0000-0001-6148-9709

Affiliations:
  • Hong Kong University of Science and Technology, Hong Kong
  • Columbia University, Digital Video and Multimedia (DVMM) Lab, New York, NY, USA (former)
  • Zhejiang University, College of Computer Science and Technology, Hangzhou, China (former)


According to our database, Long Chen authored at least 106 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2024

CrossFormer++: A Versatile Vision Transformer Hinging on Cross-Scale Attention.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation.
IEEE Trans. Circuits Syst. Video Technol., January, 2024

In Defense of Clip-Based Video Relation Detection.
IEEE Trans. Image Process., 2024

GSSF: Generalized Structural Sparse Function for Deep Cross-Modal Metric Learning.
IEEE Trans. Image Process., 2024

LLMs Can Evolve Continually on Modality for X-Modal Reasoning.
CoRR, 2024

A Comprehensive Survey of Datasets, Theories, Variants, and Applications in Direct Preference Optimization.
CoRR, 2024

From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation.
CoRR, 2024

Di²Pose: Discrete Diffusion Model for Occluded 3D Human Pose Estimation.
CoRR, 2024

FreeTuner: Any Subject in Any Style with Training-free Diffusion.
CoRR, 2024

Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation.
CoRR, 2024

Boundary and Relation Distillation for Semantic Segmentation.
CoRR, 2024

The 2nd International Workshop on Deep Multi-modal Generation and Retrieval.
Proceedings of the 2nd International Workshop on Deep Multimodal Generation and Retrieval, 2024

PROMOTE: Prior-Guided Diffusion Model with Global-Local Contrastive Learning for Exemplar-Based Image Translation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding.
Proceedings of the IEEE International Conference on Acoustics, 2024

Optimizing Language Models with Fair and Stable Reward Composition in Reinforcement Learning.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

View-Consistent 3D Editing with Gaussian Splatting.
Proceedings of the Computer Vision - ECCV 2024, 2024

DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism.
Proceedings of the Computer Vision - ECCV 2024, 2024

SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning.
Proceedings of the Computer Vision - ECCV 2024, 2024

An Efficient and Effective Transformer Decoder-Based Framework for Multi-task Visual Grounding.
Proceedings of the Computer Vision - ECCV 2024, 2024

Distributionally Generative Augmentation for Fair Facial Attribute Classification.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Beyond Grounding: Extracting Fine-Grained Event Hierarchies across Modalities.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach.
ACM Trans. Multim. Comput. Commun. Appl., November, 2023

Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

Federated unsupervised representation learning.
Frontiers Inf. Technol. Electron. Eng., August, 2023

VL-NMS: Breaking Proposal Bottlenecks in Two-stage Visual-language Matching.
ACM Trans. Multim. Comput. Commun. Appl., 2023

DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism.
CoRR, 2023

Compositional Zero-shot Learning via Progressive Language-based Observations.
CoRR, 2023

Video Referring Expression Comprehension via Transformer with Content-conditioned Query.
CoRR, 2023

MEDOE: A Multi-Expert Decoder and Output Ensemble Framework for Long-tailed Semantic Segmentation.
CoRR, 2023

Improving Reference-based Distinctive Image Captioning with Contrastive Rewards.
CoRR, 2023

Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs.
CoRR, 2023

TreePrompt: Learning to Compose Tree Prompts for Explainable Visual Grounding.
CoRR, 2023

Decomposed Prototype Learning for Few-Shot Scene Graph Generation.
CoRR, 2023

Learning Combinatorial Prompts for Universal Controllable Image Captioning.
CoRR, 2023

Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Two Heads are Better Than One: A Simple Exploration Framework for Efficient Multi-Agent Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Video Referring Expression Comprehension via Transformer with Content-conditioned Query.
Proceedings of the 1st International Workshop on Deep Multimodal Learning for Information Retrieval, 2023

Discrepancy-Guided Reconstruction Learning for Image Forgery Detection.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Fairness-aware Contrastive Learning with Partially Annotated Sensitive Attributes.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

TempCLR: Temporal Alignment Representation with Contrastive Learning.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Video Scene Graph Generation from Single-Frame Weak Supervision.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Compositional Feature Augmentation for Unbiased Scene Graph Generation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Iterative Proposal Refinement for Weakly-Supervised Video Grounding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Enhanced Chart Understanding via Visual Language Pre-training on Plot Table Pairs.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Deep Motion Prior for Weakly-Supervised Temporal Action Localization.
IEEE Trans. Image Process., 2022

Deep Learning for Weakly-Supervised Object Detection and Localization: A Survey.
Neurocomputing, 2022

MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding.
CoRR, 2022

Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation.
CoRR, 2022

Multimodal Event Graphs: Towards Event Centric Understanding of Multimodal World.
CoRR, 2022

Rethinking Multi-Modal Alignment in Video Question Answering from Feature and Sample Perspectives.
CoRR, 2022

Multimodal Few-Shot Object Detection with Meta-Learning Based Cross-Modal Prompting.
CoRR, 2022

Respecting Transfer Gap in Knowledge Distillation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Rethinking the Reference-based Distinctive Image Captioning.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Correspondence Matters for Video Referring Expression Comprehension.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2022

CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Rethinking Multi-Modal Alignment in Multi-Choice VideoQA from Feature and Sample Perspectives.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Weakly-Supervised Temporal Article Grounding.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Explicit Image Caption Editing.
Proceedings of the Computer Vision - ECCV 2022, 2022

Rethinking Data Augmentation for Robust Visual Question Answering.
Proceedings of the Computer Vision - ECCV 2022, 2022

The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Few-Shot Object Detection with Fully Cross-Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Rethinking the Evaluation of Unbiased Scene Graph Generation.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

Rethinking the Two-Stage Framework for Grounded Situation Recognition.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention.
CoRR, 2021

Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey.
CoRR, 2021

VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching.
CoRR, 2021

A Closer Look at Temporal Sentence Grounding in Videos: Datasets and Metrics.
CoRR, 2021

A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric.
Proceedings of the HUMA'21: Proceedings of the 2nd International Workshop on Human-centric Multimedia Analysis, 2021

Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Video Relation Detection via Tracklet based Visual Transformer.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Accelerate CNNs from Three Dimensions: A Comprehensive Pruning Framework.
Proceedings of the 38th International Conference on Machine Learning, 2021

Natural Language Video Localization with Learnable Moment Proposals.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

On Pursuit of Designing Multi-modal Transformer for Video Grounding.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Human-Like Controllable Image Captioning With Verb-Specific Semantic Roles.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Optimizing Federated Learning on Non-IID Data Using Local Shapley Value.
Proceedings of the Artificial Intelligence - First CAAI International Conference, 2021

Boundary Proposal Network for Two-stage Natural Language Video Localization.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Hierarchical Temporal Fusion of Multi-grained Attention Features for Video Question Answering.
Neural Process. Lett., 2020

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding.
CoRR, 2020

Hierarchical Fashion Graph Network for Personalized Outfit Recommendation.
Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020

Counterfactual Samples Synthesizing for Robust Visual Question Answering.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Rethinking the Bottom-Up Framework for Query-Based Video Localization.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Learning Using Privileged Information for Food Recognition.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Counterfactual Critic Multi-Agent Training for Scene Graph Generation.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

DEBUG: A Dense Bottom-Up Grounding Approach for Natural Language Video Localization.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2018
Scene Dynamics: Counterfactual Critic Multi-Agent Training for Scene Graph Generation.
CoRR, 2018

Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Zero-Shot Visual Recognition using Semantics-Preserving Adversarial Embedding Network.
CoRR, 2017

Video Question Answering via Attribute-Augmented Attention Network Learning.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning.
CoRR, 2016

