Chuang Gan

Orcid: 0000-0003-4031-5886

According to our database, Chuang Gan authored at least 229 papers between 2013 and 2024.

Bibliography

2024
UniMuMo: Unified Text, Music and Motion Generation.
CoRR, 2024

Compositional Physical Reasoning of Objects and Events from Videos.
CoRR, 2024

Disentangled Acoustic Fields For Multimodal Physical Scene Understanding.
CoRR, 2024

ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs.
CoRR, 2024

CoNav: A Benchmark for Human-Centered Collaborative Navigation.
CoRR, 2024

Physically Compatible 3D Object Modeling from a Single Image.
CoRR, 2024

RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text.
CoRR, 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving.
CoRR, 2024

Virtual Foundry Graphnet for Metal Sintering Deformation Prediction.
CoRR, 2024

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation.
CoRR, 2024

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision.
CoRR, 2024

Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble.
CoRR, 2024

AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

RoboDreamer: Learning Compositional World Models for Robot Imagination.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

ContPhy: Continuum Physical Concept Learning and Reasoning from Videos.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

3D-VLA: A 3D Vision-Language-Action Generative World Model.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Speech Self-Supervised Learning Using Diffusion Model Synthetic Data.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Building Cooperative Embodied Agents Modularly with Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Thin-Shell Object Manipulations With Differentiable Physics Simulations.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

SALMON: Self-Alignment with Instructable Reward Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

DIFFTACTILE: A Physics-based Differentiable Tactile Simulator for Contact-rich Robotic Manipulation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

GENOME: Generative Neuro-Symbolic Visual Reasoning by Growing and Reusing Modules.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

FlexAttention for Efficient High-Resolution Vision-Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Multi-Agent Alternate Q-Learning.
Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 2024

Aligning Large Multimodal Models with Factually Augmented RLHF.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Visual Chain-of-Thought Prompting for Knowledge-Based Visual Reasoning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Bird-Count: a multi-modality benchmark and system for bird population counting in the wild.
Multim. Tools Appl., December, 2023

TransCenter: Transformers With Dense Representations for Multiple-Object Tracking.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

Self-supervised audiovisual representation learning for remote sensing data.
Int. J. Appl. Earth Obs. Geoinformation, February, 2023

DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning.
CoRR, 2023

RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation.
CoRR, 2023

Autonomous Tree-search Ability of Large Language Models.
CoRR, 2023

SALMON: Self-Alignment with Principle-Following Reward Models.
CoRR, 2023

Generalizable Long-Horizon Manipulations with Large Language Models.
CoRR, 2023

A²Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models.
CoRR, 2023

An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training.
CoRR, 2023

ModuleFormer: Learning Modular Large Language Models From Uncurated Data.
CoRR, 2023

SafeDiffuser: Safe Planning with Diffusion Probabilistic Models.
CoRR, 2023

EC²: Emergent Communication for Embodied Control.
CoRR, 2023

See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning.
CoRR, 2023

ClawSAT: Towards Both Robust and Accurate Code Models.
Proceedings of the IEEE International Conference on Software Analysis, 2023

RoboNinja: Learning an Adaptive Cutting Policy for Multi-Material Objects.
Proceedings of the Robotics: Science and Systems XIX, Daegu, 2023

Adaptive Online Replanning with Diffusion Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DiffVL: Scaling Up Soft Body Manipulation using Vision-Language Driven Differentiable Physics.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

3D-LLM: Injecting the 3D World into Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PockEngine: Sparse and Efficient Fine-tuning in a Pocket.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Deep Masked Graph Matching for Correspondence Identification in Collaborative Perception.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Reparameterized Policy Learning for Multimodal Trajectory Optimization.
Proceedings of the International Conference on Machine Learning, 2023

On the Forward Invariance of Neural ODEs.
Proceedings of the International Conference on Machine Learning, 2023

Learning Neural Constitutive Laws from Motion Observations for Generalizable PDE Dynamics.
Proceedings of the International Conference on Machine Learning, 2023

Planning with Large Language Models for Code Generation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Hyper-Decision Transformer for Efficient Online Policy Adaptation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

SoftZoo: A Soft Robot Co-design Benchmark For Locomotion In Diverse Environments.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields for Geometry-Agnostic System Identification.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

DexDeform: Dexterous Deformable Object Manipulation with Human Demonstrations and Differentiable Physics.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

TextPSG: Panoptic Scene Graph Generation from Textual Descriptions.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Vision-and-Language Navigation from YouTube Videos.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EfficientViT: Lightweight Multi-Scale Attention for High-Resolution Dense Prediction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Enabling Encrypted Delta Compression for Outsourced Storage Systems via Preserving Similarity.
Proceedings of the 41st IEEE International Conference on Computer Design, 2023

Sparse Universal Transformer.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Masked Motion Encoding for Self-Supervised Video Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

EC²: Emergent Communication for Embodied Control.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Situation Hyper-Graphs for Video Question Answering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

3D Concept Learning and Reasoning from Multi-View Images.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Text-instance graph: Exploring the relational semantics for text-based visual question answering.
Pattern Recognit., 2022

Graph Convolutional Module for Temporal Action Localization in Videos.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Purely Attention Based Local Feature Integration for Video Classification.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Devices.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners.
CoRR, 2022

Retrospectives on the Embodied AI Workshop.
CoRR, 2022

M³Video: Masked Motion Modeling for Self-Supervised Video Representation Learning.
CoRR, 2022

MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning.
CoRR, 2022

EfficientViT: Enhanced Linear Attention for High-Resolution Low-Computation Visual Recognition.
CoRR, 2022

Certifiably robust interpretation via Rényi differential privacy.
Artif. Intell., 2022

SNAKE: Shape-aware Neural 3D Keypoint Field.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning Neural Acoustic Fields.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

3D Concept Grounding on Neural Fields.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning Physical Dynamics with Subequivariant Graph Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning Active Camera for Multi-Object Navigation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

On-Device Training Under 256KB Memory.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Gait Recognition in the Wild with Multi-hop Temporal Switch.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Noisy Agents: Self-supervised Exploration by Predicting Auditory Events.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark Towards Physically Realistic Embodied AI.
Proceedings of the 2022 International Conference on Robotics and Automation, 2022

Prompting Decision Transformer for Few-Shot Policy Generalization.
Proceedings of the International Conference on Machine Learning, 2022

Linking Emergent and Natural Languages via Corpus Transfer.
Proceedings of the Tenth International Conference on Learning Representations, 2022

FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations.
Proceedings of the Tenth International Conference on Learning Representations, 2022

DiffSkill: Skill Abstraction from Differentiable Physics for Deformable Object Manipulations with Tools.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics.
Proceedings of the Tenth International Conference on Learning Representations, 2022

ComPhy: Compositional Physical Reasoning of Objects and Events from Videos.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Network Augmentation for Tiny Deep Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Revisiting the Roles of "Text" in Text Games.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Prototype-Guided Continual Adaptation for Class-Incremental Unsupervised Domain Adaptation.
Proceedings of the Computer Vision - ECCV 2022, 2022

Weakly Supervised Grounding for VQA in Vision-Language Transformers.
Proceedings of the Computer Vision - ECCV 2022, 2022

AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Finding Fallen Objects Via Asynchronous Audio-Visual Integration.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Planning with Spatial-Temporal Abstraction from Point Clouds for Deformable Object Manipulation.
Proceedings of the Conference on Robot Learning, 2022

Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following.
Proceedings of the Conference on Robot Learning, 2022

2021
A Real-Time Action Representation With Temporal Encoding and Deep Compression.
IEEE Trans. Circuits Syst. Video Technol., 2021

A novel domain activation mapping-guided network (DA-GNT) for visual tracking.
Neurocomputing, 2021

MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning.
CoRR, 2021

TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device.
CoRR, 2021

Global Rhythm Style Transfer Without Text Transcriptions.
CoRR, 2021

Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning.
CoRR, 2021

TransCenter: Transformers with Dense Queries for Multiple-Object Tracking.
CoRR, 2021

The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark for Physically Realistic Embodied AI.
CoRR, 2021

The 1st International Workshop on Machine Reasoning: International Machine Reasoning Conference (MRC 2021).
Proceedings of the WSDM '21, 2021

STAR: A Benchmark for Situated Reasoning in Real-World Videos.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Memory-efficient Patch-based Inference for Tiny Deep Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

When does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning?
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

STST: Spatial-Temporal Specialized Transformer for Skeleton-based Action Recognition.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Counterfactual Debiasing Inference for Compositional Action Recognition.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

OPEn: An Open-ended Physics Environment for Learning Without a Task.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021

MAGCN: A Multi-Adaptive Graph Convolutional Network for Traffic Forecasting.
Proceedings of the International Joint Conference on Neural Networks, 2021

Temporal and Object Quantification Networks.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

AGENT: A Benchmark for Core Psychological Reasoning.
Proceedings of the 38th International Conference on Machine Learning, 2021

Global Prosody Style Transfer Without Text Transcriptions.
Proceedings of the 38th International Conference on Machine Learning, 2021

Adversarial Option-Aware Hierarchical Imitation Learning.
Proceedings of the 38th International Conference on Machine Learning, 2021

Learning Task Decomposition with Ordered Memory Policy Network.
Proceedings of the 9th International Conference on Learning Representations, 2021

PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics.
Proceedings of the 9th International Conference on Learning Representations, 2021

Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning.
Proceedings of the 9th International Conference on Learning Representations, 2021

On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning.
Proceedings of the 9th International Conference on Learning Representations, 2021

Curious Representation Learning for Embodied Intelligence.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Augmenting Policy Learning with Routines Discovered from a Single Demonstration.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

MVFNet: Multi-View Fusion Network for Efficient Video Recognition.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Relation Attention for Temporal Action Localization.
IEEE Trans. Multim., 2020

Generating Visually Aligned Sound From Videos.
IEEE Trans. Image Process., 2020

Augmenting Policy Learning with Routines Discovered from a Demonstration.
CoRR, 2020

Object-Centric Diagnosis of Visual Reasoning.
CoRR, 2020

Synthetic Training for Monocular Human Mesh Recovery.
CoRR, 2020

Tiny Transfer Learning: Towards Memory-Efficient On-Device Learning.
CoRR, 2020

ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation.
CoRR, 2020

Language Guided Networks for Cross-modal Moment Retrieval.
CoRR, 2020

MCUNet: Tiny Deep Learning on IoT Devices.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Cross-Modal Relation-Aware Networks for Audio-Visual Event Localization.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

HUMA'20: 1st International Workshop on Human-Centric Multimedia Analysis.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Deep Concept-wise Temporal Convolutional Networks for Action Localization.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation.
Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

Deep Audio Priors Emerge From Harmonic Convolutional Networks.
Proceedings of the 8th International Conference on Learning Representations, 2020

CLEVRER: Collision Events for Video Representation and Reasoning.
Proceedings of the 8th International Conference on Learning Representations, 2020

Once-for-All: Train One Network and Specialize it for Efficient Deployment.
Proceedings of the 8th International Conference on Learning Representations, 2020

Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

DataMix: Efficient Privacy-Preserving Edge-Cloud Inference.
Proceedings of the Computer Vision - ECCV 2020, 2020

Foley Music: Learning to Generate Music from Videos.
Proceedings of the Computer Vision - ECCV 2020, 2020

Dense Regression Network for Video Grounding.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Music Gesture for Visual Sound Separation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Location-Aware Graph Convolutional Networks for Video Question Answering.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Breaking Winner-Takes-All: Iterative-Winners-Out Networks for Weakly Supervised Temporal Action Localization.
IEEE Trans. Image Process., 2019

Toward Efficient Action Recognition: Principal Backpropagation for Training Two-Stream Networks.
IEEE Trans. Image Process., 2019

TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation.
CoRR, 2019

Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos.
CoRR, 2019

Once for All: Train One Network and Specialize it for Efficient Deployment.
CoRR, 2019

Interpreting Adversarial Examples by Activation Promotion and Suppression.
CoRR, 2019

Cross-channel Communication Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Visual Concept-Metaconcept Learning.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Facial Image-to-Video Translation by a Hidden Affine Transformation.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Watch, Reason and Code: Learning to Represent Videos Using Program.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision.
Proceedings of the 7th International Conference on Learning Representations, 2019

Defensive Quantization: When Efficiency Meets Robustness.
Proceedings of the 7th International Conference on Learning Representations, 2019

The Sound of Motions.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Graph Convolutional Networks for Temporal Action Localization.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

TSM: Temporal Shift Module for Efficient Video Understanding.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Self-Supervised Moving Vehicle Tracking With Stereo Sound.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Self-supervised Audio-visual Co-segmentation.
Proceedings of the IEEE International Conference on Acoustics, 2019

Self-Supervised Segmentation and Source Separation on Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Video Captioning with Multi-Faceted Attention.
Trans. Assoc. Comput. Linguistics, 2018

Temporal Shift Module for Efficient Video Understanding.
CoRR, 2018

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Weakly Supervised Dense Event Captioning in Videos.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency.
Proceedings of the Computer Vision - ECCV 2018, 2018

The Sound of Pixels.
Proceedings of the Computer Vision - ECCV 2018, 2018

Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

End-to-End Learning of Motion Representation for Video Understanding.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Sparse, Smart Contours to Represent and Edit Images.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Multimodal Keyless Attention Fusion for Video Classification.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

T-C3D: Temporal Convolutional 3D Network for Real-Time Action Recognition.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
A Multisource Domain Generalization Approach to Visual Attribute Detection.
Proceedings of the Domain Adaptation in Computer Vision Applications, 2017

Smart, Sparse Contours to Represent and Edit Images.
CoRR, 2017

Unsupervised Domain Adaptation for 3D Keypoint Prediction from a Single Depth Scan.
CoRR, 2017

Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification.
CoRR, 2017

Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding.
CoRR, 2017

Recurrent Topic-Transition GAN for Visual Paragraph Generation.
Proceedings of the IEEE International Conference on Computer Vision, 2017

VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Semantic Compositional Networks for Visual Captioning.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

StyleNet: Generating Attractive Visual Captions with Styles.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

DECK: Discovering Event Composition Knowledge from Web Images for Zero-Shot Event Detection and Recounting in Videos.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Recognizing an Action Using Its Name: A Knowledge-Based Approach.
Int. J. Comput. Vis., 2016

Strategies for Searching Video Content with Text Queries or Video Examples.
CoRR, 2016

Webly-Supervised Video Recognition by Mutually Voting for Relevant Web Images and Web Video Frames.
Proceedings of the Computer Vision - ECCV 2016, 2016

You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Learning Attributes Equals Multi-Source Domain Generalization.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Concepts Not Alone: Exploring Pairwise Relationships for Zero-Shot Video Activity Recognition.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Automatic Concept Discovery from Parallel Text and Visual Corpora.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

DevNet: A Deep Event Network for multimedia event detection and evidence recounting.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Exploring Semantic Inter-Class Relationships (SIR) for Zero-Shot Action Recognition.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2013
Salient object detection in image sequences via spatial-temporal cue.
Proceedings of the 2013 Visual Communications and Image Processing, 2013

