2025
Embodied Scene Understanding for Vision Language Models via MetaVQA.
CoRR, January, 2025
Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation.
CoRR, January, 2025
Joint Optimization for 4D Human-Scene Reconstruction in the Wild.
CoRR, January, 2025
2024
Spatial Steerability of GANs via Self-Supervision from Discriminator.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024
Experiment-free exoskeleton assistance via learning in simulation.
Nat., June, 2024
In-Domain GAN Inversion for Faithful Reconstruction and Editability.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024
Unsupervised Discovery of Steerable Factors When Graph Deep Generative Models Are Entangled.
Trans. Mach. Learn. Res., 2024
Street-View Image Generation From a Bird's-Eye View Layout.
IEEE Robotics Autom. Lett., 2024
Hyper: Hyperparameter Robust Efficient Exploration in Reinforcement Learning.
CoRR, 2024
V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction.
CoRR, 2024
Verbalized Representation Learning for Interpretable Few-Shot Generalization.
CoRR, 2024
Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels.
CoRR, 2024
CooPre: Cooperative Pretraining for V2X Cooperative Perception.
CoRR, 2024
MetaUrban: A Simulation Platform for Embodied AI in Urban Spaces.
CoRR, 2024
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting.
CoRR, 2024
Urban Scene Diffusion through Semantic Occupancy Map.
CoRR, 2024
A Holistic Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation.
CoRR, 2024
SimGen: Simulator-conditioned Driving Scene Generation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Shared Autonomy with IDA: Interventional Diffusion Assistance.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Towards Text-guided 3D Scene Composition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Efficient 3D Articulated Human Generation with Layered Surface Volumes.
Proceedings of the International Conference on 3D Vision, 2024
2023
GH-Feat: Learning Versatile Generative Hierarchical Features From GANs.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2023
IEEE Trans. Pattern Anal. Mach. Intell., March, 2023
MetaDrive: Composing Diverse Driving Scenarios for Generalizable Reinforcement Learning.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2023
ChemSpacE: Interpretable and Interactive Chemical Space Exploration.
Trans. Mach. Learn. Res., 2023
SceneWiz3D: Towards Text-guided 3D Scene Composition.
CoRR, 2023
Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation.
CoRR, 2023
Next Steps for Human-Centered Generative AI: A Technical Perspective.
CoRR, 2023
Spatial Steerability of GANs via Self-Supervision from Discriminator.
CoRR, 2023
Learning from Active Human Involvement through Proxy Value Propagation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
ScenarioNet: Open-Source Platform for Large-Scale Traffic Scenario Simulation and Modeling.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
V2XP-ASG: Generating Adversarial Scenes for Vehicle-to-Everything Perception.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023
TrafficGen: Learning to Generate Diverse and Realistic Traffic Scenarios.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023
Towards Smooth Video Composition.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Guarded Policy Optimization with Imperfect Online Demonstrations.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
One-Shot Generative Domain Adaptation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
V2V4Real: A Real-World Large-Scale Dataset for Vehicle-to-Vehicle Cooperative Perception.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
CAT: Closed-loop Adversarial Training for Safe End-to-End Driving.
Proceedings of the Conference on Robot Learning, 2023
2022
PlaTe: Visually-Grounded Planning With Transformers in Procedural Tasks.
IEEE Robotics Autom. Lett., 2022
InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs.
IEEE Trans. Pattern Anal. Mach. Intell., 2022
Disentangled Inference for GANs With Latently Invertible Autoencoder.
Int. J. Comput. Vis., 2022
Exploiting Reward Shifting in Value-Based Deep RL.
CoRR, 2022
Human-AI Shared Control via Frequency-based Policy Dissection.
CoRR, 2022
Action-Conditioned Contrastive Policy Pretraining.
CoRR, 2022
LocATe: End-to-end Localization of Actions in 3D with Transformers.
CoRR, 2022
Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation.
CoRR, 2022
Improving GANs with A Dynamic Discriminator.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Exploit Reward Shifting in Value-Based Deep-RL: Optimistic Curiosity-Based Exploration and Conservative Exploitation via Linear Reward Shaping.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Human-AI Shared Control via Policy Dissection.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022
Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization.
Proceedings of the Tenth International Conference on Learning Representations, 2022
Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining.
Proceedings of the Computer Vision - ECCV 2022, 2022
Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation.
Proceedings of the Computer Vision - ECCV 2022, 2022
Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
3D-aware Image Synthesis via Learning Structural and Textural Representations.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Improving GAN Equilibrium by Raising Spatial Awareness.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers.
Proceedings of the Conference on Robot Learning, 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-training for Spatial-Aware Visual Representations.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
Texture Memory-Augmented Deep Patch-Based Image Inpainting.
IEEE Trans. Image Process., 2021
Adversarial Inverse Reinforcement Learning With Self-Attention Dynamics Model.
IEEE Robotics Autom. Lett., 2021
Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis.
Int. J. Comput. Vis., 2021
STransGAN: An Empirical Study on Transformer in GANs.
CoRR, 2021
MetaDrive: Composing Diverse Driving Scenarios for Generalizable Reinforcement Learning.
CoRR, 2021
Safe Exploration by Solving Early Terminated MDP.
CoRR, 2021
Unsupervised Image Transformation Learning via Generative Adversarial Networks.
CoRR, 2021
Deep Learning for Scene Classification: A Survey.
CoRR, 2021
Data-Efficient Instance Generation from Instance Discrimination.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Learning to Simulate Self-driven Particles System with Coordinated Policy Optimization.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Instance Localization for Self-Supervised Detection Pretraining.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Positional Encoding As Spatial Inductive Bias in GANs.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Generative Hierarchical Features From Synthesizing Images.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Closed-Form Factorization of Latent Semantics in GANs.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Multimodal Motion Prediction With Stacked Transformers.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Safe Driving via Expert Guided Policy Optimization.
Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021
HiABP: Hierarchical Initialized ABP for Unsupervised Representation Learning.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
Cross-View Semantic Segmentation for Sensing Surroundings.
IEEE Robotics Autom. Lett., 2020
Understanding the role of individual units in a deep neural network.
Proc. Natl. Acad. Sci. USA, 2020
Moments in Time Dataset: One Million Videos for Event Understanding.
IEEE Trans. Pattern Anal. Mach. Intell., 2020
Improving the Generalization of End-to-End Driving through Procedural Generation.
CoRR, 2020
Improving the Fairness of Deep Generative Models without Retraining.
CoRR, 2020
Unsupervised Landmark Learning from Unpaired Data.
CoRR, 2020
Video Representation Learning with Visual Tempo Consistency.
CoRR, 2020
Non-local Policy Optimization via Diversity-regularized Collaborative Exploration.
CoRR, 2020
Zeroth-Order Supervised Policy Improvement.
CoRR, 2020
Novel Policy Seeking with Constrained Optimization.
CoRR, 2020
Evolutionary Stochastic Policy Distillation.
CoRR, 2020
Interpreting Generative Adversarial Networks for Interactive Image Generation.
Proceedings of the xxAI - Beyond Explainable AI, 2020
In-Domain GAN Inversion for Real Image Editing.
Proceedings of the Computer Vision - ECCV 2020, 2020
A Unified Framework for Shot Type Classification Based on Subject Centric Lens.
Proceedings of the Computer Vision - ECCV 2020, 2020
TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Temporal Pyramid Network for Action Recognition.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Interpreting the Latent Space of GANs for Semantic Face Editing.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Image Processing Using Multi-Code GAN Prior.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
TPNet: Trajectory Proposal Network for Motion Prediction.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Neuro-Symbolic Program Search for Autonomous Driving Decision Module Design.
Proceedings of the 4th Conference on Robot Learning, 2020
Learning a Decision Module by Imitating Driver's Control Behaviors.
Proceedings of the 4th Conference on Robot Learning, 2020
Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2019
Comparing the Interpretability of Deep Networks via Network Dissection.
Proceedings of the Explainable AI: Interpreting, 2019
Semantic photo manipulation with a generative image prior.
ACM Trans. Graph., 2019
Interpreting Deep Visual Representations via Network Dissection.
IEEE Trans. Pattern Anal. Mach. Intell., 2019
Semantic Understanding of Scenes Through the ADE20K Dataset.
Int. J. Comput. Vis., 2019
Learning Driving Decisions by Imitating Drivers' Control Behaviors.
CoRR, 2019
Cross-view Semantic Segmentation for Sensing Surroundings.
CoRR, 2019
Visualizing and Understanding Generative Adversarial Networks (Extended Abstract).
CoRR, 2019
Proceedings of AAAI 2019 Workshop on Network Interpretability for Deep Learning.
CoRR, 2019
Policy Continuation with Hindsight Inverse Dynamics.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019
Visualizing and Understanding GANs.
Proceedings of the Deep Generative Models for Highly Structured Data, 2019
GAN Dissection: Visualizing and Understanding Generative Adversarial Networks.
Proceedings of the 7th International Conference on Learning Representations, 2019
A Graph-Based Framework to Bridge Movies and Synopses.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Reasoning About Human-Object Interactions Through Dual Attention Networks.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Seeing What a GAN Cannot Generate.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
DrivingStereo: A Large-Scale Dataset for Stereo Matching in Autonomous Driving Scenarios.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
Deep Flow-Guided Video Inpainting.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
2018
Interpretable representation learning for visual intelligence.
PhD thesis, 2018
Places: A 10 Million Image Database for Scene Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2018
FaceFeat-GAN: a Two-Stage Approach for Identity-Preserving Face Synthesis.
CoRR, 2018
Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation.
CoRR, 2018
Revisiting the Importance of Individual Units in CNNs via Ablation.
CoRR, 2018
DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation.
CoRR, 2018
Expert identification of visual primitives used by CNNs during mammogram classification.
Proceedings of the Medical Imaging 2018: Computer-Aided Diagnosis, 2018
Real-Time Object Pose Estimation with Pose Interpreter Networks.
Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018
Interpretable Basis Decomposition for Visual Explanation.
Proceedings of the Computer Vision - ECCV 2018, 2018
Temporal Relational Reasoning in Videos.
Proceedings of the Computer Vision - ECCV 2018, 2018
Unified Perceptual Parsing for Scene Understanding.
Proceedings of the Computer Vision - ECCV 2018, 2018
Single Image Intrinsic Decomposition Without a Single Intrinsic Image.
Proceedings of the Computer Vision - ECCV 2018, 2018
Factorizable Net: An Efficient Subgraph-Based Framework for Scene Graph Generation.
Proceedings of the Computer Vision - ECCV 2018, 2018
Recurrent Residual Module for Fast Inference in Videos.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
Visual Question Generation as Dual Task of Visual Question Answering.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
2017
Temporal Relational Reasoning in Videos.
CoRR, 2017
Visual Question Generation as Dual Task of Visual Question Answering.
CoRR, 2017
Scene Graph Generation from Objects, Phrases and Caption Regions.
CoRR, 2017
SegICP: Integrated deep semantic segmentation and pose estimation.
Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017
Open Vocabulary Scene Parsing.
Proceedings of the IEEE International Conference on Computer Vision, 2017
Scene Graph Generation from Objects, Phrases and Region Captions.
Proceedings of the IEEE International Conference on Computer Vision, 2017
Scene Parsing through ADE20K Dataset.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
Person Search with Natural Language Description.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
Network Dissection: Quantifying Interpretability of Deep Visual Representations.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
2016
Semantic Understanding of Scenes through the ADE20K Dataset.
CoRR, 2016
Places: An Image Database for Deep Scene Understanding.
CoRR, 2016
Learning Deep Features for Discriminative Localization.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016
Optimization as Estimation with Gaussian Processes in Bandit Settings.
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016
2015
Learning Collective Crowd Behaviors with Dynamic Pedestrian-Agents.
Int. J. Comput. Vis., 2015
Simple Baseline for Visual Question Answering.
CoRR, 2015
Object Detectors Emerge in Deep Scene CNNs.
Proceedings of the 3rd International Conference on Learning Representations, 2015
Understanding Intra-Class Knowledge Inside CNN.
CoRR, 2015
ConceptLearner: Discovering visual concepts from weakly labeled image collections.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015
2014
Measuring Crowd Collectiveness.
IEEE Trans. Pattern Anal. Mach. Intell., 2014
Learning Deep Features for Scene Recognition using Places Database.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014
Recognizing City Identity via Attribute Analysis of Geo-tagged Images.
Proceedings of the Computer Vision - ECCV 2014, 2014
2013
Measuring Crowd Collectiveness.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013
2012
Coherent Filtering: Detecting Coherent Motions from Crowd Clutters.
Proceedings of the Computer Vision - ECCV 2012, 2012
Understanding collective crowd behaviors: Learning a Mixture model of Dynamic pedestrian-Agents.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012
2011
Modeling Manifold Ways of Scene Perception.
Proceedings of the Neural Information Processing - 18th International Conference, 2011
Random field topic model for semantic region analysis in crowded scenes from tracklets.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011
2010
A Phase Discrepancy Analysis of Object Motion.
Proceedings of the Computer Vision - ACCV 2010, 2010
2009
Scene Gist: A Holistic Generative Model of Natural Image.
Proceedings of the Computer Vision, 2009