Chuang Gan

Krishna Murthy Jatavallabhula

Evangelos Kalogerakis

Sergey Tulyakov

Hsin-Ying Lee

Chaoyang Wang

CoRR, 2024

SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization.

CoRR, 2024

UniMuMo: Unified Text, Music and Motion Generation.

CoRR, 2024

Compositional Physical Reasoning of Objects and Events from Videos.

CoRR, 2024

Disentangled Acoustic Fields For Multimodal Physical Scene Understanding.

CoRR, 2024

ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs.

Irene Huang

Wei Lin

Muhammad Jehanzeb Mirza

CoRR, 2024

CoNav: A Benchmark for Human-Centered Collaborative Navigation.

CoRR, 2024

Physically Compatible 3D Object Modeling from a Single Image.

CoRR, 2024

RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text.

CoRR, 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving.

CoRR, 2024

Virtual Foundry Graphnet for Metal Sintering Deformation Prediction.

CoRR, 2024

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation.

CoRR, 2024

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision.

CoRR, 2024

Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble.

CoRR, 2024

AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration.

Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning.

Qiao Gu

Ali Kuwajerwala

Sacha Morin

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

RoboDreamer: Learning Compositional World Models for Robot Imagination.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

ContPhy: Continuum Physical Concept Learning and Reasoning from Videos.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

3D-VLA: A 3D Vision-Language-Action Generative World Model.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Speech Self-Supervised Learning Using Diffusion Model Synthetic Data.

Mark A. Hasegawa-Johnson

Shiyu Chang

Yang Zhang

Proceedings of the Forty-first International Conference on Machine Learning, 2024

HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Building Cooperative Embodied Agents Modularly with Large Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Thin-Shell Object Manipulations With Differentiable Physics Simulations.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

SALMON: Self-Alignment with Instructable Reward Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

DIFFTACTILE: A Physics-based Differentiable Tactile Simulator for Contact-rich Robotic Manipulation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

GENOME: Generative Neuro-Symbolic Visual Reasoning by Growing and Reusing Modules.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

FlexAttention for Efficient High-Resolution Vision-Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance.

Phuc D. A. Nguyen

Tuan Duc Ngo

Evangelos Kalogerakis

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Multi-Agent Alternate Q-Learning.

Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 2024

Aligning Large Multimodal Models with Factually Augmented RLHF.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Visual Chain-of-Thought Prompting for Knowledge-Based Visual Reasoning.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Bird-Count: a multi-modality benchmark and system for bird population counting in the wild.

Multim. Tools Appl., December, 2023

TransCenter: Transformers With Dense Representations for Multiple-Object Tracking.

Xavier Alameda-Pineda

IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

Self-supervised audiovisual representation learning for remote sensing data.

Int. J. Appl. Earth Obs. Geoinformation, February, 2023

DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning.

CoRR, 2023

RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation.

CoRR, 2023

Autonomous Tree-search Ability of Large Language Models.

CoRR, 2023

SALMON: Self-Alignment with Principle-Following Reward Models.

CoRR, 2023

Generalizable Long-Horizon Manipulations with Large Language Models.

CoRR, 2023

A2Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models.

CoRR, 2023

An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training.

Erik G. Learned-Miller

Krishna Murthy Jatavallabhula

CoRR, 2023

ModuleFormer: Learning Modular Large Language Models From Uncurated Data.

CoRR, 2023

SafeDiffuser: Safe Planning with Diffusion Probabilistic Models.

CoRR, 2023

EC^2: Emergent Communication for Embodied Control.

CoRR, 2023

See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning.

CoRR, 2023

ClawSAT: Towards Both Robust and Accurate Code Models.

Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering, 2023

RoboNinja: Learning an Adaptive Cutting Policy for Multi-Material Objects.

Proceedings of the Robotics: Science and Systems XIX, Daegu, 2023

Adaptive Online Replanning with Diffusion Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models.

Tsun-Hsuan Johnson Wang

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DiffVL: Scaling Up Soft Body Manipulation using Vision-Language Driven Differentiable Physics.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

3D-LLM: Injecting the 3D World into Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PockEngine: Sparse and Efficient Fine-tuning in a Pocket.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Deep Masked Graph Matching for Correspondence Identification in Collaborative Perception.

Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Reparameterized Policy Learning for Multimodal Trajectory Optimization.

Proceedings of the International Conference on Machine Learning, 2023

On the Forward Invariance of Neural ODEs.

Proceedings of the International Conference on Machine Learning, 2023

Learning Neural Constitutive Laws from Motion Observations for Generalizable PDE Dynamics.

Proceedings of the International Conference on Machine Learning, 2023

Planning with Large Language Models for Code Generation.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Hyper-Decision Transformer for Efficient Online Policy Adaptation.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

SoftZoo: A Soft Robot Co-design Benchmark For Locomotion In Diverse Environments.

Tsun-Hsuan Wang

Pingchuan Ma

Andrew Everett Spielberg

Proceedings of the Eleventh International Conference on Learning Representations, 2023

PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields for Geometry-Agnostic System Identification.

Xuan Li

Yi-Ling Qiao

Peter Yichen Chen

Ming C. Lin

Chenfanfu Jiang

Proceedings of the Eleventh International Conference on Learning Representations, 2023

DexDeform: Dexterous Deformable Object Manipulation with Human Demonstrations and Differentiable Physics.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

TextPSG: Panoptic Scene Graph Generation from Textual Descriptions.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Vision-and-Language Navigation from YouTube Videos.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EfficientViT: Lightweight Multi-Scale Attention for High-Resolution Dense Prediction.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Enabling Encrypted Delta Compression for Outsourced Storage Systems via Preserving Similarity.

Proceedings of the 41st IEEE International Conference on Computer Design, 2023

Sparse Universal Transformer.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Masked Motion Encoding for Self-Supervised Video Representation Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

EC2: Emergent Communication for Embodied Control.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Situation Hyper-Graphs for Video Question Answering.

Niels da Vitoria Lobo

Mubarak Shah

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

3D Concept Learning and Reasoning from Multi-View Images.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners.

Erik G. Learned-Miller

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Text-instance graph: Exploring the relational semantics for text-based visual question answering.

Pattern Recognit., 2022

Graph Convolutional Module for Temporal Action Localization in Videos.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Purely Attention Based Local Feature Integration for Video Classification.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Devices.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners.

Erik G. Learned-Miller

CoRR, 2022

Retrospectives on the Embodied AI Workshop.

CoRR, 2022

M3Video: Masked Motion Modeling for Self-Supervised Video Representation Learning.

CoRR, 2022

MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning.

CoRR, 2022

EfficientViT: Enhanced Linear Attention for High-Resolution Low-Computation Visual Recognition.

Han Cai

CoRR, 2022

Certifiably robust interpretation via Rényi differential privacy.

Artif. Intell., 2022

SNAKE: Shape-aware Neural 3D Keypoint Field.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning Neural Acoustic Fields.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

3D Concept Grounding on Neural Fields.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning Physical Dynamics with Subequivariant Graph Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning Active Camera for Multi-Object Navigation.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

On-Device Training Under 256KB Memory.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Gait Recognition in the Wild with Multi-hop Temporal Switch.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Noisy Agents: Self-supervised Exploration by Predicting Auditory Events.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark Towards Physically Realistic Embodied AI.

Proceedings of the 2022 International Conference on Robotics and Automation, 2022

Prompting Decision Transformer for Few-Shot Policy Generalization.

Proceedings of the International Conference on Machine Learning, 2022

Linking Emergent and Natural Languages via Corpus Transfer.

Shunyu Yao

Mo Yu

Yang Zhang

Karthik R. Narasimhan

Joshua B. Tenenbaum

Proceedings of the Tenth International Conference on Learning Representations, 2022

FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations.

Proceedings of the Tenth International Conference on Learning Representations, 2022

DiffSkill: Skill Abstraction from Differentiable Physics for Deformable Object Manipulations with Tools.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics.

Proceedings of the Tenth International Conference on Learning Representations, 2022

ComPhy: Compositional Physical Reasoning of Objects and Events from Videos.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Network Augmentation for Tiny Deep Learning.

Proceedings of the Tenth International Conference on Learning Representations, 2022

RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Revisiting the Roles of "Text" in Text Games.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Prototype-Guided Continual Adaptation for Class-Incremental Unsupervised Domain Adaptation.

Proceedings of the Computer Vision - ECCV 2022, 2022

Weakly Supervised Grounding for VQA in Vision-Language Transformers.

Aisha Urooj Khan

Hilde Kuehne

Niels da Vitoria Lobo

Mubarak Shah

Proceedings of the Computer Vision - ECCV 2022, 2022

AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Finding Fallen Objects Via Asynchronous Audio-Visual Integration.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Planning with Spatial-Temporal Abstraction from Point Clouds for Deformable Object Manipulation.

Proceedings of the Conference on Robot Learning, 2022

Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following.

Proceedings of the Conference on Robot Learning, 2022

2021

A Real-Time Action Representation With Temporal Encoding and Deep Compression.

IEEE Trans. Circuits Syst. Video Technol., 2021

A novel domain activation mapping-guided network (DA-GNT) for visual tracking.

Neurocomputing, 2021

MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning.

CoRR, 2021

TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device.

CoRR, 2021

Global Rhythm Style Transfer Without Text Transcriptions.

Mark Hasegawa-Johnson

CoRR, 2021

Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning.

CoRR, 2021

TransCenter: Transformers with Dense Queries for Multiple-Object Tracking.

Xavier Alameda-Pineda

CoRR, 2021

The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark for Physically Realistic Embodied AI.

CoRR, 2021

The 1st International Workshop on Machine Reasoning: International Machine Reasoning Conference (MRC 2021).

Proceedings of the WSDM '21, 2021

STAR: A Benchmark for Situated Reasoning in Real-World Videos.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Memory-efficient Patch-based Inference for Tiny Deep Learning.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

When does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning?

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

STST: Spatial-Temporal Specialized Transformer for Skeleton-based Action Recognition.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Counterfactual Debiasing Inference for Compositional Action Recognition.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

OPEn: An Open-ended Physics Environment for Learning Without a Task.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021

MAGCN: A Multi-Adaptive Graph Convolutional Network for Traffic Forecasting.

Qingyuan Zhan

Guixing Wu

Proceedings of the International Joint Conference on Neural Networks, 2021

Temporal and Object Quantification Networks.

Leslie Pack Kaelbling

Tomer D. Ullman

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

AGENT: A Benchmark for Core Psychological Reasoning.

Proceedings of the 38th International Conference on Machine Learning, 2021

Global Prosody Style Transfer Without Text Transcriptions.

Mark Hasegawa-Johnson

Proceedings of the 38th International Conference on Machine Learning, 2021

Adversarial Option-Aware Hierarchical Imitation Learning.

Proceedings of the 38th International Conference on Machine Learning, 2021

Learning Task Decomposition with Ordered Memory Policy Network.

Proceedings of the 9th International Conference on Learning Representations, 2021

PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics.

Proceedings of the 9th International Conference on Learning Representations, 2021

Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning.

Zhenfang Chen

Jiayuan Mao

Jiajun Wu

Kwan-Yee Kenneth Wong

Joshua B. Tenenbaum

Proceedings of the 9th International Conference on Learning Representations, 2021

On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning.

Proceedings of the 9th International Conference on Learning Representations, 2021

Curious Representation Learning for Embodied Intelligence.

Yilun Du

Phillip Isola

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules.

Niels da Vitoria Lobo

Mubarak Shah

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Augmenting Policy Learning with Routines Discovered from a Single Demonstration.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

MVFNet: Multi-View Fusion Network for Efficient Video Recognition.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Relation Attention for Temporal Action Localization.

IEEE Trans. Multim., 2020

Generating Visually Aligned Sound From Videos.

IEEE Trans. Image Process., 2020

Augmenting Policy Learning with Routines Discovered from a Demonstration.

CoRR, 2020

Object-Centric Diagnosis of Visual Reasoning.

CoRR, 2020

Synthetic Training for Monocular Human Mesh Recovery.

CoRR, 2020

Tiny Transfer Learning: Towards Memory-Efficient On-Device Learning.

CoRR, 2020

ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation.

CoRR, 2020

Language Guided Networks for Cross-modal Moment Retrieval.

CoRR, 2020

MCUNet: Tiny Deep Learning on IoT Devices.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Cross-Modal Relation-Aware Networks for Audio-Visual Event Localization.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

HUMA'20: 1st International Workshop on Human-Centric Multimedia Analysis.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Deep Concept-wise Temporal Convolutional Networks for Action Localization.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation.

Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

Deep Audio Priors Emerge From Harmonic Convolutional Networks.

Proceedings of the 8th International Conference on Learning Representations, 2020

CLEVRER: Collision Events for Video Representation and Reasoning.

Proceedings of the 8th International Conference on Learning Representations, 2020

Once-for-All: Train One Network and Specialize it for Efficient Deployment.

Proceedings of the 8th International Conference on Learning Representations, 2020

Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

DataMix: Efficient Privacy-Preserving Edge-Cloud Inference.

Proceedings of the Computer Vision - ECCV 2020, 2020

Foley Music: Learning to Generate Music from Videos.

Proceedings of the Computer Vision - ECCV 2020, 2020

Dense Regression Network for Video Grounding.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Music Gesture for Visual Sound Separation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Location-Aware Graph Convolutional Networks for Video Question Answering.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Breaking Winner-Takes-All: Iterative-Winners-Out Networks for Weakly Supervised Temporal Action Localization.

IEEE Trans. Image Process., 2019

Toward Efficient Action Recognition: Principal Backpropagation for Training Two-Stream Networks.

IEEE Trans. Image Process., 2019

TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation.

CoRR, 2019

Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos.

CoRR, 2019

Once for All: Train One Network and Specialize it for Efficient Deployment.

Han Cai

CoRR, 2019

Interpreting Adversarial Examples by Activation Promotion and Suppression.

CoRR, 2019

Cross-channel Communication Networks.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Visual Concept-Metaconcept Learning.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Facial Image-to-Video Translation by a Hidden Affine Transformation.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Watch, Reason and Code: Learning to Represent Videos Using Program.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision.

Proceedings of the 7th International Conference on Learning Representations, 2019

Defensive Quantization: When Efficiency Meets Robustness.

Proceedings of the 7th International Conference on Learning Representations, 2019

The Sound of Motions.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Graph Convolutional Networks for Temporal Action Localization.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

TSM: Temporal Shift Module for Efficient Video Understanding.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Self-Supervised Moving Vehicle Tracking With Stereo Sound.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Self-supervised Audio-visual Co-segmentation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Self-Supervised Segmentation and Source Separation on Videos.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Video Captioning with Multi-Faceted Attention.

Xiang Long

Gerard de Melo

Trans. Assoc. Comput. Linguistics, 2018

Temporal Shift Module for Efficient Video Understanding.

CoRR, 2018

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Weakly Supervised Dense Event Captioning in Videos.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency.

Proceedings of the Computer Vision - ECCV 2018, 2018

The Sound of Pixels.

Proceedings of the Computer Vision - ECCV 2018, 2018

Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

End-to-End Learning of Motion Representation for Video Understanding.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Sparse, Smart Contours to Represent and Edit Images.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Multimodal Keyless Attention Fusion for Video Classification.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

T-C3D: Temporal Convolutional 3D Network for Real-Time Action Recognition.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

A Multisource Domain Generalization Approach to Visual Attribute Detection.

Tianbao Yang

Boqing Gong

Proceedings of the Domain Adaptation in Computer Vision Applications, 2017

Smart, Sparse Contours to Represent and Edit Images.

CoRR, 2017

Unsupervised Domain Adaptation for 3D Keypoint Prediction from a Single Depth Scan.

CoRR, 2017

Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification.

CoRR, 2017

Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding.

CoRR, 2017

Recurrent Topic-Transition GAN for Visual Paragraph Generation.

Proceedings of the IEEE International Conference on Computer Vision, 2017

VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Semantic Compositional Networks for Visual Captioning.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

StyleNet: Generating Attractive Visual Captions with Styles.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

DECK: Discovering Event Composition Knowledge from Web Images for Zero-Shot Event Detection and Recounting in Videos.

Chen Sun

Ram Nevatia

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016

Recognizing an Action Using Its Name: A Knowledge-Based Approach.

Int. J. Comput. Vis., 2016

Strategies for Searching Video Content with Text Queries or Video Examples.

Xingzhong Du

Xiaojun Chang

CoRR, 2016

Webly-Supervised Video Recognition by Mutually Voting for Relevant Web Images and Web Video Frames.

Proceedings of the Computer Vision - ECCV 2016, 2016

You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Learning Attributes Equals Multi-Source Domain Generalization.

Tianbao Yang

Boqing Gong

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Concepts Not Alone: Exploring Pairwise Relationships for Zero-Shot Video Activity Recognition.

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015

CMU Informedia@TRECVID 2015: MED/SIN/LNK/SED.

Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

Automatic Concept Discovery from Parallel Text and Visual Corpora.

Chen Sun

Ram Nevatia

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

DevNet: A Deep Event Network for multimedia event detection and evidence recounting.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Exploring Semantic Inter-Class Relationships (SIR) for Zero-Shot Action Recognition.
