2025
Cosmos World Foundation Model Platform for Physical AI.
CoRR, January, 2025
2024
Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale.
CoRR, 2024
Edify 3D: Scalable High-Quality 3D Asset Generation.
CoRR, 2024
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models.
CoRR, 2024
One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation.
CoRR, 2024
EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation.
CoRR, 2024
Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling.
CoRR, 2024
Wolf: Captioning Everything with a World Summarization Framework.
CoRR, 2024
Condition-Aware Neural Network for Controlled Image Generation.
CoRR, 2024
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models.
CoRR, 2024
ExpressiveSinger: Multilingual and Multi-Style Score-based Singing Voice Synthesis with Expressive Performance Control.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Condition-Aware Neural Network for Controlled Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
2023
Loss-Guided Diffusion Models for Plug-and-Play Controllable Generation.
Proceedings of the International Conference on Machine Learning, 2023
ATT3D: Amortized Text-to-3D Object Synthesis.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
SPACE: Speech-driven Portrait Animation with Controllable Expression.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning.
Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
DiffCollage: Parallel Generation of Large Content with Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Magic3D: High-Resolution Text-to-3D Content Creation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Neuralangelo: High-Fidelity Neural Surface Reconstruction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
Learning to Relight Portrait Images via a Virtual Light Stage and Synthetic-to-Real Adaptation.
ACM Trans. Graph., 2022
LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update.
IEEE Trans. Computers, 2022
SPACEx: Speech-driven Portrait Animation with Controllable Expression.
CoRR, 2022
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers.
CoRR, 2022
Implicit Warping for Animation with Image Sets.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Implicit Neural Representations with Levels-of-Experts.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Generating Long Videos of Dynamic Scenes.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Multimodal Conditional Image Synthesis with Product-of-Experts GANs.
Proceedings of the Computer Vision - ECCV 2022, 2022
2021
Generative Adversarial Networks for Image and Video Synthesis: Algorithms and Applications.
Proc. IEEE, 2021
Domain Stylization: A Fast Covariance Matching Framework Towards Domain Adaptation.
IEEE Trans. Pattern Anal. Mach. Intell., 2021
Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update.
CoRR, 2021
Deep Marching Tetrahedra: a Hybrid Representation for High-Resolution 3D Shape Synthesis.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
2020
Models Matter, So Does Training: An Empirical Study of CNNs for Optical Flow Estimation.
IEEE Trans. Pattern Anal. Mach. Intell., 2020
Guest Editorial: Generative Adversarial Networks for Computer Vision.
Int. J. Comput. Vis., 2020
UFO²: A Unified Framework towards Omni-supervised Object Detection.
CoRR, 2020
Style Example-Guided Text Generation using Generative Adversarial Transformers.
CoRR, 2020
SymGAN: Orientation Estimation without Annotation for Symmetric Objects.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020
Learning compositional functions via multiplicative weight updates.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
On the distance between two neural networks and the stability of learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder.
Proceedings of the Computer Vision - ECCV 2020, 2020
UFO²: A Unified Framework Towards Omni-supervised Object Detection.
Proceedings of the Computer Vision - ECCV 2020, 2020
World-Consistent Video-to-Video Synthesis.
Proceedings of the Computer Vision - ECCV 2020, 2020
UNAS: Differentiable Architecture Search Meets Reinforcement Learning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Learning to Generate Multiple Style Transfer Outputs for an Input Sentence.
Proceedings of the Fourth Workshop on Neural Generation and Translation, 2020
2019
Boosting segmentation with weak supervision from image-to-image translation.
CoRR, 2019
Few-shot Video-to-Video Synthesis.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019
PointFlow: 3D Point Cloud Generation With Continuous Normalizing Flows.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Meta-Sim: Learning to Generate Synthetic Datasets.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Neural Turtle Graphics for Modeling City Road Layouts.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Few-Shot Unsupervised Image-to-Image Translation.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
STEP: Spatio-Temporal Progressive Learning for Video Action Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
Semantic Image Synthesis With Spatially-Adaptive Normalization.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
Unsupervised Stylish Image Description Generation via Domain Layer Norm.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019
2018
Video-to-Video Synthesis.
CoRR, 2018
Domain Stylization: A Strong, Simple Baseline for Synthetic to Real Image Domain Adaptation.
CoRR, 2018
Video-to-Video Synthesis.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
Context-aware Synthesis and Placement of Object Instances.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
Reblur2Deblur: Deblurring videos via self-supervised learning.
Proceedings of the 2018 IEEE International Conference on Computational Photography, 2018
A Closed-Form Solution to Photorealistic Image Stylization.
Proceedings of the Computer Vision - ECCV 2018, 2018
Superpixel Sampling Networks.
Proceedings of the Computer Vision - ECCV 2018, 2018
Multimodal Unsupervised Image-to-Image Translation.
Proceedings of the Computer Vision - ECCV 2018, 2018
High-Resolution Image Synthesis and Semantic Manipulation With Conditional GANs.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
MoCoGAN: Decomposing Motion and Content for Video Generation.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
Learning Superpixels With Segmentation-Aware Affinity Loss.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
The 2018 NVIDIA AI City Challenge.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018
Localization-Aware Active Learning for Object Detection.
Proceedings of the Computer Vision - ACCV 2018, 2018
Learning Binary Residual Representations for Domain-Specific Video Streaming.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018
2017
Detecting Adversarial Attacks on Neural Network Policies with Visual Foresight.
CoRR, 2017
Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Video.
CoRR, 2017
Attentional Network for Visual Object Detection.
CoRR, 2017
Unsupervised Image-to-Image Translation Networks.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017
Tactics of Adversarial Attack on Deep Reinforcement Learning Agents.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017
CASENet: Deep Category-Aware Semantic Edge Detection.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Videos.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
2016
Automatic Learning to Remove Multipath Distortions in Time-of-Flight Range Images for a Robotic Arm Setup.
CoRR, 2016
Unsupervised network pretraining via encoding human design.
Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision, 2016
Coupled Generative Adversarial Networks.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016
Learning to remove multipath distortions in Time-of-Flight range images for a robotic arm setup.
Proceedings of the 2016 IEEE International Conference on Robotics and Automation, 2016
Gaussian Conditional Random Field Network for Semantic Segmentation.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016
Deep Gaussian Conditional Random Field Network: A Model-Based Deep Network for Discriminative Denoising.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016
R-CNN for Small Object Detection.
Proceedings of the Computer Vision - ACCV 2016, 2016
2015
Unsupervised Deep Network Pretraining via Human Design.
CoRR, 2015
Layered Interpretation of Street View Images.
Proceedings of Robotics: Science and Systems XI, Sapienza University of Rome, 2015
2014
Entropy-Rate Clustering: Cluster Analysis via Maximizing a Submodular Function Subject to a Matroid Constraint.
IEEE Trans. Pattern Anal. Mach. Intell., 2014
Recursive Context Propagation Network for Semantic Scene Labeling.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014
Learning to Rank 3D Features.
Proceedings of the Computer Vision - ECCV 2014, 2014
2013
Joint Geodesic Upsampling of Depth Images.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013
Model-Based Vehicle Pose Estimation and Tracking in Videos Using Random Forests.
Proceedings of the 2013 International Conference on 3D Vision, 2013
2012
Discrete Optimization Methods for Segmentation and Matching.
PhD thesis, 2012
Fast object localization and pose estimation in heavy clutter for robotic bin picking.
Int. J. Robotics Res., 2012
Voting-based pose estimation for robotic assembly using a 3D sensor.
Proceedings of the IEEE International Conference on Robotics and Automation, 2012
A Grassmann manifold-based domain adaptation approach.
Proceedings of the 21st International Conference on Pattern Recognition, 2012
Classification and Pose Estimation of Vehicles in Videos by 3D Modeling within Discrete-Continuous Optimization.
Proceedings of the 2012 Second International Conference on 3D Imaging, 2012
2011
Entropy rate superpixel segmentation.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011
2010
Pose estimation in heavy clutter using a multi-flash camera.
Proceedings of the IEEE International Conference on Robotics and Automation, 2010
Fast directional chamfer matching.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010