Chenliang Xu

CoRR, 2024

OSCaR: Object State Captioning and State Change Representation.

CoRR, 2024

Tri²-plane: Volumetric Avatar Reconstruction with Feature Pyramid.

CoRR, 2024

Bag of Tricks to Boost Adversarial Transferability.

CoRR, 2024

TextToon: Real-Time Text Toonify Head Avatar from Single Video.

Proceedings of the SIGGRAPH Asia 2024 Conference Papers, 2024

OSCaR: Object State Captioning and State Change Representation.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

EAGLE: Egocentric AGgregated Language-video Engine.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

One Forward is Enough for Neural Network Training via Likelihood Ratio Method.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Learning Audio Concepts from Counterfactual Natural Language.

Proceedings of the IEEE International Conference on Acoustics, 2024

Adaptive Super Resolution for One-Shot Talking-Head Generation.

Proceedings of the IEEE International Conference on Acoustics, 2024

Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Tri²-plane: Thinking Head Avatar via Feature Pyramid.

Proceedings of the Computer Vision - ECCV 2024, 2024

Modeling and Driving Human Body Soundfields Through Acoustic Primitives.

Proceedings of the Computer Vision - ECCV 2024, 2024

Random Smooth-based Certified Defense against Text Adversarial Attack.

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

Learning to Transform Dynamically for Better Adversarial Transferability.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Discover and Mitigate Multiple Biased Subgroups in Image Classifiers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation.

Proceedings of the Computer Vision - ACCV 2024, 2024

High-Quality Visually-Guided Sound Separation from Diverse Categories.

Proceedings of the Computer Vision - ACCV 2024, 2024

2023

Rapid runtime learning by curating small datasets of high-quality items obtained from memory.

PLoS Comput. Biol., October, 2023

Video Understanding with Large Language Models: A Survey.

CoRR, 2023

Scalable CP Decomposition for Tensor Learning using GPU Tensor Cores.

CoRR, 2023

Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation.

CoRR, 2023

MISAR: A Multimodal Instructional System with Augmented Reality.

CoRR, 2023

Emotional Listener Portrait: Neural Listener Head Generation with Emotion.

CoRR, 2023

Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields.

CoRR, 2023

DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models.

CoRR, 2023

Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA.

CoRR, 2023

Training Neural Networks without Backpropagation: A Deeper Dive into the Likelihood Ratio Method.

CoRR, 2023

Improving Adversarial Transferability with Scheduled Step Size and Dual Example.

CoRR, 2023

PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data.

Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others.

Cristian Canton-Ferrer

Mark Ibrahim

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Egocentric Audio-Visual Object Localization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Face Forgery Detection via Symmetric Transformer.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Cross-modal Contrastive Distillation for Instructional Activity Anticipation.

Proceedings of the 26th International Conference on Pattern Recognition, 2022

Discover and Mitigate Unknown Biases with Debiasing Alternate Networks.

Zhiheng Li

Anthony Hoogs

Proceedings of the Computer Vision - ECCV 2022, 2022

Learning to Answer Questions in Dynamic Audio-Visual Scenarios.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Pose Flow Learning From Person Images for Pose Guided Synthesis.

IEEE Trans. Image Process., 2021

Structured and Consistent Multi-Layer Multi-Kernel Subtask Correction Filter Tracker.

IEEE Trans. Circuits Syst. Video Technol., 2021

Anomaly Crossing: A New Method for Video Anomaly Detection as Cross-domain Few-shot Learning.

CoRR, 2021

SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Editing.

CoRR, 2021

Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution.

CoRR, 2021

Animated 3D human avatars from a single image with GAN-based texture inference.

Comput. Graph., 2021

How to Make a BLT Sandwich? Learning VQA towards Understanding Web Instructional Videos.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Improve CAM with Auto-adapted Segmentation and Co-supervised Augmentation.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Learning to Generate Scene Graph from Natural Language Supervision.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Discover the Unknown Biased Attribute of an Image Classifier.

Zhiheng Li

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Explaining Local, Global, And Higher-Order Interactions In Deep Learning.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning.

Jing Bi

Jiebo Luo

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

A Simple Baseline for Weakly-Supervised Scene Graph Generation.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Can Audio-Visual Integration Strengthen Robustness Under Multimodal Attacks?

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation.

Di Hu

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

High-Fidelity Face Tracking for AR/VR via Deep Lighting Adaptation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Learning by Planning: Language-Guided Global Image Editing.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Space-Time Memory Network for Sounding Object Localization in Videos.

Sizhe Li

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020

Noise-Resilient Training Method for Face Landmark Generation From Speech.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

A Weakly Supervised Multi-task Ranking Framework for Actor-Action Semantic Segmentation.

Int. J. Comput. Vis., 2020

Cubic Spline Smoothing Compensation for Irregularly Sampled Sequences.

CoRR, 2020

Actor-Action Video Classification CSC 249/449 Spring 2020 Challenge Report.

CoRR, 2020

Graph Neural Network Based Coarse-Grained Mapping Prediction.

Zhiheng Li

Geemi P. Wellawatte

Maghesree Chakraborty

Heta A. Gandhi

Andrew D. White

CoRR, 2020

What comprises a good talking-head video generation?: A Survey and Benchmark.

CoRR, 2020

Assembling Semantically-Disentangled Representations for Predictive-Generative Models via Adaptation from Synthetic Domain.

Burkay Donderici

Caleb New

CoRR, 2020

TailorGAN: Making User-Defined Fashion Designs.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

End-To-End Generation of Talking Faces from Noisy Speech.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing.

Dingzeyu Li

Proceedings of the Computer Vision - ECCV 2020, 2020

Talking-Head Generation with Rhythmic Head Motion.

Proceedings of the Computer Vision - ECCV 2020, 2020

Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Deep Grouping Model for Unified Perceptual Parsing.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning a Weakly-Supervised Video Actor-Action Segmentation Model With a Wise Selection.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

A Benchmark and Baseline for Language-Driven Image Editing.

Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

Learning from Interventions Using Hierarchical Policies for Safe Learning.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Online Audio-Visual Source Association for Chamber Music Performances.

Trans. Int. Soc. Music. Inf. Retr., 2019

Deep Audio Prior.

Dingzeyu Li

CoRR, 2019

Weakly Supervised Object Localization with Inter-Intra Regulated CAMs.

CoRR, 2019

Unsupervised Pose Flow Learning for Pose Guided Synthesis.

CoRR, 2019

Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss.

CoRR, 2019

3D Human Avatar Digitization from a Single Image.

Proceedings of the 17th International Conference on Virtual-Reality Continuum and its Applications in Industry, 2019

GAN-EM: GAN Based EM Learning Framework.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Single Image 3D Vehicle Pose Estimation for Augmented Reality.

Proceedings of the 2019 IEEE Global Conference on Signal and Information Processing, 2019

Audio-Visual Interpretable and Controllable Video Captioning.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Audio-Visual Event Localization in the Wild.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

NTIRE 2019 Challenge on Video Super-Resolution: Methods and Results.

Rudrabha Mukhopadhyay

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Sound to Visual: Hierarchical Cross-Modal Talking Face Generation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Not All Frames Are Equal: Weakly-Supervised Video Grounding With Contextual Similarity and Visual Clustering Losses.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Dynamic Graph Modules for Modeling Object-Object Interactions in Activity Recognition.

Proceedings of the 30th British Machine Vision Conference 2019, 2019

2018

Dynamic Graph Modules for Modeling Higher-Order Interactions in Activity Recognition.

CoRR, 2018

An Attempt towards Interpretable Audio-Visual Video Captioning.

CoRR, 2018

How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos.

CoRR, 2018

Navigation by Imitation in a Pedestrian-Rich Environment.

CoRR, 2018

Improving Text-Based Person Search by Spatial Matching and Adaptive Threshold.

Tianlang Chen

Jiebo Luo

Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, 2018

MRI tumor segmentation with densely connected 3D CNN.

Proceedings of the Medical Imaging 2018: Image Processing, 2018

Generating Talking Face Landmarks from Speech.

Proceedings of the Latent Variable Analysis and Signal Separation, 2018

Audio-Visual Event Localization in Unconstrained Videos.

Proceedings of the Computer Vision - ECCV 2018, 2018

Lip Movements Generation at a Glance.

Proceedings of the Computer Vision - ECCV 2018, 2018

Weakly-Supervised Action Segmentation With Iterative Soft Boundary Assignment.

Li Ding

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Towards Automatic Learning of Procedures From Web Instructional Videos.

Luowei Zhou

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

Dancelets Mining for Video Recommendation Based on Dance Styles.

IEEE Trans. Multim., 2017

ProcNets: Learning to Segment Procedures in Untrimmed and Unconstrained Videos.

Luowei Zhou

CoRR, 2017

Action Understanding with Multiple Classes of Actors.

Caiming Xiong

CoRR, 2017

TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation.

Li Ding

CoRR, 2017

Watch What You Just Said: Image Captioning with Text-Conditional Attention.

Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23, 2017

Deep Cross-Modal Audio-Visual Generation.

Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23, 2017

Weakly Supervised Actor-Action Segmentation via Robust Multi-task Ranking.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

Scale-Adaptive Video Understanding.

PhD thesis, 2016

LIBSVX: A Supervoxel Library and Benchmark for Early Video Processing.

Int. J. Comput. Vis., 2016

Image Caption Generation with Text-Conditional Semantic Attention.

CoRR, 2016

Actor-Action Semantic Segmentation with Grouping Process Models.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015

Can humans fly? Action understanding with multiple classes of actors.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2013

A Study of Actor and Action Semantic Retention in Video Supervoxel Segmentation.

Int. J. Semantic Comput., 2013

TRECVID 2013 GENIE: Multimedia Event Detection and Recounting.

Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013

Are Actor and Action Semantics Retained in Video Supervoxel Segmentation?

Proceedings of the 2013 IEEE Seventh International Conference on Semantic Computing, 2013

Flattening Supervoxel Hierarchies by the Uniform Entropy Slice.

Spencer Whitt

Proceedings of the IEEE International Conference on Computer Vision, 2013

A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching.

Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

2012

TRECVID 2012 GENIE: Multimedia Event Detection and Recounting.

Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012

Streaming Hierarchical Video Segmentation.

Caiming Xiong

Proceedings of the Computer Vision - ECCV 2012, 2012

Evaluation of super-voxel methods for early video processing.
