Chenliang Xu

ORCID: 0000-0002-2183-822X

According to our database, Chenliang Xu authored at least 132 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Cross Modality Bias in Visual Question Answering: A Causal View With Possible Worlds VQA.
IEEE Trans. Multim., 2024

Scaling Concept With Text-Guided Diffusion Models.
CoRR, 2024

Will the Inclusion of Generated Data Amplify Bias Across Generations in Future Image Classification Models?
CoRR, 2024

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models.
CoRR, 2024

Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation.
CoRR, 2024

TextToon: Real-Time Text Toonify Head Avatar from Single Video.
CoRR, 2024

Quadratic Is Not What You Need For Multimodal Large Language Models.
CoRR, 2024

CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion.
CoRR, 2024

Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts.
CoRR, 2024

Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?
CoRR, 2024

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning.
CoRR, 2024

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue.
CoRR, 2024

Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training.
CoRR, 2024

Efficiently Leveraging Linguistic Priors for Scene Text Spotting.
CoRR, 2024

OSCaR: Object State Captioning and State Change Representation.
CoRR, 2024

Tri²-plane: Volumetric Avatar Reconstruction with Feature Pyramid.
CoRR, 2024

Bag of Tricks to Boost Adversarial Transferability.
CoRR, 2024

OSCaR: Object State Captioning and State Change Representation.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

EAGLE: Egocentric AGgregated Language-video Engine.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

One Forward is Enough for Neural Network Training via Likelihood Ratio Method.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Learning Audio Concepts from Counterfactual Natural Language.
Proceedings of the IEEE International Conference on Acoustics, 2024

Adaptive Super Resolution for One-Shot Talking-Head Generation.
Proceedings of the IEEE International Conference on Acoustics, 2024

Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Tri²-plane: Thinking Head Avatar via Feature Pyramid.
Proceedings of the Computer Vision - ECCV 2024, 2024

Modeling and Driving Human Body Soundfields Through Acoustic Primitives.
Proceedings of the Computer Vision - ECCV 2024, 2024

Random Smooth-based Certified Defense against Text Adversarial Attack.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

Learning to Transform Dynamically for Better Adversarial Transferability.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Discover and Mitigate Multiple Biased Subgroups in Image Classifiers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Rapid runtime learning by curating small datasets of high-quality items obtained from memory.
PLoS Comput. Biol., October, 2023

Video Understanding with Large Language Models: A Survey.
CoRR, 2023

Scalable CP Decomposition for Tensor Learning using GPU Tensor Cores.
CoRR, 2023

Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation.
CoRR, 2023

MISAR: A Multimodal Instructional System with Augmented Reality.
CoRR, 2023

Emotional Listener Portrait: Neural Listener Head Generation with Emotion.
CoRR, 2023

Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields.
CoRR, 2023

DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models.
CoRR, 2023

Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA.
CoRR, 2023

Training Neural Networks without Backpropagation: A Deeper Dive into the Likelihood Ratio Method.
CoRR, 2023

Improving Adversarial Transferability with Scheduled Step Size and Dual Example.
CoRR, 2023

PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data.
Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Egocentric Audio-Visual Object Localization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Face Forgery Detection via Symmetric Transformer.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Cross-modal Contrastive Distillation for Instructional Activity Anticipation.
Proceedings of the 26th International Conference on Pattern Recognition, 2022

Discover and Mitigate Unknown Biases with Debiasing Alternate Networks.
Proceedings of the Computer Vision - ECCV 2022, 2022

Learning to Answer Questions in Dynamic Audio-Visual Scenarios.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Pose Flow Learning From Person Images for Pose Guided Synthesis.
IEEE Trans. Image Process., 2021

Structured and Consistent Multi-Layer Multi-Kernel Subtask Correction Filter Tracker.
IEEE Trans. Circuits Syst. Video Technol., 2021

Anomaly Crossing: A New Method for Video Anomaly Detection as Cross-domain Few-shot Learning.
CoRR, 2021

SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Editing.
CoRR, 2021

Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution.
CoRR, 2021

Animated 3D human avatars from a single image with GAN-based texture inference.
Comput. Graph., 2021

How to Make a BLT Sandwich? Learning VQA towards Understanding Web Instructional Videos.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Improve CAM with Auto-adapted Segmentation and Co-supervised Augmentation.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Learning to Generate Scene Graph from Natural Language Supervision.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Discover the Unknown Biased Attribute of an Image Classifier.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Explaining Local, Global, And Higher-Order Interactions In Deep Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

A Simple Baseline for Weakly-Supervised Scene Graph Generation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Can Audio-Visual Integration Strengthen Robustness Under Multimodal Attacks?
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

High-Fidelity Face Tracking for AR/VR via Deep Lighting Adaptation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Learning by Planning: Language-Guided Global Image Editing.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Space-Time Memory Network for Sounding Object Localization in Videos.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020
Noise-Resilient Training Method for Face Landmark Generation From Speech.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

A Weakly Supervised Multi-task Ranking Framework for Actor-Action Semantic Segmentation.
Int. J. Comput. Vis., 2020

Cubic Spline Smoothing Compensation for Irregularly Sampled Sequences.
CoRR, 2020

Actor-Action Video Classification CSC 249/449 Spring 2020 Challenge Report.
CoRR, 2020

Graph Neural Network Based Coarse-Grained Mapping Prediction.
CoRR, 2020

What comprises a good talking-head video generation?: A Survey and Benchmark.
CoRR, 2020

Assembling Semantically-Disentangled Representations for Predictive-Generative Models via Adaptation from Synthetic Domain.
CoRR, 2020

TailorGAN: Making User-Defined Fashion Designs.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

End-To-End Generation of Talking Faces from Noisy Speech.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing.
Proceedings of the Computer Vision - ECCV 2020, 2020

Talking-Head Generation with Rhythmic Head Motion.
Proceedings of the Computer Vision - ECCV 2020, 2020

Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Deep Grouping Model for Unified Perceptual Parsing.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning a Weakly-Supervised Video Actor-Action Segmentation Model With a Wise Selection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

A Benchmark and Baseline for Language-Driven Image Editing.
Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

Learning from Interventions Using Hierarchical Policies for Safe Learning.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Online Audio-Visual Source Association for Chamber Music Performances.
Trans. Int. Soc. Music. Inf. Retr., 2019

Deep Audio Prior.
CoRR, 2019

Weakly Supervised Object Localization with Inter-Intra Regulated CAMs.
CoRR, 2019

Unsupervised Pose Flow Learning for Pose Guided Synthesis.
CoRR, 2019

Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss.
CoRR, 2019

3D Human Avatar Digitization from a Single Image.
Proceedings of the 17th International Conference on Virtual-Reality Continuum and its Applications in Industry, 2019

GAN-EM: GAN Based EM Learning Framework.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Single Image 3D Vehicle Pose Estimation for Augmented Reality.
Proceedings of the 2019 IEEE Global Conference on Signal and Information Processing, 2019

Audio-Visual Interpretable and Controllable Video Captioning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Audio-Visual Event Localization in the Wild.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Sound to Visual: Hierarchical Cross-Modal Talking Face Generation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Not All Frames Are Equal: Weakly-Supervised Video Grounding With Contextual Similarity and Visual Clustering Losses.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Dynamic Graph Modules for Modeling Object-Object Interactions in Activity Recognition.
Proceedings of the 30th British Machine Vision Conference 2019, 2019

2018
Dynamic Graph Modules for Modeling Higher-Order Interactions in Activity Recognition.
CoRR, 2018

An Attempt towards Interpretable Audio-Visual Video Captioning.
CoRR, 2018

How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos.
CoRR, 2018

Navigation by Imitation in a Pedestrian-Rich Environment.
CoRR, 2018

Improving Text-Based Person Search by Spatial Matching and Adaptive Threshold.
Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, 2018

MRI tumor segmentation with densely connected 3D CNN.
Proceedings of the Medical Imaging 2018: Image Processing, 2018

Generating Talking Face Landmarks from Speech.
Proceedings of the Latent Variable Analysis and Signal Separation, 2018

Audio-Visual Event Localization in Unconstrained Videos.
Proceedings of the Computer Vision - ECCV 2018, 2018

Lip Movements Generation at a Glance.
Proceedings of the Computer Vision - ECCV 2018, 2018

Weakly-Supervised Action Segmentation With Iterative Soft Boundary Assignment.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Towards Automatic Learning of Procedures From Web Instructional Videos.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Dancelets Mining for Video Recommendation Based on Dance Styles.
IEEE Trans. Multim., 2017

ProcNets: Learning to Segment Procedures in Untrimmed and Unconstrained Videos.
CoRR, 2017

Action Understanding with Multiple Classes of Actors.
CoRR, 2017

TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation.
CoRR, 2017

Watch What You Just Said: Image Captioning with Text-Conditional Attention.
Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23, 2017

Deep Cross-Modal Audio-Visual Generation.
Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23, 2017

Weakly Supervised Actor-Action Segmentation via Robust Multi-task Ranking.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Scale-Adaptive Video Understanding.
PhD thesis, 2016

LIBSVX: A Supervoxel Library and Benchmark for Early Video Processing.
Int. J. Comput. Vis., 2016

Image Caption Generation with Text-Conditional Semantic Attention.
CoRR, 2016

Actor-Action Semantic Segmentation with Grouping Process Models.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
Can humans fly? Action understanding with multiple classes of actors.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2013
A Study of Actor and Action Semantic Retention in Video Supervoxel Segmentation.
Int. J. Semantic Comput., 2013

Are Actor and Action Semantics Retained in Video Supervoxel Segmentation?
Proceedings of the 2013 IEEE Seventh International Conference on Semantic Computing, 2013

Flattening Supervoxel Hierarchies by the Uniform Entropy Slice.
Proceedings of the IEEE International Conference on Computer Vision, 2013

A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

2012
Streaming Hierarchical Video Segmentation.
Proceedings of the Computer Vision - ECCV 2012, 2012

Evaluation of super-voxel methods for early video processing.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

