Yapeng Tian

CoRR, 2024

Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach.

CoRR, 2024

CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs.

CoRR, 2024

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level.

CoRR, 2024

Scaling Concept With Text-Guided Diffusion Models.

CoRR, 2024

CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP.

CoRR, 2024

Diff-SAGe: End-to-End Spatial Audio Generation Using Diffusion Models.

Saksham Singh Kushwaha

CoRR, 2024

Semantic Grouping Network for Audio Source Separation.

CoRR, 2024

AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation.

CoRR, 2024

Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition.

CoRR, 2024

Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation.

Siva Sai Nagender Vasireddy

CoRR, 2024

SignLLM: Sign Languages Production Large Language Models.

CoRR, 2024

Robust Active Speaker Detection in Noisy Environments.

Chenxu Zhang

Xiaohu Guo

CoRR, 2024

Text-to-Audio Generation Synchronized with Videos.

Jing Shi

CoRR, 2024

Efficiently Leveraging Linguistic Priors for Scene Text Spotting.

Nguyen Nguyen

CoRR, 2024

OSCaR: Object State Captioning and State Change Representation.

CoRR, 2024

LAVSS: Location-Guided Audio-Visual Spatial Audio Separation.

Yuxin Ye

Wenming Yang

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

CookAR: Affordance Augmentations in Wearable AR to Support Kitchen Tool Interactions for People with Low Vision.

Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 2024

Continual Audio-Visual Sound Separation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

OSCaR: Object State Captioning and State Change Representation.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Towards AI-Powered AR for Enhancing Sports Playability for People with Low Vision: An Exploration of ARSports.

Proceedings of the IEEE International Symposium on Mixed and Augmented Reality Adjunct, 2024

SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Towards Efficient Audio-Visual Learners via Empowering Pre-trained Vision Transformers with Cross-Modal Adaptation.

Kai Wang

Dimitrios Hatzinakos

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

T-VSL: Text-Guided Visual Sound Source Localization in Mixtures.

Tanvir Mahmud

Diana Marculescu

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers.

Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024

MIMOSA: Human-AI Co-Creation of Computational Spatial Audio Effects on Videos.

Proceedings of the 16th Conference on Creativity & Cognition, 2024

Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation.

Proceedings of the Computer Vision - ACCV 2024, 2024

High-Quality Visually-Guided Sound Separation from Diverse Categories.

Proceedings of the Computer Vision - ACCV 2024, 2024

2023

Adaptive channel-modulated personalized federated learning for magnetic resonance image reconstruction.

Comput. Biol. Medicine, October, 2023

Meta-Learning-Based Degradation Representation for Blind Super-Resolution.

IEEE Trans. Image Process., 2023

GDSSR: Toward Real-World Ultra-High-Resolution Image Super-Resolution.

Yichen Chi

Wenming Yang

IEEE Signal Process. Lett., 2023

DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation.

CoRR, 2023

Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation.

CoRR, 2023

Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields.

CoRR, 2023

CMRxRecon: An open cardiac MRI dataset for the competition of accelerated image reconstruction.

CoRR, 2023

SignDiff: Learning Diffusion Models for American Sign Language Production.

CoRR, 2023

DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models.

CoRR, 2023

Towards Long Form Audio-visual Video Understanding.

CoRR, 2023

Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA.

CoRR, 2023

EgoVSR: Towards High-Quality Egocentric Video Super-Resolution.

CoRR, 2023

DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment.

Jing Shi

CoRR, 2023

AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation.

CoRR, 2023

PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data.

Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023

Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Dual Arbitrary Scale Super-Resolution for Multi-contrast MRI.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023, 2023

Knowledge Distillation based Degradation Estimation for Blind Super-Resolution.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Basic Binary Convolution Unit for Binarized Image Restoration Network.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

DiffIR: Efficient Diffusion Model for Image Restoration.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Audio-Visual Class-Incremental Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Class-Incremental Grouping Network for Continual Audio-Visual Learning.

Weiguo Pian

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Structured Sparsity Learning for Efficient Video Super-Resolution.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Audio-Visual Grouping Network for Sound Localization from Mixtures.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Egocentric Audio-Visual Object Localization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards Unified, Explainable, and Robust Multisensory Perception.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Learning in Audio-visual Context: A Review, Analysis, and New Perspective.

CoRR, 2022

Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DuDoCAF: Dual-Domain Cross-Attention Fusion with Recurrent Transformer for Fast Multi-contrast MR Imaging.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2022, 2022

Correspondences for image and video reconstruction.

Xiaoyu Xiang

Proceedings of the Imaging and Multimedia Analytics at the Edge 2022, 2022

Learning Spatio-Temporal Downsampling for Effective Video Upscaling.

Proceedings of the Computer Vision - ECCV 2022, 2022

Learning to Answer Questions in Dynamic Audio-Visual Scenarios.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Coarse-to-Fine Embedded PatchMatch and Multi-Scale Dynamic Aggregation for Reference-Based Super-resolution.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Efficient Non-local Contrastive Attention for Image Super-resolution.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Residual Dense Network for Image Restoration.

IEEE Trans. Pattern Anal. Mach. Intell., 2021

Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution.

CoRR, 2021

Video Matting via Consistency-Regularized Graph Neural Networks.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Can Audio-Visual Integration Strengthen Robustness Under Multimodal Attacks?

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation.

Di Hu

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Space-Time Memory Network for Sounding Object Localization in Videos.

Sizhe Li

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020

LCSCNet: Linear Compressing-Based Skip-Connecting Network for Image Super-Resolution.

IEEE Trans. Image Process., 2020

Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing.

Dingzeyu Li

Proceedings of the Computer Vision - ECCV 2020, 2020

Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Deep Learning for Single Image Super-Resolution: A Brief Review.

IEEE Trans. Multim., 2019

Deep Audio Prior.
