Weidi Xie

ORCID: 0009-0002-8609-6826

According to our database, Weidi Xie authored at least 148 papers between 2017 and 2025.

Timeline: publications per year, 2017 to 2025 (chart omitted).

Bibliography

2025
Towards evaluating and building versatile large language models for medicine.
npj Digit. Medicine, 2025

2024
OV-VIS: Open-Vocabulary Video Instance Segmentation.
Int. J. Comput. Vis., November, 2024

OV-DAR: Open-Vocabulary Object Detection and Attributes Recognition.
Int. J. Comput. Vis., November, 2024

Sensorless volumetric reconstruction of fetal brain freehand ultrasound scans with deep implicit representation.
Medical Image Anal., 2024

PMC-LLaMA: toward building open-source language models for medicine.
J. Am. Medical Informatics Assoc., 2024

A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis.
CoRR, 2024

Can Modern LLMs Act as Agent Cores in Radiology Environments?
CoRR, 2024

MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities.
CoRR, 2024

Towards Universal Soccer Video Understanding.
CoRR, 2024

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant.
CoRR, 2024

Unlocking Video-LLM via Agent-of-Thoughts Distillation.
CoRR, 2024

LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models.
CoRR, 2024

Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos.
CoRR, 2024

Can Visual Foundation Models Achieve Long-term Point Tracking?
CoRR, 2024

AutoRG-Brain: Grounded Report Generation for Brain MRI.
CoRR, 2024

A Sanity Check for AI-generated Image Detection.
CoRR, 2024

Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation.
CoRR, 2024

RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis.
CoRR, 2024

Knowledge-enhanced Visual-Language Pretraining for Computational Pathology.
CoRR, 2024

Towards Building Multilingual Language Model for Medicine.
CoRR, 2024

Annotation-free Audio-Visual Segmentation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

A General Protocol to Probe Large Vision Models for 3D Physical Understanding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Enhancing Cross-Institute Generalisation of GNNs in Histopathology Through Multiple Embedding Graph Augmentation (MEGA).
Proceedings of the Medical Image Understanding and Analysis - 28th Annual Conference, 2024

Synchformer: Efficient Synchronization From Sparse Cues.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

RaTEScore: A Metric for Radiology Report Generation.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

EchoSight: Advancing Visual-Language Models with Wiki Knowledge.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

MatchTime: Towards Automatic Soccer Game Commentary Generation.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Knowledge-Enhanced Visual-Language Pretraining for Computational Pathology.
Proceedings of the Computer Vision - ECCV 2024, 2024

Made to Order: Discovering Monotonic Temporal Changes via Self-supervised Video Ordering.
Proceedings of the Computer Vision - ECCV 2024, 2024

VISA: Reasoning Video Object Segmentation via Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

Appearance-Based Refinement for Object-Centric Motion Segmentation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Multi-sentence Grounding for Long-Term Instructional Video.
Proceedings of the Computer Vision - ECCV 2024, 2024

Amodal Ground Truth and Completion in the Wild.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Retrieval-Augmented Egocentric Video Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AutoAD III: The Prequel - Back to the Pixels.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Grounded Question-Answering in Long Egocentric Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Moving Object Segmentation: All You Need is SAM (and Flow).
Proceedings of the Computer Vision - ACCV 2024, 2024

AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description.
Proceedings of the Computer Vision - ACCV 2024, 2024

2023
Self-Supervised Tumor Segmentation With Sim2Real Adaptation.
IEEE J. Biomed. Health Informatics, September, 2023

Aerial Monocular 3D Object Detection.
IEEE Robotics Autom. Lett., April, 2023

One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts.
CoRR, 2023

Large-scale Long-tailed Disease Diagnosis on Radiology Images.
CoRR, 2023

A Strong Baseline for Temporal Video-Text Alignment.
CoRR, 2023

Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis.
CoRR, 2023

What Does Stable Diffusion Know about the 3D Scene?
CoRR, 2023

A Large-scale Dataset for Audio-Language Representation Learning.
CoRR, 2023

UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training.
CoRR, 2023

Diagnosing Human-object Interaction Detectors.
CoRR, 2023

Towards Generalist Foundation Model for Radiology.
CoRR, 2023

arXiVeri: Automatic table verification with GPT.
CoRR, 2023

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models.
CoRR, 2023

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering.
CoRR, 2023

PMC-LLaMA: Further Finetuning LLaMA on Medical Papers.
CoRR, 2023

Multi-modal Prompting for Low-Shot Temporal Action Localization.
CoRR, 2023

Knowledge-enhanced Pre-training for Auto-diagnosis of Chest Radiology Images.
CoRR, 2023

K-Diag: Knowledge-enhanced Disease Diagnosis in Radiographic Imaging.
CoRR, 2023

Guiding Text-to-Image Diffusion Model Towards Grounded Generation.
CoRR, 2023

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training.
CoRR, 2023

Self-supervised Object-Centric Learning for Videos.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Deep Facial Phenotyping with Mixup Augmentation.
Proceedings of the Medical Image Understanding and Analysis - 27th Annual Conference, 2023

PMC-CLIP: Contrastive Language-Image Pre-training Using Biomedical Documents.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023, 2023

Multi-Modal Classifiers for Open-Vocabulary Object Detection.
Proceedings of the International Conference on Machine Learning, 2023

Joint-Relation Transformer for Multi-Person Motion Prediction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Towards Open-Vocabulary Video Instance Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Open-vocabulary Object Segmentation with Diffusion Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

The Making and Breaking of Camouflage.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

AutoAD II: The Sequel - Who, When, and What in Movie Audio Description.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Cali-NCE: Boosting Cross-modal Video Representation Learning with Calibrated Alignment.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

NamedMask: Distilling Segmenters from Complementary Foundation Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Zero-shot Unsupervised Transfer Instance Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Collaboration Helps Camera Overtake LiDAR in 3D Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

AutoAD: Movie Description in Context.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

OvarNet: Towards Open-Vocabulary Object Attribute Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Boost Video Frame Interpolation via Motion Adaptation.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

Zero-shot Composed Text-Image Retrieval.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

2022
Subcortical segmentation of the fetal brain in 3D ultrasound using deep learning.
NeuroImage, 2022

Motion-inductive Self-supervised Object Discovery in Videos.
CoRR, 2022

K-Space Transformer for Fast MRI Reconstruction with Implicit Representation.
CoRR, 2022

PromptDet: Expand Your Detector Vocabulary with Uncurated Images.
CoRR, 2022

Segmenting Moving Objects via an Object-Centric Layered Representation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ReCo: Retrieve and Co-segment for Zero-shot Transfer.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Associating Objects and Their Effects in Video through Coordination Games.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Adaptive 3D Localization of 2D Freehand Ultrasound Brain Images.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2022, 2022

Transforming the Interactive Segmentation for Medical Imaging.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2022, 2022

Prompting Visual-Language Models for Efficient Video Understanding.
Proceedings of the Computer Vision - ECCV 2022, 2022

PromptDet: Towards Open-Vocabulary Detection Using Uncurated Images.
Proceedings of the Computer Vision - ECCV 2022, 2022

It's About Time: Analog Clock Reading in the Wild.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Unsupervised Salient Object Detection with Spectral Cluster Voting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Label, Verify, Correct: A Simple Few Shot Object Detection Method.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Temporal Alignment Networks for Long-term Video.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

A Simple Plugin for Transforming Images to Arbitrary Scales.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

K-Space Transformer for Undersampled MRI Reconstruction.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

A Tri-Layer Plugin to Improve Occluded Detection.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

CounTR: Transformer-based Generalised Visual Counting.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

Turbo Training with Token Dropout.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
Learning to map 2D ultrasound images into 3D space with minimal human annotation.
Medical Image Anal., 2021

ImplicitVol: Sensorless 3D Ultrasound Reconstruction with Deep Implicit Representation.
CoRR, 2021

Self-supervised Tumor Segmentation through Layer Decomposition.
CoRR, 2021

Quantum Self-Supervised Learning.
CoRR, 2021

NeRF--: Neural Radiance Fields Without Known Camera Parameters.
CoRR, 2021

Sli2Vol: Annotate a 3D Volume from a Single Slice with Self-supervised Learning.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

All you need are a few pixels: semantic segmentation with PixelPick.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

Self-supervised Video Object Segmentation by Motion Grouping.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Localizing Visual Sounds the Hard Way.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Segmenting Invisible Moving Objects.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

Audio-Visual Synchronisation in the wild.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020
Layered neural rendering for retiming people in video.
ACM Trans. Graph., 2020

Low-Memory CNNs Enabling Real-Time Ultrasound Segmentation Towards Mobile Deployment.
IEEE J. Biomed. Health Informatics, 2020

VoxCeleb: Large-scale speaker verification in the wild.
Comput. Speech Lang., 2020

VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge.
CoRR, 2020

Inducing Predictive Uncertainty Estimation for Face Recognition.
CoRR, 2020

Self-supervised Video Object Segmentation.
CoRR, 2020

Self-supervised Co-Training for Video Representation Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

VGGSound: A Large-Scale Audio-Visual Dataset.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Memory-Augmented Dense Predictive Coding for Video Representation Learning.
Proceedings of the Computer Vision - ECCV 2020, 2020

Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval.
Proceedings of the Computer Vision - ECCV 2020, 2020

MAST: A Memory-Augmented Self-Supervised Tracker.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Inducing Predictive Uncertainty Estimation for Face Verification.
Proceedings of the 31st British Machine Vision Conference 2020, 2020

Betrayed by Motion: Camouflaged Object Discovery via Motion Segmentation.
Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

2019
VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge.
CoRR, 2019

Self-supervised Learning for Video Correspondence Flow.
CoRR, 2019

Video Representation Learning by Dense Predictive Coding.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

Utterance-level Aggregation for Speaker Recognition in the Wild.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Geometry-Aware Video Object Detection for Static Cameras.
Proceedings of the 30th British Machine Vision Conference 2019, 2019

Self-supervised Video Representation Learning for Correspondence Flow.
Proceedings of the 30th British Machine Vision Conference 2019, 2019

AutoCorrect: Deep Inductive Alignment of Noisy Geometric Annotations.
Proceedings of the 30th British Machine Vision Conference 2019, 2019

2018
Ω-Net (Omega-Net): Fully automatic, multi-view cardiac MR detection, orientation, and segmentation with deep neural networks.
Medical Image Anal., 2018

Fully-automated alignment of 3D fetal brain ultrasound to a canonical reference space using multi-task learning.
Medical Image Anal., 2018

VP-Nets: Efficient automatic localization of key brain structures in 3D fetal neurosonography.
Medical Image Anal., 2018

Microscopy cell counting and detection with fully convolutional regression networks.
Comput. Methods Biomech. Biomed. Eng. Imaging Vis., 2018

Can Dilated Convolutions Capture Ultrasound Video Dynamics?
Proceedings of the Machine Learning in Medical Imaging - 9th International Workshop, 2018

VGGFace2: A Dataset for Recognising Faces across Pose and Age.
Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition, 2018

Comparator Networks.
Proceedings of the Computer Vision - ECCV 2018, 2018

Multicolumn Networks for Face Recognition.
Proceedings of the British Machine Vision Conference 2018, 2018

Class-Agnostic Counting.
Proceedings of the Computer Vision - ACCV 2018, 2018

2017
Deep neural networks in computer vision and biomedical image analysis.
PhD thesis, 2017

Omega-Net: Fully Automatic, Multi-View Cardiac MR Detection, Orientation, and Segmentation with Deep Neural Networks.
CoRR, 2017

Robust Regression of Brain Maturation from 3D Fetal Neurosonography Using CRNs.
Proceedings of the Fetal, Infant and Ophthalmic Medical Image Analysis, 2017

Freehand Ultrasound Image Simulation with Spatially-Conditioned Generative Adversarial Networks.
Proceedings of the Molecular Imaging, Reconstruction and Analysis of Moving Body Organs, and Stroke Imaging and Treatment, 2017

Feature Tracking Cardiac Magnetic Resonance via Deep Learning and Spline Optimization.
Proceedings of the Functional Imaging and Modelling of the Heart, 2017

