Lei Zhang

Orcid: 0000-0001-6926-0538

Affiliations:
  • International Digital Economy Academy (IDEA), Shenzhen, China
  • Microsoft Research, Redmond, WA, USA
  • Microsoft Research Asia, Beijing, China (former)
  • Tsinghua University, Department of Computer Science, Beijing, China (PhD 2001)


According to our database, Lei Zhang authored at least 318 papers between 2001 and 2024.

Bibliography

2024
A small object detection network for remote sensing based on CS-PANet and DSAN.
Multim. Tools Appl., August, 2024

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2024

TLDW: Extreme Multimodal Summarization of News Videos.
IEEE Trans. Circuits Syst. Video Technol., March, 2024

Survey of Code Search Based on Deep Learning.
ACM Trans. Softw. Eng. Methodol., February, 2024

UniG: Modelling Unitary 3D Gaussians for View-consistent 3D Reconstruction.
CoRR, 2024

DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion.
CoRR, 2024

SymPoint Revolutionized: Boosting Panoptic Symbol Spotting with Layer Feature Enhancement.
CoRR, 2024

Toward Exploring the Code Understanding Capabilities of Pre-trained Code Generation Models.
CoRR, 2024

Open-World Human-Object Interaction Detection via Multi-modal Prompts.
CoRR, 2024

MotionLLM: Understanding Human Behaviors from Human Motions and Videos.
CoRR, 2024

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection.
CoRR, 2024

DIG3D: Marrying Gaussian Splatting with Deformable Transformer for Single Image 3D Reconstruction.
CoRR, 2024

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility.
CoRR, 2024

Ctrl123: Consistent Novel View Synthesis via Closed-Loop Transcription.
CoRR, 2024

Scaling Laws Behind Code Understanding Model.
CoRR, 2024

EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs.
CoRR, 2024

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks.
CoRR, 2024

Symbol as Points: Panoptic Symbol Spotting via Point-based Representation.
CoRR, 2024

HumanTOMATO: Text-aligned Whole-body Motion Generation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

TOSS: High-quality Text-guided Novel View Synthesis from a Single Image.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Tag2Text: Guiding Vision-Language Model via Image Tagging.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Compress3D: A Compressed Latent Space for 3D Generation from a Single Image.
Proceedings of the Computer Vision - ECCV 2024, 2024

Grounding DINO: Marrying DINO with Grounded Pre-training for Open-Set Object Detection.
Proceedings of the Computer Vision - ECCV 2024, 2024

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents.
Proceedings of the Computer Vision - ECCV 2024, 2024

Segment and Recognize Anything at Any Granularity.
Proceedings of the Computer Vision - ECCV 2024, 2024

TAPTR: Tracking Any Point with Transformers as Detection.
Proceedings of the Computer Vision - ECCV 2024, 2024

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy.
Proceedings of the Computer Vision - ECCV 2024, 2024

Visual in-Context Prompting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Stable Score Distillation for High-Quality 3D Generation.
CoRR, 2023

RCA-NOC: Relative Contrastive Alignment for Novel Object Captioning.
CoRR, 2023

PhysHOI: Physics-Based Imitation of Dynamic Human-Object Interaction.
CoRR, 2023

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models.
CoRR, 2023

T-Rex: Counting by Visual Prompting.
CoRR, 2023

AcademicGPT: Empowering Academic Research.
CoRR, 2023

Inject Semantic Concepts into Image Tagging for Open-Set Recognition.
CoRR, 2023

UniPose: Detecting Any Keypoints.
CoRR, 2023

SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation.
CoRR, 2023

Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices.
CoRR, 2023

Semantic-SAM: Segment and Recognize Anything at Any Granularity.
CoRR, 2023

DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation.
CoRR, 2023

Understanding Optimization of Deep Learning.
CoRR, 2023

detrex: Benchmarking Detection Transformers.
CoRR, 2023

Recognize Anything: A Strong Image Tagging Model.
CoRR, 2023

Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model.
CoRR, 2023

A Survey of Deep Code Search.
CoRR, 2023

A Strong and Reproducible Object Detector with Only Public Datasets.
CoRR, 2023

A Simple Framework for Open-Vocabulary Segmentation and Detection.
CoRR, 2023

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
CoRR, 2023

DA-BEV: Depth Aware BEV Transformer for 3D Object Detection.
CoRR, 2023

Efficient and Interpretable Compressive Text Summarisation with Unsupervised Dual-Agent Reinforcement Learning.
Proceedings of The Fourth Workshop on Simple and Efficient Natural Language Processing, 2023

A Comprehensive Benchmark for Neural Human Radiance Fields.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DreamWaltz: Make a Scene with Complex 3D Animatable Avatars.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

TopicCAT: Unsupervised Topic-Guided Co-Attention Transformer for Extreme Multimodal Summarisation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

LipsFormer: Introducing Lipschitz Continuity to Vision Transformers.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

A Simple Framework for Open-Vocabulary Segmentation and Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Neural Interactive Keypoint Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Detection Transformer with Stable Matching.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Automatic Network Pruning via Hilbert-Schmidt Independence Criterion Lasso under Information Bottleneck Principle.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MP-Former: Mask-Piloted Transformer for Image Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Are Transformers Effective for Time Series Forecasting?
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Towards generalizable detection of face forgery via self-guided model-agnostic learning.
Pattern Recognit. Lett., 2022

Online multi-object tracking with unsupervised re-identification learning and occlusion estimation.
Neurocomputing, 2022

Exploring Vision Transformers as Diffusion Learners.
CoRR, 2022

A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation.
CoRR, 2022

TLDW: Extreme Multimodal Summarisation of News Videos.
CoRR, 2022

Statistical Foundation Behind Machine Learning and Its Impact on Computer Vision.
CoRR, 2022

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models.
CoRR, 2022

3-mode Real-time MDM Transmission Using Single-mode OTN Transceivers over 300 km Weakly-coupled FMF.
Proceedings of the Optical Fiber Communications Conference and Exhibition, 2022

OTExtSum: Extractive Text Summarisation with Optimal Transport.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, 2022

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Axiomatic Explanations for Visual Search, Retrieval, and Similarity Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Neural Architecture Search with Representation Mutual Information.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Grounded Language-Image Pre-training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Large-Scale Pre-training for Person Re-identification with Noisy Labels.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
A Comparison of Wintertime Atmospheric Boundary Layer Heights Determined by Tethered Balloon Soundings and Lidar at the Site of SACOL.
Remote. Sens., 2021

Vision-Language Navigation Policy Learning and Adaptation.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Unsupervised Finetuning.
CoRR, 2021

Image Scene Graph Generation (SGG) Benchmark.
CoRR, 2021

Query2Label: A Simple Transformer Way to Multi-Label Classification.
CoRR, 2021

CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning.
CoRR, 2021

Model-Agnostic Explainability for Visual Search.
CoRR, 2021

VinVL: Making Visual Representations Matter in Vision-Language Models.
CoRR, 2021

PERMS: An efficient rescue route planning system in disasters.
Appl. Soft Comput., 2021

Chasing Sparsity in Vision Transformers: An End-to-End Exploration.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

SEED: Self-supervised Distillation For Visual Representation.
Proceedings of the 9th International Conference on Learning Representations, 2021

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

CvT: Introducing Convolutions to Vision Transformers.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

MicroNet: Improving Image Recognition with Extremely Low FLOPs.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Improve Unsupervised Pretraining for Few-label Transfer.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Dynamic DETR: End-to-End Object Detection with Dynamic Attention.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

DAP: Detection-Aware Pre-Training With Weak Supervision.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

VinVL: Revisiting Visual Representations in Vision-Language Models.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Lite-HRNet: A Lightweight High-Resolution Network.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Unsupervised Part Segmentation Through Disentangling Appearance and Shape.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Unsupervised Pre-Training for Person Re-Identification.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Dynamic Head: Unifying Object Detection Heads With Attentions.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Real-Time Burst Photo Selection Using a Light-Head Adversarial Network.
IEEE Trans. Image Process., 2020

Self-supervised Pre-training with Hard Examples Improves Visual Representations.
CoRR, 2020

MiniVLM: A Smaller and Faster Vision-Language Model.
CoRR, 2020

Are Fewer Labels Possible for Few-shot Learning?
CoRR, 2020

MicroNet: Towards Image Recognition with Extremely Low FLOPs.
CoRR, 2020

VIVO: Surpassing Human Performance in Novel Object Captioning with Visual Vocabulary Pre-Training.
CoRR, 2020

Hashing-based Non-Maximum Suppression for Crowded Object Detection.
CoRR, 2020

Novel Human-Object Interaction Detection via Adversarial Domain Generalization.
CoRR, 2020

Anchor Box Optimization for Object Detection.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

MosAIc: Finding Artistic Connections across Culture with Conditional Image Retrieval.
Proceedings of the NeurIPS 2020 Competition and Demonstration Track, 2020

Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer.
Proceedings of the Computer Vision - ECCV 2020, 2020

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks.
Proceedings of the Computer Vision - ECCV 2020, 2020

HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Large-Scale Intelligent Microservices.
Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), 2020

MagGAN: High-Resolution Face Attribute Editing with Mask-Guided Generative Adversarial Network.
Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

Unified Vision-Language Pre-Training for Image Captioning and VQA.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Automatic visual pattern mining from categorical image dataset.
Int. J. Multim. Inf. Retr., 2019

WSOD^2: Learning Bottom-up and Top-down Objectness Distillation for Weakly-supervised Object Detection.
CoRR, 2019

Bottom-up Higher-Resolution Networks for Multi-Person Pose Estimation.
CoRR, 2019

Learning to Count Objects with Few Exemplar Annotations.
CoRR, 2019

Improving 3D Human Pose Estimation Via 3D Part Affinity Fields.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2019

WSOD2: Learning Bottom-Up and Top-Down Objectness Distillation for Weakly-Supervised Object Detection.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

TIGEr: Text-to-Image Grounding for Image Caption Evaluation.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Object-Driven Text-To-Image Synthesis via Adversarial Training.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

High Frequency Residual Learning for Multi-Scale Image Classification.
Proceedings of the 30th British Machine Vision Conference 2019, 2019

2018
Relevance Feedback for Content-Based Information Retrieval.
Proceedings of the Encyclopedia of Database Systems, Second Edition, 2018

Annotation-Based Image Retrieval.
Proceedings of the Encyclopedia of Database Systems, Second Edition, 2018

Visual instance mining from the graph perspective.
Multim. Syst., 2018

CLOTHO: A Large-Scale Internet of Things-Based Crowd Evacuation Planning System for Disaster Management.
IEEE Internet Things J., 2018

Revisit Multinomial Logistic Regression in Deep Learning: Data Dependent Model Initialization for Image Recognition.
CoRR, 2018

AutoLoc: Weakly-supervised Temporal Action Localization.
CoRR, 2018

Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning.
CoRR, 2018

Turbo Learning for CaptionBot and DrawingBot.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

PatternNet: Visual Pattern Mining with Deep Neural Network.
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018

AutoLoc: Weakly-Supervised Temporal Action Localization in Untrimmed Videos.
Proceedings of the Computer Vision - ECCV 2018, 2018

CleanNet: Transfer Learning for Scalable Image Classifier Training With Label Noise.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Bottom-Up and Top-Down Attention for Image Captioning and VQA.
CoRR, 2017

2016
Rich Image Captioning in the Wild.
CoRR, 2016

ICME 2016 Image Recognition Grand Challenge.
Proceedings of the 2016 IEEE International Conference on Multimedia & Expo Workshops, 2016

MS-Celeb-1M: Challenge of Recognizing One Million Celebrities in the Real World.
Proceedings of the Imaging and Multimedia Analytics in a Web and Mobile World 2016, 2016

MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition.
Proceedings of the Computer Vision - ECCV 2016, 2016

Rich Image Captioning in the Wild.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016

2015
Partial-Duplicate Clustering and Visual Pattern Discovery on Web Scale Image Database.
IEEE Trans. Multim., 2015

Near Duplicate Image Discovery on One Billion Images.
Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, 2015

IdeaPanel: A Large Scale Interactive Sketch-based Image Search System.
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

Sketch-based Image Retrieval via Shape Words.
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

Scalable Visual Instance Mining with Instance Graph.
Proceedings of the British Machine Vision Conference 2015, 2015

2014
Introduction to the Special Issue Best Papers of ACM Multimedia 2013.
ACM Trans. Multim. Comput. Commun. Appl., 2014

Viewpoint-Aware Representation for Sketch-Based 3D Model Retrieval.
IEEE Signal Process. Lett., 2014

A novel clustered MongoDB-based storage system for unstructured data with high availability.
Computing, 2014

Mining text snippets for images on the web.
Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014

2013
Image search - from thousands to billions in 20 years.
ACM Trans. Multim. Comput. Commun. Appl., 2013

Special issue on image feature detection and description.
Neurocomputing, 2013

Nested-SIFT for Efficient Image Matching and Retrieval.
IEEE Multim., 2013

Regularized Discriminant Embedding for Visual Descriptor Learning.
Proceedings of the 1st International Conference on Learning Representations, 2013

Indexing billions of images for sketch-based retrieval.
Proceedings of the ACM Multimedia Conference, 2013

MagicBrush: image search by color sketch.
Proceedings of the ACM Multimedia Conference, 2013

Mobile multimedia travelogue generation by exploring geo-locations and image tags.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

A highly efficient external memory interface architecture for AVS HD video encoder.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, 2013

The shortest warping path based multiple images alignment.
Proceedings of the IEEE International Conference on Image Processing, 2013

Duplicate Discovery on 2 Billion Internet Images.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013

Exploring Implicit Image Statistics for Visual Representativeness Modeling.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

2012
Finding Celebrities in Billions of Web Images.
IEEE Trans. Multim., 2012

Duplicate-Search-Based Image Annotation Using Web-Scale Data.
Proc. IEEE, 2012

Preface.
J. Comput. Sci. Technol., 2012

A probabilistic graphical model for topic and preference discovery on social media.
Neurocomputing, 2012

Analogical Reasoning for Answer Ranking in Social Question Answering.
IEEE Intell. Syst., 2012

Towards indexing representative images on the web.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Sketch2Tag: automatic hand-drawn sketch recognition.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Query-adaptive shape topic mining for hand-drawn sketch recognition.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

A rapid flower/leaf recognition system.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Trip Mining and Recommendation from Geo-tagged Photos.
Proceedings of the 2012 IEEE International Conference on Multimedia and Expo Workshops, 2012

Efficient Tag Mining via Mixture Modeling for Real-Time Search-Based Image Annotation.
Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, 2012

MyStore: A High Available Distributed Storage System for Unstructured Data.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

Free Hand-Drawn Sketch Segmentation.
Proceedings of the Computer Vision - ECCV 2012, 2012

Pairwise Rotation Invariant Co-occurrence Local Binary Pattern.
Proceedings of the Computer Vision - ECCV 2012, 2012

QsRank: Query-sensitive hash code ranking for efficient ∊-neighbor search.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

PartBook for image parsing.
Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2012

The scale of edges.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

3D visual phrases for landmark recognition.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

Hierarchical Object Representations for Visual Recognition via Weakly Supervised Learning.
Proceedings of the Computer Vision - ACCV 2012, 2012

2011
Summarizing tourist destinations by mining user-generated travelogues and photos.
Comput. Vis. Image Underst., 2011

Query by document via a decomposition-based two-level retrieval approach.
Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011

From one tree to a forest: a unified solution for structured web data extraction.
Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011

Multi-feature pLSA for combining visual features in image annotation.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Semantic point detector.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Sketch2Cartoon: composing cartoon images by sketching.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Contextual synonym dictionary for visual object retrieval.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Grassmann Hashing for approximate nearest neighbor search in high dimensional space.
Proceedings of the 2011 IEEE International Conference on Multimedia and Expo, 2011

Rank-SIFT: Learning to rank repeatable local interest points.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011

Edgel index for large-scale sketch-based image search.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011

User browsing behavior-driven web crawling.
Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

2010
Constructing Concept Lexica With Small Semantic Gaps.
IEEE Trans. Multim., 2010

MindFinder: image search by interactive sketching and tagging.
Proceedings of the 19th International Conference on World Wide Web, 2010

Diversifying landmark image search results by learning interested views from community photos.
Proceedings of the 19th International Conference on World Wide Web, 2010

A pattern tree-based approach to learning URL normalization rules.
Proceedings of the 19th International Conference on World Wide Web, 2010

Equip tourists with knowledge mined from travelogues.
Proceedings of the 19th International Conference on World Wide Web, 2010

Mining adjacent markets from a large-scale ads video collection for image advertising.
Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2010

Photo2Trip: an interactive trip planning system based on geo-tagged photos.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

Understanding multimedia content using web scale social media data.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

Photo2Trip: generating travel routes from geo-tagged photos for trip planning.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

MindFinder: interactive sketch-based image search on millions of images.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

Robust semantic sketch based specific image retrieval.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

An efficient location extraction algorithm by leveraging web contextual information.
Proceedings of the 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, 2010

Max-Margin Dictionary Learning for Multiclass Image Categorization.
Proceedings of the Computer Vision, 2010

Interest seam image.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

ARISTA - image search to annotation on billions of web photos.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

Probabilistic models for supervised dictionary learning.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

Spatial-bag-of-features.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

2009
Relevance Feedback for Content-Based Information Retrieval.
Proceedings of the Encyclopedia of Database Systems, 2009

Annotation-based Image Retrieval.
Proceedings of the Encyclopedia of Database Systems, 2009

FPGA Acceleration of RankBoost in Web Search Engines.
ACM Trans. Reconfigurable Technol. Syst., 2009

A Unified Relevance Feedback Framework for Web Image Retrieval.
IEEE Trans. Image Process., 2009

Non-Negative Semi-Supervised Learning.
Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, 2009

Incorporating site-level knowledge to extract structured data from web forums.
Proceedings of the 18th International Conference on World Wide Web, 2009

Ranking community answers via analogical reasoning.
Proceedings of the 18th International Conference on World Wide Web, 2009

Modeling semantics and structure of discussion threads.
Proceedings of the 18th International Conference on World Wide Web, 2009

Ranking community answers by modeling question-answer relationships via analogical reasoning.
Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2009

Simultaneously modeling semantics and structure of threaded discussions: a sparse coding approach and its applications.
Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2009

LogisticLDA: Regularizing Latent Dirichlet Allocation by Logistic Regression.
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, 2009

Multimedia content analysis: model-based approaches vs. data-driven approaches.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

Argo: intelligent advertising made possible from users' photos.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

TravelScope: standing on the shoulders of dedicated travelers.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

Generating location overviews with images and tags by mining user-generated travelogues.
Proceedings of the 17th International Conference on Multimedia 2009, 2009

Incorporating site-level knowledge for incremental crawling of web forums: a list-wise strategy.
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, June 28, 2009

Argo: intelligent advertising by mining a user's interest from his photo collections.
Proceedings of the 3rd ACM SIGKDD Workshop on Data Mining and Audience Intelligence for Advertising, 2009

User grouping behavior in online forums.
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, June 28, 2009

Advertising based on users' photos.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

A lexica family with small semantic gap.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Efficient indexing for large scale visual search.
Proceedings of the IEEE 12th International Conference on Computer Vision, ICCV 2009, Kyoto, Japan, September 27, 2009

The data deluge: Challenges and opportunities of unlimited data in statistical signal processing.
Proceedings of the IEEE International Conference on Acoustics, 2009

Visualizing textual travelogue with location-relevant images.
Proceedings of the 2009 International Workshop on Location Based Social Networks, 2009

Multi-label sparse coding for automatic image annotation.
Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

Multiplicative nonnegative graph embedding.
Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009

Annotating Images by Mining Image Search.
Proceedings of the Semantic Mining Technologies for Multimedia Databases, 2009

2008
Reconstruction and Recognition of Tensor-Based Objects With Concurrent Subspaces Analysis.
IEEE Trans. Circuits Syst. Video Technol., 2008

Annotating Images by Mining Image Search Results.
IEEE Trans. Pattern Anal. Mach. Intell., 2008

Scalable search-based image annotation.
Multim. Syst., 2008

Improving relevance judgment of web search results with image excerpts.
Proceedings of the 17th International Conference on World Wide Web, 2008

iRobot: an intelligent crawler for web forums.
Proceedings of the 17th International Conference on World Wide Web, 2008

Learning to reduce the semantic gap in web image retrieval and annotation.
Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008

Exploring traversal strategy for web forum crawling.
Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008

Delivering online advertisements inside images.
Proceedings of the 16th International Conference on Multimedia 2008, 2008

Graph-based multiple-instance learning for object-based image retrieval.
Proceedings of the 1st ACM SIGMM International Conference on Multimedia Information Retrieval, 2008

What are the high-level concepts with small semantic gaps?
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

Scalable Markov model-based image annotation.
Proceedings of the 7th ACM International Conference on Image and Video Retrieval, 2008

Search-based query suggestion.
Proceedings of the 17th ACM Conference on Information and Knowledge Management, 2008

2007
Multilinear Discriminant Analysis for Face Recognition.
IEEE Trans. Image Process., 2007

SBIA: search-based image annotation by leveraging web-scale images.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

Scalable music recommendation by search.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

MusicSense: contextual music recommendation using emotional allocation modeling.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

Searching One Billion Web Images by Content: Challenges and Opportunities.
Proceedings of the Multimedia Content Analysis and Mining, International Workshop, 2007

Retrieving Web Images to Enrich Music Representation.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Search Result Clustering Based Relevance Feedback for Web Image Retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2007

Automated Music Video Generation using WEB Image Resource.
Proceedings of the IEEE International Conference on Acoustics, 2007

FPGA-based Accelerator Design for RankBoost in Web Search Engines.
Proceedings of the 2007 International Conference on Field-Programmable Technology, 2007

Content-Based Image Annotation Refinement.
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

Learning query-biased web page summarization.
Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, 2007

IGroup: presenting web image search results in semantic clusters.
Proceedings of the 2007 Conference on Human Factors in Computing Systems, 2007

2006
Human Gait Recognition With Matrix Representation.
IEEE Trans. Circuits Syst. Video Technol., 2006

Image annotation using search and mining technologies.
Proceedings of the 15th international conference on World Wide Web, 2006

EnjoyPhoto: a vertical image search engine for enjoying high-quality photos.
Proceedings of the 14th ACM International Conference on Multimedia, 2006

Image annotation refinement using random walk with restarts.
Proceedings of the 14th ACM International Conference on Multimedia, 2006

Image annotation by large-scale content-based image retrieval.
Proceedings of the 14th ACM International Conference on Multimedia, 2006

VirtualTour: an online travel assistant based on high quality images.
Proceedings of the 14th ACM International Conference on Multimedia, 2006

IGroup: a web image search engine with semantic clustering of search results.
Proceedings of the 14th ACM International Conference on Multimedia, 2006

IGroup: web image search results clustering.
Proceedings of the 14th ACM International Conference on Multimedia, 2006

Scalable relevance feedback using click-through data for web image retrieval.
Proceedings of the 14th ACM International Conference on Multimedia, 2006

Scalable search-based image annotation of personal images.
Proceedings of the 8th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2006

AnnoSearch: Image Auto-Annotation by Search.
Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), 2006

Ranking web objects from multiple communities.
Proceedings of the 2006 ACM CIKM International Conference on Information and Knowledge Management, 2006

2005
Boosting image classification with LDA-based feature combination for digital photograph management.
Pattern Recognit., 2005

Efficient 3D reconstruction for face recognition.
Pattern Recognit., 2005

Salient Feature Selection for Visual Concept Learning.
Proceedings of the Advances in Multimedia Information Processing, 2005

Parallel Image Matrix Compression for Face Recognition.
Proceedings of the 11th International Conference on Multi Media Modeling (MMM 2005), 2005

Iteratively clustering web images based on link and attribute reinforcements.
Proceedings of the 13th ACM International Conference on Multimedia, 2005

Multi-graph enabled active learning for multimodal web image retrieval.
Proceedings of the 7th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2005

Similarity space projection for web image search and annotation.
Proceedings of the 7th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2005

Auto cropping for digital photographs.
Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005

Natural Image Retrieval with Sketches.
Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005

Neighborhood Preserving Projections (NPP): A Novel Linear Dimension Reduction Method.
Proceedings of the Advances in Intelligent Computing, 2005

Coupled Kernel-Based Subspace Learning.
Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005

Discriminant Analysis with Tensor Representation.
Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005

Concurrent Subspaces Analysis.
Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005

2004
Dynamic Feature Space Selection in Relevance Feedback Using Support Vector Machines.
Proceedings of the Pattern Recognition in Information Systems, 2004

A Novel Gabor-LDA Based Face Recognition Method.
Proceedings of the Advances in Multimedia Information Processing - PCM 2004, 5th Pacific Rim Conference on Multimedia, Tokyo, Japan, November 30, 2004

Efficient propagation for face annotation in family albums.
Proceedings of the 12th ACM International Conference on Multimedia, 2004

Automated red-eye detection and correction in digital photographs.
Proceedings of the 2004 International Conference on Image Processing, 2004

Automatic 3D Reconstruction for Face Recognition.
Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition (FGR 2004), 2004

3D Shape Constraint for Facial Feature Localization Using Probabilistic-like Output.
Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition (FGR 2004), 2004

2003
Face Annotation for Family Photo Album Management.
Int. J. Image Graph., 2003

Automated annotation of human faces in family albums.
Proceedings of the Eleventh ACM International Conference on Multimedia, 2003

Semantic image clustering using relevance feedback.
Proceedings of the 2003 International Symposium on Circuits and Systems, 2003

An efficient memorization scheme for relevance feedback in image retrieval.
Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, 2003

Learning in Region-Based Image Retrieval.
Proceedings of the Image and Video Retrieval, Second International Conference, 2003

Head Pose Estimation using Fisher Manifold Learning.
Proceedings of the 2003 IEEE International Workshop on Analysis and Modeling of Faces and Gestures (AMFG 2003), 2003

2002
Boosting Image Orientation Detection with Indoor vs. Outdoor Classification.
Proceedings of the 6th IEEE Workshop on Applications of Computer Vision (WACV 2002), 2002

MyPhotos: a system for home photo management and processing.
Proceedings of the 10th ACM International Conference on Multimedia 2002, 2002

Gaussian mixture model for relevance feedback in image retrieval.
Proceedings of the 2002 IEEE International Conference on Multimedia and Expo, 2002

Chinese Named Entity Identification Using Class-based Language Model.
Proceedings of the 19th International Conference on Computational Linguistics, 2002

2001
FBCC: An Image Similarity Algorithm Based on Regions.
Proceedings of the Advances in Multimedia Information Processing, 2001

Support vector machine learning for image retrieval.
Proceedings of the 2001 International Conference on Image Processing, 2001

