R. Manmatha

Proceedings of the Document Analysis and Recognition - ICDAR 2024 - 18th International Conference, Athens, Greece, August 30, 2024

DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

On the Scalability of Diffusion-based Text-to-Image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

No Head Left Behind - Multi-Head Alignment Distillation for Transformers.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

DocFormerv2: Local Features for Document Understanding.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation.

CoRR, 2023

DocTr: Document Transformer for Structured Information Extraction in Documents.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

On Calibration of Scene-Text Recognition Models.

Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

GLASS: Global to Local Attention for Scene-Text Spotting.

Proceedings of the Computer Vision - ECCV 2022, 2022

YORO - Lightweight End to End Visual Grounding.

Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

LaTr: Layout-Aware Transformer for Scene-Text VQA.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

ResNeSt: Split-Attention Networks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

2021

Saliency Driven Perceptual Image Compression.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

DocFormer: End-to-End Transformer for Document Understanding.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Sequence-to-Sequence Contrastive Learning for Text Recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

A Comprehensive Study of Deep Video Action Recognition.

Yi Zhu

Xinyu Li

Chunhui Liu

Mohammadreza Zolfaghari

CoRR, 2020

Document Visual Question Answering Challenge 2020.

CoRR, 2020

DocVQA: A Dataset for VQA on Document Images.

CoRR, 2020

Improving Semantic Segmentation via Self-Training.

CoRR, 2020

Hierarchical Auto-Regressive Model for Image Compression Incorporating Object Saliency and a Deep Perceptual Loss.

CoRR, 2020

SCATTER: Selective Context Attentional Scene Text Recognizer.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Dependence Models for Searching Text in Document Images.

IEEE Trans. Pattern Anal. Mach. Intell., 2019

Human Perceptual Evaluations for Image Compression.

CoRR, 2019

Deep Perceptual Compression.

CoRR, 2019

Searching for Apparel Products from Images in the Wild.

CoRR, 2019

2018

Compressed Video Action Recognition.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Sampling Matters in Deep Embedding Learning.

Proceedings of the IEEE International Conference on Computer Vision, 2017

2016

Image Annotation using Multi-scale Hypergraph Heat Diffusion Framework.

Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016

Efficient Exploration of Text Regions in Natural Scene Images Using Adaptive Image Sampling.

Douglas Gray

Proceedings of the Computer Vision - ECCV 2016 Workshops, 2016

Deep Decision Network for Multi-class Image Classification.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015

ICMR 2014: 4th ACM International Conference on Multimedia Retrieval.

SIGIR Forum, 2015

Automatic Image Annotation using Deep Learning Representations.

Venkatesh N. Murthy

Subhransu Maji

Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

2014

Special issue on Multimedia Event Detection.

Mach. Vis. Appl., 2014

Large scale document image retrieval by automatic word annotation.

K. Pramod Sankar

Int. J. Document Anal. Recognit., 2014

Incorporating query-specific feedback into learning-to-rank models.

W. Bruce Croft

Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

A Hybrid Model for Automatic Image Annotation.

Venkatesh N. Murthy

Proceedings of the International Conference on Multimedia Retrieval, 2014

Modeling Concept Dependencies for Event Detection.

Proceedings of the International Conference on Multimedia Retrieval, 2014

Sequential Word Spotting in Historical Handwritten Documents.

Proceedings of the 11th IAPR International Workshop on Document Analysis Systems, 2014

2013

SRI-Sarnoff AURORA System at TRECVID 2013 Multimedia Event Detection and Recounting.

Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013

Short Text Queries for Video Retrieval Multimedia event Detection at TRECVID 2013.

Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013

Creating an Improved Version Using Noisy OCR from Multiple Editions.

David Wemhoener

Proceedings of the 12th International Conference on Document Analysis and Recognition, 2013

Formulating Action Recognition as a Ranking Problem.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013

Predicting retweet count using visual cues.

Hüseyin Oktay

Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, 2013

2012

A Novel Word Spotting Method Based on Recurrent Neural Networks.

IEEE Trans. Pattern Anal. Mach. Intell., 2012

SRI-Sarnoff AURORA System at TRECVID 2012 Multimedia Event Detection and Recounting.

Alexander G. Hauptmann

Mubarak Shah

Subhabrata Bhattacharya

Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012

Finding translations in scanned book collections.

Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2012

A framework for manipulating and searching multiple retrieval types.

Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2012

On Influence of Line Segmentation in Efficient Word Segmentation in Old Manuscripts.

Proceedings of the 2012 International Conference on Frontiers in Handwriting Recognition, 2012

An Efficient Framework for Searching Text in Noisy Document Images.

Proceedings of the 10th IAPR International Workshop on Document Analysis Systems, 2012

2011

Team SRI-Sarnoff's AURORA System @ TRECVID 2011.

Alexander G. Hauptmann

Mubarak Shah

Subhabrata Bhattacharya

Proceedings of the 2011 TREC Video Retrieval Evaluation, 2011

A Fast Alignment Scheme for Automatic OCR Evaluation of Books.

Proceedings of the 2011 International Conference on Document Analysis and Recognition, 2011

BLSTM Neural Network Based Word Retrieval for Hindi Documents.

Proceedings of the 2011 International Conference on Document Analysis and Recognition, 2011

Partial duplicate detection for large book collections.

Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

Mining relational structure from millions of books: position paper.

David A. Smith

Proceedings of the 4th ACM Workshop on Online books, 2011

2010

Adapting BLSTM Neural Network Based Keyword Spotting Trained on Modern Data to Historical Documents.

Proceedings of the International Conference on Frontiers in Handwriting Recognition, 2010

Nearest neighbor based collection OCR.

K. Pramod Sankar

Proceedings of the Ninth IAPR International Workshop on Document Analysis Systems, 2010

Image retrieval using Markov Random Fields and global image features.

Ainhoa Llorente

Stefan M. Rüger

Proceedings of the 9th ACM International Conference on Image and Video Retrieval, 2010

2009

Finding words in alphabet soup: Inference on freeform character recognition for historical scripts.

Nicholas R. Howe

Pattern Recognit., 2009

Robust Recognition of Documents by Fusing Results of Word Clusters.

Proceedings of the 10th International Conference on Document Analysis and Recognition, 2009

2008

Document Image Analysis and Recognition.

Wiley Encyclopedia of Computer Science and Engineering, 2008

Distributed image search in camera sensor networks.

Tingxin Yan

Deepak Ganesan

Proceedings of the 6th International Conference on Embedded Networked Sensor Systems, 2008

A discrete direct retrieval model for image and video retrieval.

Proceedings of the 7th ACM International Conference on Image and Video Retrieval, 2008

2007

Word spotting for historical documents.

Int. J. Document Anal. Recognit., 2007

Further explorations in text alignment with handwritten documents.

E. Micah Kornfield

Int. J. Document Anal. Recognit., 2007

Efficient Search in Document Image Collections.

Anand Kumar

Proceedings of the Computer Vision, 2007

2006

A hierarchical, HMM-based automatic evaluation of OCR accuracy for a digital library of books.

Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2006

Exploring the Use of Conditional Random Field Models and HMMs for Historical Handwritten Document Recognition.

Andrew McCallum

Proceedings of the Second International Workshop on Document Image Analysis for Libraries (DIAL 2006), 2006

Aligning Transcripts to Automatically Segmented Handwritten Manuscripts.

Jamie L. Rothfeder

Proceedings of the Document Analysis Systems VII, 7th International Workshop, 2006

2005

Multimedia information retrieval: workshop report.

Stefan M. Rüger

Alexander G. Hauptmann

SIGIR Forum, 2005

A Scale Space Approach for Automatically Segmenting Words from Historical Handwritten Documents.

Jamie L. Rothfeder

IEEE Trans. Pattern Anal. Mach. Intell., 2005

Boosted decision trees for word recognition in handwritten document retrieval.

Nicholas R. Howe

SIGIR 2005: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2005

Joint visual-text modeling for automatic retrieval of multimedia documents.

Proceedings of the 13th ACM International Conference on Multimedia, 2005

Classification Models for Historical Manuscript Recognition.

Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR 2005), 29 August, 2005

Combining text and audio-visual features in video indexing.

Shih-Fu Chang

Tat-Seng Chua

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Learning Shapes for Image Classification and Retrieval.

Proceedings of the Image and Video Retrieval, 4th International Conference, 2005

2004

A search engine for historical manuscript images.

SIGIR 2004: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2004

Statistical models for automatic video annotation and retrieval.

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Holistic Word Recognition for Handwritten Historical Documents.

Proceedings of the 1st International Workshop on Document Image Analysis for Libraries (DIAL 2004), 2004

Text Alignment with Handwritten Documents.

E. Micah Kornfield

Proceedings of the 1st International Workshop on Document Image Analysis for Libraries (DIAL 2004), 2004

Multiple Bernoulli Relevance Models for Image and Video Annotation.

Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), with CD-ROM, 27 June, 2004

An Inference Network Approach to Image Retrieval.

Donald Metzler

Proceedings of the Image and Video Retrieval: Third International Conference, 2004

Using Maximum Entropy for Automatic Image Annotation.

Jiwoon Jeon

Proceedings of the Image and Video Retrieval: Third International Conference, 2004

2003

Challenges in information retrieval and language modeling: report of a workshop held at the center for intelligent information retrieval, University of Massachusetts Amherst, September 2002.

SIGIR Forum, 2003

Automatic image annotation and retrieval using cross-media relevance models.

Jiwoon Jeon

SIGIR 2003: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 28, 2003

A Model for Learning the Semantics of Pictures.

Jiwoon Jeon

Advances in Neural Information Processing Systems 16, 2003

Mobile Distributed Information Retrieval for Highly-Partitioned Networks.

Katrina M. Hanna

Brian Neil Levine

Proceedings of the 11th IEEE International Conference on Network Protocols (ICNP 2003), 2003

Features for Word Spotting in Historical Manuscripts.

Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR 2003), 2003

Word Image Matching Using Dynamic Time Warping.

Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), 2003

2002

A critical examination of TDT's cost function.

Ao Feng

SIGIR 2002: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2002

2001

Modeling Score Distributions for Combining the Outputs of Search Engines.

Fangfang Feng

SIGIR 2001: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001

Automatic Segmentation and Indexing in a Database of Bird Images.

Madirakshi Das

Proceedings of the Eighth International Conference on Computer Vision (ICCV-01), Vancouver, British Columbia, Canada, July 7-14, 2001

1999

Indexing and Retrieval, SIGIR'99 Workshop Summary.

SIGIR Forum, 1999

TextFinder: An Automatic System to Detect and Recognize Text In Images.

Victor Wu

IEEE Trans. Pattern Anal. Mach. Intell., 1999

Indexing Flower Patent Images Using Domain Knowledge.

Madirakshi Das

IEEE Intell. Syst., 1999

Scale Space Technique for Word Segmentation in Handwritten Documents.

Nitin Srimal

Proceedings of the Scale-Space Theories in Computer Vision, 1999

1998

Multimedia Indexing and Retrieval, Summary Report.

SIGIR Forum, 1998

On computing global similarity in images.

Srinivas Ravela

Proceedings of the Fourth IEEE Workshop on Applications of Computer Vision, 1998

Indexing flowers by color names using domain knowledge-driven segmentation.

Madirakshi Das

Proceedings of the Fourth IEEE Workshop on Applications of Computer Vision, 1998

Retrieving Images by Appearance.

Srinivas Ravela

Proceedings of the Sixth International Conference on Computer Vision (ICCV-98), 1998

Computing local and global similarity in images.

S. Chandu Ravela

Y. Chitti

Proceedings of the Human Vision and Electronic Imaging III, 1998

Document image cleanup and binarization.

Victor Wu

Proceedings of the Document Recognition V, San Jose, CA, USA, January 24, 1998

1997

Image Retrieval by Appearance.

Srinivas Ravela

SIGIR '97: Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1997

Syntactic characterization of appearance and its application to image retrieval.

S. Chandu Ravela

Proceedings of the Human Vision and Electronic Imaging II, 1997

Finding Text in Images.

Victor Wu