Dimosthenis Karatzas

Orcid: 0000-0001-8762-4454

Affiliations:
  • Universitat Autònoma de Barcelona, Spain


According to our database, Dimosthenis Karatzas authored at least 166 papers between 2002 and 2024.

Bibliography

2024
ComiCap: A VLMs pipeline for dense captioning of Comic Panels.
CoRR, 2024

One missing piece in Vision and Language: A Survey on Comics Understanding.
CoRR, 2024

CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding.
CoRR, 2024

Retrieval Augmented Verification: Unveiling Disinformation with Structured Representations for Zero-Shot Real-Time Evidence-guided Fact-Checking of Multi-modal Social media posts.
CoRR, 2024

STEP - Towards Structured Scene-Text Spotting.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Counting the Corner Cases: Revisiting Robust Reading Challenge Data Sets, Evaluation Protocols, and Metrics.
Proceedings of the Document Analysis and Recognition - ICDAR 2024 - 18th International Conference, Athens, Greece, August 30, 2024

Comics Datasets Framework: Mix of Comics Datasets for Detection Benchmarking.
Proceedings of the Document Analysis and Recognition - ICDAR 2024 Workshops, 2024

Multimodal Transformer for Comics Text-Cloze.
Proceedings of the Document Analysis and Recognition - ICDAR 2024 - 18th International Conference, Athens, Greece, August 30, 2024

Privacy-Aware Document Visual Question Answering.
Proceedings of the Document Analysis and Recognition - ICDAR 2024 - 18th International Conference, Athens, Greece, August 30, 2024

Federated Document Visual Question Answering: A Pilot Study.
Proceedings of the Document Analysis and Recognition - ICDAR 2024 - 18th International Conference, Athens, Greece, August 30, 2024

Multi-page Document Visual Question Answering Using Self-attention Scoring Mechanism.
Proceedings of the Document Analysis and Recognition - ICDAR 2024 - 18th International Conference, Athens, Greece, August 30, 2024

Machine Unlearning for Document Classification.
Proceedings of the Document Analysis and Recognition - ICDAR 2024 - 18th International Conference, Athens, Greece, August 30, 2024

GRIF-DM: Generation of Rich Impression Fonts Using Diffusion Models.
Proceedings of the ECAI 2024 - 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain, 2024

Image-Text Matching for Large-Scale Book Collections.
Proceedings of the Document Analysis Systems - 16th IAPR International Workshop, 2024

Multi-page Document VQA with Recurrent Memory Transformer.
Proceedings of the Document Analysis Systems - 16th IAPR International Workshop, 2024

2023
Hierarchical multimodal transformers for Multipage DocVQA.
Pattern Recognit., December, 2023

Privacy-Aware Document Visual Question Answering.
CoRR, 2023

Reading Between the Lanes: Text VideoQA on the Road.
CoRR, 2023

ICDAR 2023 Video Text Reading Competition for Dense and Small Text.
CoRR, 2023

Watching the News: Towards VideoQA Models that can Read.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

ICDAR 2023 Competition on Reading the Seal Title.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

ICDAR 2023 Competition on Video Text Reading for Dense and Small Text.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

Reading Between the Lanes: Text VideoQA on the Road.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

ICDAR 2023 Competition on RoadText Video Text Detection, Tracking and Recognition.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

DocILE Benchmark for Document Information Localization and Extraction.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

Accelerating Transformer-Based Scene Text Detection and Recognition via Token Pruning.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

Understanding Video Scenes through Text: Insights from Text-based Video Question Answering.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Extended Overview of DocILE 2023: Document Information Localization and Extraction.
Proceedings of the Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), 2023

Overview of DocILE 2023: Document Information Localization and Extraction.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2023

Text-DIAE: A Self-Supervised Degradation Invariant Autoencoder for Text Recognition and Document Enhancement.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Hierarchical multimodal transformers for Multi-Page DocVQA.
CoRR, 2022

Text-DIAE: Degradation Invariant Autoencoders for Text Recognition and Document Enhancement.
CoRR, 2022

One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

InfographicVQA.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

MUST-VQA: MUltilingual Scene-Text VQA.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Out-of-Vocabulary Challenge Report.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

OCR-IDL: OCR Annotations for Industry Document Library Dataset.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

A Multilingual Approach to Scene Text Visual Question Answering.
Proceedings of the Document Analysis Systems - 15th IAPR International Workshop, 2022

Read While You Drive - Multilingual Text Tracking on the Road.
Proceedings of the Document Analysis Systems - 15th IAPR International Workshop, 2022

2021
Multimodal grid features and cell pointers for scene text visual question answering.
Pattern Recognit. Lett., 2021

Real-time Lexicon-free Scene Text Retrieval.
Pattern Recognit., 2021

Asking questions on handwritten document collections.
Int. J. Document Anal. Recognit., 2021

ICDAR 2021 Competition on Document Visual Question Answering.
CoRR, 2021

DocVQA: A Dataset for VQA on Document Images.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

StacMR: Scene-Text Aware Cross-Modal Retrieval.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

ICDAR 2021 Competition on Document Visual Question Answering.
Proceedings of the 16th International Conference on Document Analysis and Recognition, 2021

Document Collection Visual Question Answering.
Proceedings of the 16th International Conference on Document Analysis and Recognition, 2021

2020
Document Visual Question Answering Challenge 2020.
CoRR, 2020

DocVQA: A Dataset for VQA on Document Images.
CoRR, 2020

Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

Exploring Hate Speech Detection in Multimodal Publications.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

Retrieval Guided Unsupervised Multi-domain Image to Image Translation.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

RoadText-1K: Text Detection & Recognition Dataset for Driving Videos.
Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

Text Recognition - Real World Data and Where to Find Them.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Location Sensitive Image Retrieval and Tagging.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
FAST: Facilitated and Accurate Scene Text Proposals through FCN Guided Pruning.
Pattern Recognit. Lett., 2019

ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard.
CoRR, 2019

Self-Supervised Learning from Web Data for Multimodal Retrieval.
CoRR, 2019

Self-Supervised Visual Representations for Cross-Modal Retrieval.
Proceedings of the 2019 on International Conference on Multimedia Retrieval, 2019

ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard.
Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

ICDAR 2019 Competition on Large-Scale Street View Text with Partial Labeling - RRC-LSVT.
Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition - RRC-MLT-2019.
Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

Eye-Movements During Information Extraction from Administrative Documents.
Proceedings of the 2nd International Workshop on Human-Document Interaction, 2019

ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction.
Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

Selective Style Transfer for Text.
Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text - RRC-ArT.
Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

ICDAR 2019 Competition on Scene Text Visual Question Answering.
Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

Can One Deep Learning Model Learn Script-Independent Multilingual Word-Spotting?
Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

Scene Text Visual Question Answering.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Good News, Everyone! Context Driven Entity-Aware Captioning for News Images.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Soft-PHOC Descriptor for End-to-End Word Spotting in Egocentric Scene Images.
CoRR, 2018

TextTopicNet - Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces.
CoRR, 2018

Non-deterministic Behavior of Ranking-Based Metrics When Evaluating Embeddings.
Proceedings of the Reproducible Research in Pattern Recognition, 2018

On the Labeling Correctness in Computer Vision Datasets.
Proceedings of the Workshop on Interactive Adaptive Learning co-located with European Conference on Machine Learning (ECML 2018) and Principles and Practice of Knowledge Discovery in Databases (PKDD 2018), 2018

Single Shot Scene Text Retrieval.
Proceedings of the Computer Vision - ECCV 2018, 2018

Learning from #Barcelona Instagram Data What Locals and Tourists Post About Its Neighbourhoods.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

Learning to Learn from Web Data Through Deep Semantic Embeddings.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

The Robust Reading Competition Annotation and Evaluation Platform.
Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, 2018

Cutting Sayre's Knot: Reading Scene Text without Segmentation. Application to Utility Meters.
Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, 2018

Word Spotting in Scene Images Based on Character Recognition.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

2017
TextProposals: A text-specific selective search algorithm for word spotting in the wild.
Pattern Recognit., 2017

Improving patch-based scene text script identification with ensembles of conjoined networks.
Pattern Recognit., 2017

The Robust Reading Competition Annotation and Evaluation Platform.
CoRR, 2017

Improving Text Proposals for Scene Images with Fully Convolutional Networks.
CoRR, 2017

ICDAR2017 Robust Reading Challenge on Text Extraction from Biomedical Literature Figures (DeTEXT).
Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017

ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification - RRC-MLT.
Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017

ICDAR2017 Robust Reading Challenge on Omnidirectional Video.
Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017

ICDAR2017 Robust Reading Challenge on COCO-Text.
Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017

LSDE: Levenshtein Space Deep Embedding for Query-by-String Word Spotting.
Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017

Reading Text in the Wild from Compressed Images.
Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

Self-Supervised Learning of Visual Features through Embedding Images into Text Topic Spaces.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
A fast hierarchical method for multi-script and arbitrary oriented scene text extraction.
Int. J. Document Anal. Recognit., 2016

Boosting patch-based scene text script identification with ensembles of conjoined networks.
CoRR, 2016

Dynamic Lexicon Generation for Natural Scene Images.
Proceedings of the Computer Vision - ECCV 2016 Workshops, 2016

Visual Script and Language Identification.
Proceedings of the 12th IAPR Workshop on Document Analysis Systems, 2016

Human-Document Interaction Systems - A New Frontier for Document Image Analysis.
Proceedings of the 12th IAPR Workshop on Document Analysis Systems, 2016

A Fine-Grained Approach to Scene Text Script Identification.
Proceedings of the 12th IAPR Workshop on Document Analysis Systems, 2016

2015
Preface.
Int. J. Document Anal. Recognit., 2015

Knowledge-driven understanding of images in comic books.
Int. J. Document Anal. Recognit., 2015

Automatic Verification of Properly Signed Multi-page Document Images.
Proceedings of the Advances in Visual Computing - 11th International Symposium, 2015

Advancing Physics Learning Through Traversing a Multi-Modal Experimentation Space.
Proceedings of the Workshop Proceedings of the 11th International Conference on Intelligent Environments, 2015

Sparse radial sampling LBP for writer identification.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

ICDAR 2015 competition on Robust Reading.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

Efficient indexing for Query By String text retrieval.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

Novel line verification for multiple instance focused retrieval in document collections.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

Object proposals for text extraction in the wild.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

2014
Logo and Trademark Recognition.
Proceedings of the Handbook of Document Image Processing and Recognition, 2014

Multimodal page classification in administrative document image streams.
Int. J. Document Anal. Recognit., 2014

Limitations of visual gamma corrections in LCD displays.
Displays, 2014

Modelling Task-Dependent Eye Guidance to Objects in Pictures.
Cogn. Comput., 2014

MSER-Based Real-Time Text Detection and Tracking.
Proceedings of the 22nd International Conference on Pattern Recognition, 2014

Embedding Document Structure to Bag-of-Words through Pair-wise Stable Key-Regions.
Proceedings of the 22nd International Conference on Pattern Recognition, 2014

Fast structural matching for document image retrieval through spatial databases.
Proceedings of the Document Recognition and Retrieval XXI, 2014

Color Descriptor for Content-Based Drawing Retrieval.
Proceedings of the 11th IAPR International Workshop on Document Analysis Systems, 2014

An On-line Platform for Ground Truthing and Performance Evaluation of Text Extraction Systems.
Proceedings of the 11th IAPR International Workshop on Document Analysis Systems, 2014

A Cache Language Model for Whole Document Handwriting Recognition.
Proceedings of the 11th IAPR International Workshop on Document Analysis Systems, 2014

Scene Text Recognition: No Country for Old Men?
Proceedings of the Computer Vision - ACCV 2014 Workshops, 2014

2013
Automatic Text Localisation in Scanned Comic Books.
Proceedings of the VISAPP 2013, 2013

An Interactive Appearance-based Document Retrieval System for Historical Newspapers.
Proceedings of the VISAPP 2013, 2013

Towards multispectral data acquisition with hand-held devices.
Proceedings of the IEEE International Conference on Image Processing, 2013

An Active Contour Model for Speech Balloon Detection in Comics.
Proceedings of the 12th International Conference on Document Analysis and Recognition, 2013

ICDAR 2013 Robust Reading Competition.
Proceedings of the 12th International Conference on Document Analysis and Recognition, 2013

Document Classification and Page Stream Segmentation for Digital Mailroom Applications.
Proceedings of the 12th International Conference on Document Analysis and Recognition, 2013

Multi-script Text Extraction from Natural Scenes.
Proceedings of the 12th International Conference on Document Analysis and Recognition, 2013

Key-Region Detection for Document Images - Application to Administrative Document Retrieval.
Proceedings of the 12th International Conference on Document Analysis and Recognition, 2013

Towards Modelling an Attention-Based Text Localization Process.
Proceedings of the Pattern Recognition and Image Analysis - 6th Iberian Conference, 2013

Spotting Graphical Symbols in Camera-Acquired Documents in Real Time.
Proceedings of the Graphics Recognition. Current Trends and Challenges, 2013

Adaptive Contour Classification of Comics Speech Balloons.
Proceedings of the Graphics Recognition. Current Trends and Challenges, 2013

2012
Multipage document retrieval by textual and visual representations.
Proceedings of the 21st International Conference on Pattern Recognition, 2012

CVC-UAB's Participation in the Flowchart Recognition Task of CLEF-IP 2012.
Proceedings of the CLEF 2012 Evaluation Labs and Workshop, 2012

2011
Report from the AND 2009 working group on noisy text datasets.
Int. J. Document Anal. Recognit., 2011

Visual gamma correction for LCD displays.
Displays, 2011

A generic framework for median graph computation based on a recursive embedding approach.
Comput. Vis. Image Underst., 2011

Locating Unique Hues under mixed illumination conditions in CIECAM02.
Proceedings of the 19th Color and Imaging Conference, 2011

ICDAR 2011 Robust Reading Competition - Challenge 1: Reading Text in Born-Digital Images (Web and Email).
Proceedings of the 2011 International Conference on Document Analysis and Recognition, 2011

Classification of Administrative Document Images by Logo Identification.
Proceedings of the Graphics Recognition. New Trends and Challenges, 2011

Interactive Trademark Image Retrieval by Fusing Semantic and Visual Content.
Proceedings of the Advances in Information Retrieval, 2011

2010
Rotation invariant hand-drawn symbol recognition based on a dynamic time warping model.
Int. J. Document Anal. Recognit., 2010

Generation of synthetic documents for performance evaluation of symbol recognition & spotting systems.
Int. J. Document Anal. Recognit., 2010

Perceptual Image Retrieval by Adding Color Information to the Shape Context Descriptor.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

A polar-based logo representation based on topological and colour features.
Proceedings of the Ninth IAPR International Workshop on Document Analysis Systems, 2010

A framework for the assessment of text extraction algorithms on complex colour images.
Proceedings of the Ninth IAPR International Workshop on Document Analysis Systems, 2010

2009
Text Segmentation in Colour Posters from the Spanish Civil War Era.
Proceedings of the 10th International Conference on Document Analysis and Recognition, 2009

A Recursive Embedding Approach to Median Graph Computation.
Proceedings of the Graph-Based Representations in Pattern Recognition, 2009

2008
A Generic Architecture for the Conversion of Document Collections into Semantically Annotated Digital Archives.
J. Univers. Comput. Sci., 2008

Segmentation robust to the vignette effect for machine vision systems.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

HistoSketch: A Semi-Automatic Annotation Tool for Archival Documents.
Proceedings of the Eighth IAPR International Workshop on Document Analysis Systems, 2008

Detecting Gradients in Text Images Using the Hough Transform.
Proceedings of the Eighth IAPR International Workshop on Document Analysis Systems, 2008

2007
Colour text segmentation in web images based on human perception.
Image Vis. Comput., 2007

2006
Ground Truth for Layout Analysis Performance Evaluation.
Proceedings of the Document Analysis Systems VII, 7th International Workshop, 2006

2005
Semantics-Based Content Extraction in Typewritten Historical Documents.
Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR 2005), 29 August, 2005

A display calibration technique based on invariant human colour mechanisms.
Proceedings of the 2nd Symposium on Applied Perception in Graphics and Visualization, 2005

2004
Text Extraction from Web Images Based on A Split-and-Merge Segmentation Method Using Colour Perception.
Proceedings of the 17th International Conference on Pattern Recognition, 2004

The lifecycle of a digital historical document: structure and content.
Proceedings of the 2004 ACM Symposium on Document Engineering, 2004

Document Image Analysis for World War II Personal Records.
Proceedings of the 1st International Workshop on Document Image Analysis for Libraries (DIAL 2004), 2004

A Complete Approach to the Conversion of Typewritten Historical Documents for Digital Archives.
Proceedings of the Document Analysis Systems VI, 6th International Workshop, 2004

2003
Text segmentation in web images using colour perception and topological features.
PhD thesis, 2003

Two Approaches for Text Segmentation in Web Images.
Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR 2003), 2003

ICDAR 2003 Page Segmentation Competition.
Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR 2003), 2003

A fuzzy Approach to Text Segmentation in Web Images based on Human Colour perception.
Proceedings of the Web Document Analysis, 2003

2002
Fuzzy Segmentation of Characters in Web Images Based on Human Colour Perception.
Proceedings of the Document Analysis Systems V, 5th International Workshop, 2002

