Joan-Andreu Sánchez

ORCID: 0000-0003-0423-2020

According to our database, Joan-Andreu Sánchez authored at least 113 papers between 1996 and 2024.

Ground-truth generation through crowdsourcing with probabilistic indexes.
Neural Comput. Appl., October, 2024

Enhancing Recognition of Historical Musical Pieces with Synthetic and Composed Images.
Proceedings of the Document Analysis and Recognition - ICDAR 2024 - 18th International Conference, Athens, Greece, August 30, 2024

Improving Efficiency and Performance Through CTC-Based Transformers for Mathematical Expression Recognition.
Proceedings of the Document Analysis and Recognition - ICDAR 2024 - 18th International Conference, Athens, Greece, August 30, 2024

Speed-Up Pre-trained Vision Encoder-Decoder Transformers by Leveraging Lightweight Mixer Layers for Text Recognition.
Proceedings of the Document Analysis Systems - 16th IAPR International Workshop, 2024

Discriminative estimation of probabilistic context-free grammars for mathematical expression recognition and retrieval.
Pattern Anal. Appl., November, 2023

Information extraction in handwritten historical logbooks.
Pattern Recognit. Lett., August, 2023

The IBEM dataset: A large printed scientific image dataset for indexing and searching mathematical expressions.
Pattern Recognit. Lett., August, 2023

Processing a large collection of historical tabular images.
Pattern Recognit. Lett., 2023

Py4MER: A CTC-Based Mathematical Expression Recognition System.
Proceedings of the Pattern Recognition and Image Analysis - 11th Iberian Conference, 2023

Synchronous Recognition of Music Images Using Coupled N-Gram Models.
Proceedings of the ACM Symposium on Document Engineering 2023, 2023

Information Extraction in Handwritten Historical Logbooks.
Dataset, July, 2022

Extracting Descriptive Words from Untranscribed Handwritten Images.
Proceedings of the Pattern Recognition and Image Analysis - 10th Iberian Conference, 2022

Discriminative Learning of Two-Dimensional Probabilistic Context-Free Grammars for Mathematical Expression Recognition and Retrieval.
Proceedings of the Pattern Recognition and Image Analysis - 10th Iberian Conference, 2022

Effective Crowdsourcing in the EDT Project with Probabilistic Indexes.
Proceedings of the Document Analysis Systems - 15th IAPR International Workshop, 2022

Information Extraction from Handwritten Tables in Historical Documents.
Proceedings of the Document Analysis Systems - 15th IAPR International Workshop, 2022

Discriminative Learning for Probabilistic Context-Free Grammars based on Generalized H-Criterion.
CoRR, 2021

Reducing the Human Effort in Text Line Segmentation for Historical Documents.
Proceedings of the 16th International Conference on Document Analysis and Recognition, 2021

ICDAR 2021 Competition on Mathematical Formula Detection.
Proceedings of the 16th International Conference on Document Analysis and Recognition, 2021

Computation of moments for probabilistic finite-state automata.
Inf. Sci., 2020

Generation of Hypergraphs from the N-Best Parsing of 2D-Probabilistic Context-Free Grammars for Mathematical Expression Recognition.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

The HisClima database: historical weather logs for automatic transcription and information extraction.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Handwritten Music Recognition Improvement through Language Model Re-interpretation for Mensural Notation.
Proceedings of the 17th International Conference on Frontiers in Handwriting Recognition, 2020

The Carabela Project and Manuscript Collection: Large-Scale Probabilistic Indexing and Content-based Classification.
Proceedings of the 17th International Conference on Frontiers in Handwriting Recognition, 2020

Two Semi-Supervised Training Approaches for Automated Text Recognition.
Proceedings of the 17th International Conference on Frontiers in Handwriting Recognition, 2020

A set of benchmarks for Handwritten Text Recognition on historical documents.
Pattern Recognit., 2019

Transforming scholarship in the archives through handwritten text recognition.
J. Documentation, 2019

A Study of English Neologisms Through Large-Scale Probabilistic Indexing of Bentham's Manuscripts.
Proceedings of the New Trends in Image Analysis and Processing - ICIAP 2019, 2019

Modern vs Diplomatic Transcripts for Historical Handwritten Text Recognition.
Proceedings of the New Trends in Image Analysis and Processing - ICIAP 2019, 2019

Making Two Vast Historical Manuscript Collections Searchable and Extracting Meaningful Textual Features Through Large-Scale Probabilistic Indexing.
Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

Music Symbol Sequence Indexing in Medieval Plainchant Manuscripts.
Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

Information Extraction in Handwritten Marriage Licenses Books.
Proceedings of the 5th International Workshop on Historical Document Imaging and Processing, 2019

On the Derivational Entropy of Left-to-Right Probabilistic Finite-State Automata and Hidden Markov Models.
Comput. Linguistics, 2018

Empirical Evaluation of Variational Autoencoders for Data Augmentation.
Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2018), 2018

Active Learning in Handwritten Text Recognition using the Derivational Entropy.
Proceedings of the 16th International Conference on Frontiers in Handwriting Recognition, 2018

Automatic Alignment of Handwritten Images and Transcripts for Training Handwritten Text Recognition Systems.
Proceedings of the 13th IAPR International Workshop on Document Analysis Systems, 2018

ICDAR2017 Competition on Handwritten Text Recognition on the READ Dataset.
Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017

ICDAR2017 Competition on Information Extraction in Historical Handwritten Records.
Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017

Information Extraction in Handwritten Marriage Licenses Books Using the MGGI Methodology.
Proceedings of the Pattern Recognition and Image Analysis - 8th Iberian Conference, 2017

A Historical Document Handwriting Transcription End-to-end System.
Proceedings of the Pattern Recognition and Image Analysis - 8th Iberian Conference, 2017

An integrated grammar-based approach for mathematical expression recognition.
Pattern Recognit., 2016

ICFHR2016 Competition on Handwritten Text Recognition on the READ Dataset.
Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition, 2016

Handwritten Text Recognition for Bengali.
Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition, 2016

Using the MGGI Methodology for Category-Based Language Modeling in Handwritten Marriage Licenses Books.
Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition, 2016

Handwriting Transcription and Keyword Spotting in Historical Daily Records Documents.
Proceedings of the 12th IAPR Workshop on Document Analysis Systems, 2016

Overview of the ImageCLEF 2016 Handwritten Scanned Document Retrieval Task.
Proceedings of the Working Notes of CLEF 2016, 2016

Structure detection and segmentation of documents using 2D stochastic context-free grammars.
Neurocomputing, 2015

Optical modelling and language modelling trade-off for Handwritten Text Recognition.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

Crossing the lines: making optimal use of context in line-based Handwritten Text Recognition.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

ICDAR 2015 competition HTRtS: Handwritten Text Recognition on the tranScriptorium dataset.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

Influence of text line segmentation in Handwritten Text Recognition.
Proceedings of the 13th International Conference on Document Analysis and Recognition, 2015

On the Modification of Binarization Algorithms to Retain Grayscale Information for Handwritten Text Recognition.
Proceedings of the Pattern Recognition and Image Analysis - 7th Iberian Conference, 2015

Automatisierte Handschriftenerkennung mit der Transcription & Recognition Platform (TRP).
Proceedings of the 2. Tagung des Verbands Digital Humanities im deutschsprachigen Raum, 2015

Recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models.
Pattern Recognit. Lett., 2014

Offline Features for Classifying Handwritten Math Symbols with Recurrent Neural Networks.
Proceedings of the 22nd International Conference on Pattern Recognition, 2014

ICFHR2014 Competition on Handwritten Text Recognition on Transcriptorium Datasets (HTRtS).
Proceedings of the 14th International Conference on Frontiers in Handwriting Recognition, 2014

Handwritten text recognition for historical documents in the transcriptorium project.
Proceedings of the Digital Access to Textual Cultural Heritage 2014, 2014

Ground-Truth Production in the Transcriptorium Project.
Proceedings of the 11th IAPR International Workshop on Document Analysis Systems, 2014

The ESPOSALLES database: An ancient marriage license corpus for off-line handwriting recognition.
Pattern Recognit., 2013

Towards the Supervised Machine Translation: Real Word Alignments and Translations in a Multi-task Active Learning process.
Proceedings of Machine Translation Summit XIV: Posters, 2013

Human Evaluation of the Transcription Process of a Marriage License Book.
Proceedings of the 12th International Conference on Document Analysis and Recognition, 2013

Category-Based Language Models for Handwriting Recognition of Marriage License Books.
Proceedings of the 12th International Conference on Document Analysis and Recognition, 2013

Classification of On-Line Mathematical Symbols with Hybrid Features and Recurrent Neural Networks.
Proceedings of the 12th International Conference on Document Analysis and Recognition, 2013

Multimodal Interactive Parsing.
Proceedings of the Pattern Recognition and Image Analysis - 6th Iberian Conference, 2013

An Image-Based Measure for Evaluation of Mathematical Expression Recognition.
Proceedings of the Pattern Recognition and Image Analysis - 6th Iberian Conference, 2013

Page Segmentation of Structured Documents Using 2D Stochastic Context-Free Grammars.
Proceedings of the Pattern Recognition and Image Analysis - 6th Iberian Conference, 2013

tranScriptorium: a European project on handwritten text recognition.
Proceedings of the ACM Symposium on Document Engineering 2013, 2013

Evaluating a post-editing approach for handwriting transcription.
Proceedings of the 11th Conference on Natural Language Processing, 2012

Unbiased Evaluation of Handwritten Mathematical Expression Recognition.
Proceedings of the 2012 International Conference on Frontiers in Handwriting Recognition, 2012

Translating the Penn Treebank with an Interactive-Predictive MT System.
Int. J. Comput. Linguistics Appl., 2011

Multimodal Computer-Assisted Transcription of Ancient Documents.
ERCIM News, 2011

Multimodal Interactive Transcription of Ancient Text Images.
Proceedings of the Multimedia for Cultural Heritage - First International Workshop, 2011

Handwritten Text Recognition for Marriage Register Books.
Proceedings of the 2011 International Conference on Document Analysis and Recognition, 2011

Recognition of Printed Mathematical Expressions Using Two-Dimensional Stochastic Context-Free Grammars.
Proceedings of the 2011 International Conference on Document Analysis and Recognition, 2011

Interactive Predictive Parsing Framework for the Spanish Language.
Proces. del Leng. Natural, 2010

Handwritten Text Recognition for Ancient Documents.
Proceedings of the First Workshop on Applications of Pattern Analysis, 2010

UPV-PRHLT English-Spanish System for WMT10.
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, 2010

The UPV-PRHLT Combination System for WMT 2010.
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, 2010

Complete Search Space Exploration for SITG Inside Probability.
Proceedings of the Structural, 2010

Interactive Predictive Parsing using a Web-based Architecture.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, June 2, 2010, Los Angeles, California, USA, 2010

Enlarged Search Space for SITG Parsing.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2010

ITI-UPV system description for IWSLT 2010.
Proceedings of the 2010 International Workshop on Spoken Language Translation, 2010

Comparing Several Techniques for Offline Recognition of Printed Mathematical Symbols.
Proceedings of the 20th International Conference on Pattern Recognition, 2010

IPP-Ann: An Interactive Tool for Probabilistic Parsing.
Proceedings of the Database and Expert Systems Applications, 2010

Confidence Measures for Error Discrimination in an Interactive Predictive Parsing Framework.
Proceedings of the COLING 2010, 2010

Syntax Augmented Inversion Transduction Grammars for Machine Translation.
Proceedings of the Computational Linguistics and Intelligent Text Processing, 2010

Statistical Confidence Measures for Probabilistic Parsing.
Proceedings of the Recent Advances in Natural Language Processing, 2009

UPV translation system for IWSLT 2009.
Proceedings of the 2009 International Workshop on Spoken Language Translation, 2009

Interactive Predictive Parsing.
Proceedings of the 11th International Workshop on Parsing Technologies (IWPT-2009), 2009

Using Parsed Corpora for Estimating Stochastic Inversion Transduction Grammars.
Proceedings of the International Conference on Language Resources and Evaluation, 2008

Part-of-Speech Tagging Based on Machine Translation Techniques.
Proceedings of the Pattern Recognition and Image Analysis, Third Iberian Conference, 2007

Fast Stochastic Context-Free Parsing: A Stochastic Version of the Valiant Algorithm.
Proceedings of the Pattern Recognition and Image Analysis, Third Iberian Conference, 2007

Stochastic Inversion Transduction Grammars for Obtaining Word Phrases for Phrase-based Statistical Machine Translation.
Proceedings of the Workshop on Statistical Machine Translation, 2006

Obtaining Word Phrases with Stochastic Inversion Translation Grammars for Phrase-based Statistical Machine Translation.
Proceedings of the 11th Annual conference of the European Association for Machine Translation, 2006

Estimation of stochastic context-free grammars and their use as language models.
Comput. Speech Lang., 2005

A Hybrid Approach to Statistical Language Modeling with Multilayer Perceptrons and Unigrams.
Proceedings of the Text, Speech and Dialogue, 8th International Conference, 2005

Performance of a SCFG-Based Language Model with Training Data Sets of Increasing Size.
Proceedings of the Pattern Recognition and Image Analysis, Second Iberian Conference, 2005

Time Reduction of Stochastic Parsing with Stochastic Context-Free Grammars.
Proceedings of the Pattern Recognition and Image Analysis, Second Iberian Conference, 2005

A hybrid language model based on a combination of N-grams and stochastic context-free grammars.
ACM Trans. Asian Lang. Inf. Process., 2004

Earley-based stochastic context-free grammar estimation from bracketed corpora and its use in a hybrid language model.
Proces. del Leng. Natural, 2003

La plataforma de adquisición de diálogos en el proyecto Dihana.
Proces. del Leng. Natural, 2003

Learning of Stochastic Context-Free Grammars by Means of Estimation Algorithms and Initial Treebank Grammars.
Proceedings of the Pattern Recognition and Image Analysis, First Iberian Conference, 2003

Performance and Improvements of a Language Model Based on Stochastic Context-Free Grammars.
Proceedings of the Pattern Recognition and Image Analysis, First Iberian Conference, 2003

A Hybrid Language Model based on Stochastic Context-free Grammars.
Proceedings of the Workshop and Tutorial on Learning Context-Free Grammars, 2003

Combination of Estimation Algorithms and Grammatical Inference Techniques to Learn Stochastic Context-Free Grammars.
Proceedings of the Grammatical Inference: Algorithms and Applications, 2000

Combination Of N-Grams And Stochastic Context-Free Grammars For Language Modeling.
Proceedings of the COLING 2000, 18th International Conference on Computational Linguistics, Proceedings of the Conference, 2 Volumes, July 31, 2000

Learning of stochastic context-free grammars by means of estimation algorithms.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

A fast version of the ATROS system.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Acoustic and syntactical modeling in the ATROS system.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

Estimation of the probability distributions of stochastic context-free grammars from the k-best derivations.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Consistency of Stochastic Context-Free Grammars From Probabilistic Estimation Based on Growth Transformations.
IEEE Trans. Pattern Anal. Mach. Intell., 1997

Comparison Between the Inside-Outside Algorithm and the Viterbi Algorithm for Stochastic Context-Free Grammars.
Proceedings of the Advances in Structural and Syntactical Pattern Recognition, 1996
