Paolo Merialdo

Orcid: 0000-0002-3852-8092

  • Roma Tre University, Rome, Italy

According to our database, Paolo Merialdo authored at least 105 papers between 1989 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.








How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not.
CoRR, 2024

Clustering Amendments with Semantic Embeddings.
Proceedings of the 32nd Symposium of Advanced Database Systems, 2024

Experiences and Lessons Learned from the SIGMOD Entity Resolution Programming Contests.
SIGMOD Rec., June, 2023

Fine-grained semantic type discovery for heterogeneous sources using clustering.
VLDB J., March, 2023

Enhancing Accessibility of Parliamentary Video Streams: AI-Based Automatic Indexing Using Verbatim Reports.
Proceedings of the 1st Legal Information Retrieval meets Artificial Intelligence Workshop LIRAI 2023 co-located with the 34th ACM Hypertext Conference HT 2023, 2023

CERTEM: Explaining and Debugging Black-box Entity Resolution Systems with CERTA.
Proc. VLDB Endow., 2022

Kelpie: an Explainability Framework for Embedding-based Link Prediction Models.
Proc. VLDB Endow., 2022

Self-supervised learning for medieval handwriting identification: A case study from the Vatican Apostolic Library.
Inf. Process. Manag., 2022

Explaining Link Prediction Systems based on Knowledge Graph Embeddings.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

OpenTRIAGE: Entity Linkage for Detail Webpages.
Proceedings of the 30th Italian Symposium on Advanced Database Systems, 2022

Explaining Link Prediction with Kelpie.
Proceedings of the 30th Italian Symposium on Advanced Database Systems, 2022

Effective Explanations for Entity Resolution Models.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

Multi-Label Classification of Bills from the Italian Senate.
Proceedings of 1st Workshop on AI for Public Administration co-located with 21st International Conference of the Italian Association for Artificial Intelligence (AIxIA 2022), Udine, Italy, November 28, 2022

Knowledge Graph Embedding for Link Prediction: A Comparative Analysis.
ACM Trans. Knowl. Discov. Data, 2021

In Codice Ratio: A crowd-enabled solution for low resource machine transcription of the Vatican Registers.
Inf. Process. Manag., 2021

Alaska: A Flexible Benchmark for Data Integration Tasks.
CoRR, 2021

NOAH: Creating Data Integration Pipelines over Continuously Extracted Web Data.
Proceedings of the Workshops of the EDBT/ICDT 2021 Joint Conference, 2021

Knowledge Graph Embeddings or Bias Graph Embeddings? A Study of Bias in Link Prediction Models.
Proceedings of the Workshop on Deep Learning for Knowledge Graphs (DL4KG 2021) co-located with the 20th International Semantic Web Conference (ISWC 2021), 2021

Crowdsourcing for Building Knowledge Graphs at Scale from the Vatican Archives.
Proceedings of the 28th Italian Symposium on Advanced Database Systems, 2020

Hybrid Crowd-Machine Wrapper Inference.
ACM Trans. Knowl. Discov. Data, 2019

Interpreting deep learning models for entity resolution: an experience report using LIME.
Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, 2019

In Codice Ratio: Machine Transcription of Medieval Manuscripts.
Proceedings of the Digital Libraries: Supporting Open Science, 2019

Multikernel Activation Functions: Formulation and a Case Study.
Proceedings of the Recent Advances in Big Data and Deep Learning, 2019

Big Data Integration for Product Specifications.
IEEE Data Eng. Bull., 2018

Towards Annotating Relational Data on the Web with Language Models.
Proceedings of the 2018 World Wide Web Conference on World Wide Web, 2018

Leveraging Wikipedia Table Schemas for Knowledge Graph Augmentation.
Proceedings of the 21st International Workshop on the Web and Databases, 2018

Big Data Linkage for Product Specification Pages.
Proceedings of the 2018 International Conference on Management of Data, 2018

Lessons Learned and Research Agenda for Big Data Integration of Product Specifications.
Proceedings of the 26th Italian Symposium on Advanced Database Systems, 2018

Towards Knowledge Discovery from the Vatican Secret Archives. In Codice Ratio - Episode 1: Machine Transcription of the Manuscripts.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

Crowdsourcing for data management.
Knowl. Inf. Syst., 2017

In Codice Ratio: Scalable Transcription of Vatican Registers.
ERCIM News, 2017

In Codice Ratio: Scalable Transcription of Historical Handwritten Documents.
Proceedings of the 25th Italian Symposium on Advanced Database Systems, 2017

In Codice Ratio: OCR of Handwritten Latin Documents using Deep Convolutional Networks.
Proceedings of the 11th International Workshop on Artificial Intelligence for Cultural Heritage co-located with the 16th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2017), 2017

Accurate fact harvesting from natural language text in Wikipedia with Lector.
Proceedings of the 19th International Workshop on Web and Databases, 2016

Web Content Extraction: a MetaAnalysis of its Past and Thoughts on its Future.
SIGKDD Explor., 2015

Crowdsourcing large scale wrapper inference.
Distributed Parallel Databases, 2015

Web Content Extraction - a Meta-Analysis of its Past and Thoughts on its Future.
CoRR, 2015

The Startup Ecosystem: a Quick Tour.
Proceedings of the 23rd Italian Symposium on Advanced Database Systems, 2015

Knowledge Base Augmentation using Tabular Data.
Proceedings of the Workshop on Linked Data on the Web co-located with the 23rd International World Wide Web Conference (WWW 2014), 2014

Web-Scale Extension of RDF Knowledge Bases from Templated Websites.
Proceedings of the Semantic Web - ISWC 2014, 2014

Extraction and Integration of Partially Overlapping Web Sources.
Proc. VLDB Endow., 2013

ALFRED: crowd assisted data extraction.
Proceedings of the 22nd International World Wide Web Conference, 2013

A framework for learning web wrappers from the crowd.
Proceedings of the 22nd International World Wide Web Conference, 2013

Wrapper Generation Supervised by a Noisy Crowd.
Proceedings of the First VLDB Workshop on Databases and Crowdsourcing, 2013

Minimizing the Costs of the Training Data for Learning Web Wrappers.
Proceedings of the Second International Workshop on Searching and Integrating New Web Data Sources, 2012

Web Data Reconciliation: Models and Experiences.
Proceedings of the Search Computing - Broadening Web Search, 2012

Automatic Evaluation of Relation Extraction Systems on Large-scale.
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction, 2012

Flint: From Web Pages to Probabilistic Semantic Data.
Proceedings of the Semantic Search over the Web, 2012

Characterizing the uncertainty of web data: models and experiences.
Proceedings of the 2011 Joint WICOW/AIRWeb Workshop on Web Quality, 2011

Automatically building probabilistic databases from the web.
Proceedings of the 20th International Conference on World Wide Web, 2011

Wrapper Generation for Overlapping Web Sources.
Proceedings of the 2011 IEEE/WIC/ACM International Conference on Web Intelligence, 2011

Contextual Data Extraction and Instance-Based Integration.
Proceedings of the First International Workshop on Searching and Integrating New Web Data Sources, 2011

Exploiting information redundancy to wring out structured data from the web.
Proceedings of the 19th International Conference on World Wide Web, 2010

Redundancy-Driven Web Data Extraction and Integration.
Proceedings of the 13th International Workshop on the Web and Databases 2010, 2010

Probabilistic Reconciliation of Records from Inaccurate Web Sources (Extended Abstract).
Proceedings of the Eighteenth Italian Symposium on Advanced Database Systems, 2010

Probabilistic Models to Reconcile Complex Data from Inaccurate Data Sources.
Proceedings of the Advanced Information Systems Engineering, 22nd International Conference, 2010

Data Extraction and Integration from Imprecise Web Sources.
Proceedings of the Seventeenth Italian Symposium on Advanced Database Systems, 2009

Structure and Semantics of Data-Intensive Web Pages: An Experimental Study on their Relationships.
J. Univers. Comput. Sci., 2008

Wrapper Inference for Ambiguous Web Pages.
Appl. Artif. Intell., 2008

Supporting the automatic construction of entity aware search engines.
Proceedings of the 10th ACM International Workshop on Web Information and Data Management (WIDM 2008), 2008

A New Generation Search Engine Supporting Cross Domain Queries.
Proceedings of the Sixteenth Italian Symposium on Advanced Database Systems, 2008

Searching Entities on the Web by Sample.
Proceedings of the Sixteenth Italian Symposium on Advanced Database Systems, 2008

Crawling programs for wrapper-based applications.
Proceedings of the IEEE International Conference on Information Reuse and Integration, 2008

NGS: a framework for multi-domain query answering.
Proceedings of the 24th International Conference on Data Engineering Workshops, 2008

Flint: Google-basing the Web.
Proceedings of the EDBT 2008, 2008

Efficient Techniques for Effective Wrapper Induction.
Proceedings of the 22nd International Conference on Data Engineering Workshops, 2006

Clustering Web pages based on their structure.
Data Knowl. Eng., 2005

Efficiently Locating Collections of Web Pages to Wrap.
Proceedings of the WEBIST 2005, 2005

Speaking Words of WISDOM: Web Intelligent Search based on DOMain ontologies.
Proceedings of the SWAP 2005, 2005

Harvesting Structurally Similar Pages.
Proceedings of the Thirteenth Italian Symposium on Advanced Database Systems, 2005

An Automatic Data Grabber for Large Web Sites.
(e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, Toronto, Canada, August 31, 2004

Improving the expressiveness of ROADRUNNER.
Proceedings of the Twelfth Italian Symposium on Advanced Database Systems, 2004

Design and development of data-intensive web sites: The Araneus approach.
ACM Trans. Internet Techn., 2003

Fine-grain web site structure discovery.
Proceedings of the Fifth ACM CIKM International Workshop on Web Information and Data Management (WIDM 2003), 2003

Automatic annotation of data extracted from large web sites.
Proceedings of the Eleventh Italian Symposium on Advanced Database Systems, 2003

Efficient Queries over Web Views.
IEEE Trans. Knowl. Data Eng., 2002

Managing Web-Based Data: Database Models and Transformations.
IEEE Internet Comput., 2002

RoadRunner: automatic data extraction from data-intensive web sites.
Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, 2002

Back to Gold's Age: Bridging the Gap Between Traditional Grammar Inference and Web Information Extraction.
Proceedings of the Decimo Convegno Nazionale su Sistemi Evoluti per Basi di Dati, 2002

Wrapping-oriented classification of web pages.
Proceedings of the 2002 ACM Symposium on Applied Computing (SAC), 2002

Data-Intensive Web Sites: Design and Maintenance.
World Wide Web, 2001

RoadRunner: Towards Automatic Data Extraction from Large Web Sites.
Proceedings of the VLDB 2001, 2001

The RoadRunner Web Data Extraction System.
Proceedings of the Nono Convegno Nazionale Sistemi Evoluti per Basi di Dati, 2001

Automatic Web Information Extraction in the ROADRUNNER System.
Proceedings of the ER 2001 Workshops, 2001

Web Site Evaluation: Methodology and Case Study.
Proceedings of the ER 2001 Workshops, 2001

Homer: a Model-Based CASE Tool for Data-Intensive Web Sites.
Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, 2000

Experiences in XML data management.
Proceedings of the Ottavo Convegno Nazionale su Sistemi Evoluti per Basi di Dati, 2000

Araneus in the Era of XML.
IEEE Data Eng. Bull., 1999

The (Short) Araneus Guide to Web-Site Development.
Proceedings of the ACM SIGMOD Workshop on The Web and Databases, 1999

The ARANEUS Guide to Web-Site Development.
Atti del Settimo Convegno Nazionale Sistemi Evoluti per Basi di Dati, 1999

Do we really need a new query language for XML?
Proceedings of the Query Languages Workshop, Boston, 1998

The Araneus Web-Base Management System.
Proceedings of the SIGMOD 1998, 1998

The Araneus Project: Extending Database Techniques to the World Wide Web.
Atti del Sesto Convegno Nazionale Sistemi Evoluti per Basi di Dati, 1998

A Conceptual Representation of Clinical and Managerial Guidelines: The ATREUS Workflow Model.
Proceedings of the MEDINFO '98, 1998

Design and Maintenance of Data-Intensive Web Sites.
Proceedings of the Advances in Database Technology, 1998

Semistructured and Structured Data in the Web: Going Back and Forth.
SIGMOD Rec., 1997

To Weave the Web.
Proceedings of the VLDB'97, 1997

Structures in the Web.
Proceedings of the Convegno Nazionale Sistemi Evoluti per Basi di Dati, 1997

MrBrAQue: A Multimedia Medical Report Management System.
Proceedings of the International Conference on Multimedia Computing and Systems, 1997

ULIXES: Building Relational Views over the Web.
Proceedings of the Thirteenth International Conference on Data Engineering, 1997

ATREUS: A Model for the Conceptual Representation of a Workflow.
Proceedings of the Eighth International Workshop on Database and Expert Systems Applications, 1997

Automated Recognition of Cardiac Structures in Echocontrast Perfusion Studies.
Proceedings of the 10th IEEE Symposium on Computer-Based Medical Systems (CBMS '97), 1997

Reference Model for Medical Documentation: a Hypermedia Proposal.
Proceedings of the 1st IEEE Metadata Conference 1996, MD 1996, Silver Spring, 1996

Integration of Territorial Maps in the Vision System of an Autonomous Land Vehicle.
Proceedings of the Intelligent Autonomous Systems 2, 1989
