Ihab F. Ilyas

Orcid: 0000-0001-9052-9714

Affiliations:
  • Apple Inc., USA
  • University of Waterloo, Canada


According to our database, Ihab F. Ilyas authored at least 146 papers between 2001 and 2024.

Awards

ACM Fellow 2020, "For contributions to data cleaning and data integration".

Bibliography

2024
Incremental IVF Index Maintenance for Streaming Vector Search.
CoRR, 2024

ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA datasets with Large Language Models.
CoRR, 2024

ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA Datasets with Large Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

Construction of Paired Knowledge Graph - Text Datasets Informed by Cyclic Evaluation.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

2023
High-Throughput Vector Similarity Search in Knowledge Graphs.
Proc. ACM Manag. Data, 2023

Fact Ranking over Large-Scale Knowledge Graphs with Reasoning Embedding Models.
IEEE Data Eng. Bull., 2023

Open Domain Knowledge Extraction for Knowledge Graphs.
CoRR, 2023

Preface QDB.
Proceedings of the Joint Proceedings of Workshops at the 49th International Conference on Very Large Data Bases (VLDB 2023), Vancouver, Canada, August 28, 2023

Growing and Serving Large Open-domain Knowledge Graphs.
Proceedings of the Companion of the 2023 International Conference on Management of Data, 2023

Real-Time LSM-Trees for HTAP Workloads.
Proceedings of the 39th IEEE International Conference on Data Engineering, 2023

Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

FLEEK: Factual Error Detection and Correction with Evidence Retrieved from External Knowledge.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
Machine Learning and Data Cleaning: Which Serves the Other?
ACM J. Data Inf. Qual., 2022

Data Errors: Symptoms, Causes and Origins.
IEEE Data Eng. Bull., 2022

Saga: A Platform for Continuous Construction and Serving of Knowledge at Scale.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

2021
Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins.
Proc. VLDB Endow., 2021

Kamino: Constraint-Aware Differentially Private Data Synthesis.
Proc. VLDB Endow., 2021

PCOR: Private Contextual Outlier Release via Differentially Private Search.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

Properties of Inconsistency Measures for Databases.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

2020
Approximate Denial Constraints.
Proc. VLDB Endow., 2020

Batchwise Probabilistic Incremental Data Cleaning.
CoRR, 2020

On sampling from data with duplicate records.
CoRR, 2020

Record fusion: A learning approach.
CoRR, 2020

Attention-based Learning for Missing Data Imputation in HoloClean.
Proceedings of the Third Conference on Machine Learning and Systems, 2020

2019
Distributed Implementations of Dependency Discovery Algorithms.
Proc. VLDB Endow., 2019

Secure Multi-Party Functional Dependency Discovery.
Proc. VLDB Endow., 2019

Technical Report: Optimizing Human Involvement for Entity Matching and Consolidation.
CoRR, 2019

Principles of Progress Indicators for Database Repairing.
CoRR, 2019

Matching Entities Across Different Knowledge Graphs with Graph Embeddings.
CoRR, 2019

Distributed Dependency Discovery.
CoRR, 2019

Approximate Inference in Structured Instances with Noisy Categorical Observations.
Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, 2019

HoloDetect: Few-Shot Learning for Error Detection.
Proceedings of the 2019 International Conference on Management of Data, 2019

APEx: Accuracy-Aware Differentially Private Data Exploration.
Proceedings of the 2019 International Conference on Management of Data, 2019

A Formal Framework for Probabilistic Unclean Databases.
Proceedings of the 22nd International Conference on Database Theory, 2019

Distributed Discovery of Functional Dependencies.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

A Semi-Supervised Framework of Clustering Selection for De-Duplication.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

Unsupervised String Transformation Learning for Entity Consolidation.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

ExplIQuE: Interactive Databases Exploration with SQL.
Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

Building Scalable Machine Learning Solutions for Data Cleaning.
Proceedings of the Datenbanksysteme für Business, 2019

Semi-supervised clustering for de-duplication.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

Data unification at scale: data tamer.
Proceedings of the Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker, 2019

Data Cleaning
ACM Books 28, ACM, ISBN: 978-1-4503-7152-0, 2019

2018
Top-k Queries.
Proceedings of the Encyclopedia of Database Systems, Second Edition, 2018

Rank-Join.
Proceedings of the Encyclopedia of Database Systems, Second Edition, 2018

Rank-Aware Query Processing.
Proceedings of the Encyclopedia of Database Systems, Second Edition, 2018

Data Integration: The Current Status and the Way Forward.
IEEE Data Eng. Bull., 2018

Building Data Civilizer Pipelines with an Advanced Workflow Engine.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

Seeping Semantics: Linking Datasets Using Word Embeddings for Data Discovery.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

Farewell Freebase: Migrating the SimpleQuestions Dataset to DBpedia.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

2017
Smart Meter Data Analytics: Systems, Algorithms, and Benchmarking.
ACM Trans. Database Syst., 2017

Data Quality: The Role of Empiricism.
SIGMOD Rec., 2017

HoloClean: Holistic Data Repairs with Probabilistic Inference.
Proc. VLDB Endow., 2017

Private Exploration Primitives for Data Cleaning.
CoRR, 2017

Entity Consolidation: The Golden Record Problem.
CoRR, 2017

A Demo of the Data Civilizer System.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

The Data Civilizer System.
Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017

2016
Distributed Data Deduplication.
Proc. VLDB Endow., 2016

Qualitative Data Cleaning.
Proc. VLDB Endow., 2016

Detecting Data Errors: Where are we and what needs to be done?
Proc. VLDB Endow., 2016

Learning to identify relevant studies for systematic reviews using random forest and external information.
Mach. Learn., 2016

Editorial: Special Issue on Web Data Quality.
ACM J. Data Inf. Qual., 2016

Effective Data Cleaning with Continuous Evaluation.
IEEE Data Eng. Bull., 2016

CLAMS: Bringing Quality to Data Lakes.
Proceedings of the 2016 International Conference on Management of Data, 2016

Data Cleaning: Overview and Emerging Challenges.
Proceedings of the 2016 International Conference on Management of Data, 2016

LONLIES: Estimating Property Values for Long Tail Entities.
Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016

Dark Data: Are we solving the right problems?
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

DataXFormer: A robust transformation discovery system.
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

2015
KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing.
Proc. VLDB Endow., 2015

Trends in Cleaning Relational Data: Consistency and Deduplication.
Found. Trends Databases, 2015

DataXFormer: An Interactive Data Transformation Tool.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

BigDansing: A System for Big Data Cleansing.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

SMAS: A smart meter data analytics system.
Proceedings of the 31st IEEE International Conference on Data Engineering, 2015

Benchmarking Smart Meter Data Analytics.
Proceedings of the 18th International Conference on Extending Database Technology, 2015

Dataxformer: Leveraging the Web for Semantic Transformations.
Proceedings of the Seventh Biennial Conference on Innovative Data Systems Research, 2015

2014
Sampling from repairs of conditional functional dependency violations.
VLDB J., 2014

Top-k Nearest Neighbor Search In Uncertain Data Series.
Proc. VLDB Endow., 2014

NADEEF/ER: generic and interactive entity resolution.
Proceedings of the International Conference on Management of Data, 2014

Descriptive and prescriptive data cleaning.
Proceedings of the International Conference on Management of Data, 2014

RuleMiner: Data quality rules discovery.
Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, 2014

2013
Probabilistic Web Data Management.
World Wide Web, 2013

NADEEF: A Generalized Data Cleaning System.
Proc. VLDB Endow., 2013

Discovering Denial Constraints.
Proc. VLDB Endow., 2013

We are drowning in a sea of least publishable units (LPUs).
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

NADEEF: a commodity data cleaning system.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

Holistic data cleaning: Putting violations into context.
Proceedings of the 29th IEEE International Conference on Data Engineering, 2013

On the relative trust between inconsistent data and inaccurate constraints.
Proceedings of the 29th IEEE International Conference on Data Engineering, 2013

Data Curation at Scale: The Data Tamer System.
Proceedings of the Sixth Biennial Conference on Innovative Data Systems Research, 2013

2012
The data analytics group at the qatar computing research institute.
SIGMOD Rec., 2012

Just-in-time information extraction using extraction views.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

Interpreting keyword queries over web knowledge bases.
Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

2011
Probabilistic Ranking Techniques in Relational Databases
Synthesis Lectures on Data Management, Morgan & Claypool Publishers, ISBN: 978-3-031-01846-6, 2011

Guided data repair.
Proc. VLDB Endow., 2011

Ranking with uncertain scoring functions: semantics and sensitivity measures.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2011

2010
Supporting ranking queries on uncertain and incomplete data.
VLDB J., 2010

Building Ranked Mashups of Unstructured Sources with Uncertain Information.
Proc. VLDB Endow., 2010

QUICK: Expressive and Flexible Search over Knowledge Bases and Text Collections.
Proc. VLDB Endow., 2010

Sampling the Repairs of Functional Dependency Violations under Hard Constraints.
Proc. VLDB Endow., 2010

Expressive and flexible access to web-extracted data: a keyword-based structured query language.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

Trends in Rank Join.
Proceedings of the Search Computing, 2010

Uncertainty in Rank Join.
Proceedings of the Search Computing, 2010

MashRank: Towards uncertainty-aware and rank-aware mashups.
Proceedings of the 26th International Conference on Data Engineering, 2010

ProbClean: A probabilistic duplicate detection system.
Proceedings of the 26th International Conference on Data Engineering, 2010

2009
Discovering and Exploiting Statistical Properties for Query Optimization in Relational Databases: A Survey.
Stat. Anal. Data Min., 2009

Creating Competitive Products.
Proc. VLDB Endow., 2009

StatAdvisor: Recommending Statistical Views.
Proc. VLDB Endow., 2009

Modeling and Querying Possible Repairs in Duplicate Detection.
Proc. VLDB Endow., 2009

Guest editorial: special issue on ranking in databases.
Distributed Parallel Databases, 2009

Rank-Join Algorithms for Search Computing.
Proceedings of the Search Computing: Challenges and Directions [outcome of the first SeCO Workshop on Search Computing Challenges and Directions, 2009

PSALM: Cardinality Estimation in the Presence of Fine-Grained Access Controls.
Proceedings of the 25th International Conference on Data Engineering, 2009

Ranking with Uncertain Scores.
Proceedings of the 25th International Conference on Data Engineering, 2009

2008
Probabilistic top-k and ranking-aggregate queries.
ACM Trans. Database Syst., 2008

Efficient search for the top-k probable nearest neighbors in uncertain databases.
Proc. VLDB Endow., 2008

A survey of top-k query processing techniques in relational database systems.
ACM Comput. Surv., 2008

Message from the DBRANK'08 program co-chairs.
Proceedings of the 24th International Conference on Data Engineering Workshops, 2008

08421 Working Group: Classification, Representation and Modeling.
Proceedings of the Uncertainty Management in Information Systems, 12.10. - 17.10.2008, 2008

08421 Working Group: Lineage/Provenance.
Proceedings of the Uncertainty Management in Information Systems, 12.10. - 17.10.2008, 2008

2007
Report on the First International Workshop on Ranking in Databases (DBRank'07).
SIGMOD Rec., 2007

URank: formulation and efficient evaluation of top-k queries in uncertain databases.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2007

Finding Skyline and Top-k Bargaining Solutions.
Proceedings of the 23rd International Conference on Data Engineering, 2007

Top-k Query Processing in Uncertain Databases.
Proceedings of the 23rd International Conference on Data Engineering, 2007

Collecting and Maintaining Just-in-Time Statistics.
Proceedings of the 23rd International Conference on Data Engineering, 2007

2006
Adaptive rank-aware query optimization in relational databases.
ACM Trans. Database Syst., 2006

FIX: Feature-based Indexing Technique for XML Documents.
Proceedings of the 32nd International Conference on Very Large Data Bases, 2006

InterJoin: Exploiting Indexes and Materialized Views in XPath Evaluation.
Proceedings of the 18th International Conference on Scientific and Statistical Database Management, 2006

Supporting ad-hoc ranking aggregates.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2006

XSEED: Accurate and Fast Cardinality Estimation for XPath Queries.
Proceedings of the 22nd International Conference on Data Engineering, 2006

2005
RankSQL: Supporting Ranking Queries in Relational Database Management Systems.
Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005

RankSQL: Query Algebra and Optimization for Relational Top-k Queries.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2005

Rank-Aware Query Processing and Optimization.
Proceedings of the 21st International Conference on Data Engineering, 2005

2004
Rank-aware query processing and optimization
PhD thesis, 2004

Supporting top-k join queries in relational databases.
VLDB J., 2004

Reminiscences on Influential Papers.
SIGMOD Rec., 2004

VDBMS: A testbed facility for research in video database benchmarking.
Multim. Syst., 2004

CORDS: Automatic Generation of Correlation Statistics in DB2.
(e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, Toronto, Canada, August 31, 2004

Rank-aware Query Optimization.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2004

CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2004

Nile: A Query Processing Engine for Data Streams.
Proceedings of the 20th International Conference on Data Engineering, 2004

Automatic Relationship Discovery in Self-Managing Database Systems.
Proceedings of the 1st International Conference on Autonomic Computing (ICAC 2004), 2004

2003
Estimating Compilation Time of a Query Optimizer.
Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, 2003

Video query processing in the VDBMS testbed for video database research.
Proceedings of the First ACM International Workshop on Multimedia Databases, 2003

2002
Joining Ranked Inputs in Practice.
Proceedings of 28th International Conference on Very Large Data Bases, 2002

A Video Database Management System for Advancing Video Database Research.
Proceedings of the MIS 2002, International Workshop on Multimedia Information Systems, October 10, 2002

A Distributed Database Server for Continuous Media.
Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, February 26, 2002

2001
SP-GiST: An Extensible Database Index for Supporting Space Partitioning Trees.
J. Intell. Inf. Syst., 2001

An Extensible Index for Spatial Databases.
Proceedings of the 13th International Conference on Scientific and Statistical Database Management, 2001

