Mourad Ouzzani

ORCID: 0000-0002-4035-3025

Affiliations:
  • Qatar Computing Research Institute, HBKU, Qatar
  • Purdue University, West Lafayette, USA


According to our database, Mourad Ouzzani authored at least 151 papers between 1994 and 2024.

Bibliography

2024
Exif2Vec: A Framework to Ascertain Untrustworthy Crowdsourced Images Using Metadata.
ACM Trans. Web, August, 2024

RetClean: Retrieval-Based Tabular Data Cleaning Using LLMs and Data Lakes.
Proc. VLDB Endow., August, 2024

Detecting and Mitigating Sampling Bias in Cybersecurity with Unlabeled Data.
Proceedings of the 33rd USENIX Security Symposium, 2024

2023
RetClean: Retrieval-Based Data Cleaning Using Foundation Models and Data Lakes.
CoRR, 2023

2022
Interactively discovering and ranking desired tuples by data exploration.
VLDB J., 2022

Automated Annotations for AI Data and Model Transparency.
ACM J. Data Inf. Qual., 2022

Sevi: Speech-to-Visualization through Neural Machine Translation.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

2021
Deep Learning for Blocking in Entity Matching: A Design Space Exploration.
Proc. VLDB Endow., 2021

RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation.
Proc. VLDB Endow., 2021

Horizon: Scalable Dependency-driven Data Cleaning.
Proc. VLDB Endow., 2021

Database systems research in the Arab world: a tradition that spans decades.
Commun. ACM, 2021

Ranking Desired Tuples by Database Exploration.
Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

2020
Debugging Large-Scale Data Science Pipelines using Dagger.
Proc. VLDB Endow., 2020

Pattern Functional Dependencies for Data Cleaning.
Proc. VLDB Endow., 2020

LocationSpark: In-memory Distributed Spatial Query Processing and Optimization.
Frontiers Big Data, 2020

Relational Pretrained Transformers towards Democratizing Data Preparation [Vision].
CoRR, 2020

CoClean: Collaborative Data Cleaning.
Proceedings of the 2020 International Conference on Management of Data, 2020

Data Curation with Deep Learning.
Proceedings of the 23rd International Conference on Extending Database Technology, 2020

Dagger: A Data (not code) Debugger.
Proceedings of the 10th Conference on Innovative Data Systems Research, 2020

2019
Data Civilizer 2.0: A Holistic Framework for Data Preparation and Analytics.
Proc. VLDB Endow., 2019

Dataset-On-Demand: Automatic View Search and Presentation for Data Discovery.
CoRR, 2019

Technical Report: Optimizing Human Involvement for Entity Matching and Consolidation.
CoRR, 2019

Explaining Entity Resolution Predictions: Where are we and What needs to be done?
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2019

Towards an End-to-End Human-Centric Data Cleaning Framework.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2019

ANMAT: Automatic Knowledge Discovery and Error Detection through Pattern Functional Dependencies.
Proceedings of the 2019 International Conference on Management of Data, 2019

Raha: A Configuration-Free Error Detection System.
Proceedings of the 2019 International Conference on Management of Data, 2019

EXPLAINER: Entity Resolution Explanations.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

Unsupervised String Transformation Learning for Entity Consolidation.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

Data civilizer: end-to-end support for data discovery, integration, and cleaning.
Proceedings of the Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker, 2019

2018
Correctness Criteria Beyond Serializability.
Proceedings of the Encyclopedia of Database Systems, Second Edition, 2018

Generalization of ACID Properties.
Proceedings of the Encyclopedia of Database Systems, Second Edition, 2018

Efficient Parallel Skyline Query Processing for High-Dimensional Data.
IEEE Trans. Knowl. Data Eng., 2018

Distributed Representations of Tuples for Entity Resolution.
Proc. VLDB Endow., 2018

RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With You! -.
Proc. VLDB Endow., 2018

COACT: a query interface language for collaborative databases.
Distributed Parallel Databases, 2018

AUDIT: approving and tracking updates with dependencies in collaborative databases.
Distributed Parallel Databases, 2018

Reuse and Adaptation for Entity Resolution through Transfer Learning.
CoRR, 2018

Data Curation with Deep Learning [Vision]: Towards Self Driving Data Curation.
CoRR, 2018

FAHES: A Robust Disguised Missing Values Detector.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

FAHES: Detecting Disguised Missing Values.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

Building Data Civilizer Pipelines with an Advanced Workflow Engine.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

Seeping Semantics: Linking Datasets Using Word Embeddings for Data Discovery.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

2017
Fast and scalable inequality joins.
VLDB J., 2017

Errata for "Lightning Fast and Space Efficient Inequality Joins" (PVLDB 8(13): 2074-2085).
Proc. VLDB Endow., 2017

Lusail: A System for Querying Linked Data at Scale.
Proc. VLDB Endow., 2017

Pattern-Driven Data Cleaning.
CoRR, 2017

Human-Centric Data Cleaning [Vision].
CoRR, 2017

DeepER - Deep Entity Resolution.
CoRR, 2017

Entity Consolidation: The Golden Record Problem.
CoRR, 2017

A service computing manifesto: the next 10 years.
Commun. ACM, 2017

UGuide: User-Guided Discovery of FD-Detectable Errors.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

A Demonstration of Lusail: Querying Linked Data at Scale.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

A Demo of the Data Civilizer System.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

In-Memory Distributed Matrix Computation Processing and Optimization.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

Query Optimizations over Decentralized RDF Graphs.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

The Data Civilizer System.
Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017

2016
Similarity Group-by Operators for Multi-Dimensional Relational Data.
IEEE Trans. Knowl. Data Eng., 2016

LocationSpark: A Distributed In-Memory Data Management System for Big Spatial Data.
Proc. VLDB Endow., 2016

Detecting Data Errors: Where are we and what needs to be done?
Proc. VLDB Endow., 2016

Learning to identify relevant studies for systematic reviews using random forest and external information.
Mach. Learn., 2016

The similarity-aware relational database set operators.
Inf. Syst., 2016

A large scale study of SVM based methods for abstract screening in systematic reviews.
CoRR, 2016

A Query-oriented Approach for Relevance in Citation Networks.
Proceedings of the 25th International Conference on World Wide Web, 2016

ORLF: A flexible framework for online record linkage and fusion.
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

DataXFormer: A robust transformation discovery system.
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

Road to Freedom in Big Data Analytics.
Proceedings of the 19th International Conference on Extending Database Technology, 2016

2015
Lightning Fast and Space Efficient Inequality Joins.
Proc. VLDB Endow., 2015

KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing.
Proc. VLDB Endow., 2015

AQWA: Adaptive Query-Workload-Aware Partitioning of Big Spatial Data.
Proc. VLDB Endow., 2015

A Demonstration of AQWA: Adaptive Query-Workload-Aware Partitioning of Big Spatial Data.
Proc. VLDB Endow., 2015

Temporal Rules Discovery for Web Data Cleaning.
Proc. VLDB Endow., 2015

DataXFormer: An Interactive Data Transformation Tool.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

BigDansing: A System for Big Data Cleansing.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

Query-time record linkage and fusion over Web databases.
Proceedings of the 31st IEEE International Conference on Data Engineering, 2015

Approving Updates in Collaborative Databases.
Proceedings of the 2015 IEEE International Conference on Cloud Engineering, 2015

Spatial queries with k-nearest-neighbor and relational predicates.
Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2015

Efficient Processing of Hamming-Distance-Based Similarity-Search Queries Over MapReduce.
Proceedings of the 18th International Conference on Extending Database Technology, 2015

Cost Estimation of Spatial k-Nearest-Neighbor Operators.
Proceedings of the 18th International Conference on Extending Database Technology, 2015

DataXFormer: Leveraging the Web for Semantic Transformations.
Proceedings of the Seventh Biennial Conference on Innovative Data Systems Research, 2015

2014
HandsOn DB: Managing Data Dependencies Involving Human Actions.
IEEE Trans. Knowl. Data Eng., 2014

GlobalHUB: A Model for Sustainable Online Communities.
Int. J. Web Portals, 2014

On Order-independent Semantics of the Similarity Group-By Relational Database Operator.
CoRR, 2014

iHUB: an information and collaborative management platform for life sciences.
Proceedings of the 23rd International World Wide Web Conference, 2014

The Similarity-Aware Relational Intersect Database Operator.
Proceedings of the Similarity Search and Applications - 7th International Conference, 2014

NADEEF/ER: generic and interactive entity resolution.
Proceedings of the International Conference on Management of Data, 2014

Descriptive and prescriptive data cleaning.
Proceedings of the International Conference on Management of Data, 2014

JISC: Adaptive Stream Processing Using Just-In-Time State Completion.
Proceedings of the 17th International Conference on Extending Database Technology, 2014

2013
NADEEF: A Generalized Data Cleaning System.
Proc. VLDB Endow., 2013

Introduction to the special issue on data quality.
Inf. Syst., 2013

NADEEF: a commodity data cleaning system.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

Author disambiguation by hierarchical agglomerative clustering with adaptive stopping criterion.
Proceedings of the 36th International ACM SIGIR conference on research and development in Information Retrieval, 2013

2012
The data analytics group at the Qatar Computing Research Institute.
SIGMOD Rec., 2012

Spatial Queries with Two kNN Predicates.
Proc. VLDB Endow., 2012

Data Quality Not Your Typical Database Problem.
Proceedings of the 4th International conference on Web and Information Technologies, 2012

M3: Stream Processing on Main-Memory MapReduce.
Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE 2012), 2012

Ionomics Atlas: a tool to explore interconnected ionomic, genomic and environmental data.
Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

2011
ACConv - An Access Control Model for Conversational Web Services.
ACM Trans. Web, 2011

Guided data repair.
Proc. VLDB Endow., 2011

U-MAP: a system for usage-based schema matching and mapping.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2011

2010
WS-Query - A Framework to Efficiently Query Semantic Web Service.
Proceedings of the Emergent Web Intelligence: Advanced Information Retrieval, 2010

A Visual Analytics Approach to Understanding Spatiotemporal Hotspots.
IEEE Trans. Vis. Comput. Graph., 2010

A two-phase framework for quality-aware Web service selection.
Serv. Oriented Comput. Appl., 2010

Behavior Based Record Linkage.
Proc. VLDB Endow., 2010

Semantic Integration in Geosciences.
Int. J. Semantic Comput., 2010

GDR: a system for guided data repair.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

Preserving privacy and fairness in peer-to-peer data integration.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

Privometer: Privacy protection in social networks.
Proceedings of the Workshops Proceedings of the 26th International Conference on Data Engineering, 2010

Supporting real-world activities in database management systems.
Proceedings of the 26th International Conference on Data Engineering, 2010

2009
Correctness Criteria Beyond Serializability.
Proceedings of the Encyclopedia of Database Systems, 2009

Generalization of ACID Properties.
Proceedings of the Encyclopedia of Database Systems, 2009

Mass Informatics in Differential Proteomics.
Proceedings of the Encyclopedia of Data Warehousing and Mining, Second Edition (4 Volumes), 2009

Syndromic surveillance: STL for modeling, visualizing, and monitoring disease counts.
BMC Medical Informatics Decis. Mak., 2009

Location-aware privacy and more: a systems approach using context-aware database management systems.
Proceedings of the 2nd SIGSPATIAL ACM GIS 2009 International Workshop on Security and Privacy in GIS and LBS, 2009

Supporting annotations on relations.
Proceedings of the EDBT 2009, 2009

2008
Community-Cyberinfrastructure-Enabled Discovery in Science and Engineering.
Comput. Sci. Eng., 2008

Data management challenges for computational transportation.
Proceedings of the 5th Annual International Conference on Mobile and Ubiquitous Systems: Computing, 2008

Understanding syndromic hotspots - a visual analytics approach.
Proceedings of the 3rd IEEE Symposium on Visual Analytics Science and Technology, 2008

Managing Biological Data using BDBMS.
Proceedings of the 24th International Conference on Data Engineering, 2008

Usage-Based Schema Matching.
Proceedings of the 24th International Conference on Data Engineering, 2008

Verification of Access Control Requirements in Web Services Choreography.
Proceedings of the 2008 IEEE International Conference on Services Computing (SCC 2008), 2008

2007
Duplicate Elimination in Space-partitioning Tree Indexes.
Proceedings of the 19th International Conference on Scientific and Statistical Database Management, 2007

LAHVA: Linked Animal-Human Health Visual Analytics.
Proceedings of the 2nd IEEE Symposium on Visual Analytics Science and Technology, 2007

bdbms - A Database Management System for Biological Data.
Proceedings of the Third Biennial Conference on Innovative Data Systems Research, 2007

2006
Access control enforcement for conversation-based web services.
Proceedings of the 15th international conference on World Wide Web, 2006

Discovering Consensus Patterns in Biological Databases.
Proceedings of the Data Mining and Bioinformatics, First International Workshop, 2006

Challenges in spatiotemporal stream query optimization.
Proceedings of the Fifth ACM International Workshop on Data Engineering for Wireless and Mobile Access, 2006

2005
The Indiana Center for Database Systems at Purdue University.
SIGMOD Rec., 2005

Data pre-processing in liquid chromatography-mass spectrometry-based proteomics.
Bioinform., 2005

2004
Internet Computing Support for Digital Government.
Proceedings of the Practical Handbook of Internet Computing., 2004

Efficient Access to Web Services.
IEEE Internet Comput., 2004

Query Processing and Optimization on the Web.
Distributed Parallel Databases, 2004

WebDG - A Platform for E-Government Web Services.
Proceedings of the Conceptual Modeling for Advanced Application Domains, 2004

Database Middleware for Distributed Ontologies in State and Federal Family & Social Services.
Proceedings of the 2004 Annual National Conference on Digital Government Research, 2004

2003
Infrastructure for E-Government Web Services.
IEEE Internet Comput., 2003

A Query Paradigm for Web Services.
Proceedings of the International Conference on Web Services, ICWS '03, June 23, 2003

Optimized Querying of E-Government Services.
Proceedings of the 2003 Annual National Conference on Digital Government Research, 2003

Semantic Web Enabled E-Government Services.
Proceedings of the 2003 Annual National Conference on Digital Government Research, 2003

Ubiquitous Access to Web Databases.
Proceedings of the Web-Powered Databases, 2003

2002
Supporting Data and Services Access in Digital Government Environments.
Proceedings of the Advances in Digital Government - Technology, Human Factors, and Policy, 2002

Preserving privacy in web services.
Proceedings of the Fourth ACM CIKM International Workshop on Web Information and Data Management (WIDM 2002), 2002

Privacy Preserving Composition of Government Web Services.
Proceedings of the 2002 Annual National Conference on Digital Government Research, 2002

2001
Managing Government Databases.
Computer, 2001

Ontology-based Support for Digital Government.
Proceedings of the VLDB 2001, 2001

2000
Supporting Dynamic Interactions among Web-Based Information Sources.
IEEE Trans. Knowl. Data Eng., 2000

Ontological Approach for Information Discovery in Internet Databases.
Distributed Parallel Databases, 2000

1999
Webfind: An Architecture and System for Querying Web Databases.
IEEE Internet Comput., 1999

World Wide Database - Integrating the Web, CORBA, and Databases.
Proceedings of the SIGMOD 1999, 1999

Using Java and CORBA for Implementing Internet Databases.
Proceedings of the 15th International Conference on Data Engineering, 1999

1994
A Top-Down Approach for Two Level Serializability.
Proceedings of the VLDB'94, 1994

