Nan Tang

ORCID: 0000-0003-2832-0295

Affiliations:
  • Hong Kong University of Science and Technology Guangzhou (HKUST-GZ), Information Hub, Guangzhou, China
  • Hamad Bin Khalifa University, Qatar Computing Research Institute, Doha, Qatar (former)
  • University of Edinburgh, UK (former)
  • Centrum Wiskunde & Informatica, Amsterdam, The Netherlands (former)
  • Chinese University of Hong Kong, Hong Kong (former, PhD 2007)


According to our database, Nan Tang authored at least 158 papers between 2003 and 2024.

Bibliography

2024
RetClean: Retrieval-Based Tabular Data Cleaning Using LLMs and Data Lakes.
Proc. VLDB Endow., August, 2024

LakeCompass: An End-to-End System for Table Maintenance, Search and Analysis in Data Lakes.
Proc. VLDB Endow., August, 2024

HAIChart: Human and AI Paired Visualization System.
Proc. VLDB Endow., July, 2024

Are Large Language Models a Good Replacement of Taxonomies?
Proc. VLDB Endow., July, 2024

The Dawn of Natural Language to SQL: Are We Fully Ready? [Experiment, Analysis & Benchmark].
Proc. VLDB Endow., July, 2024

Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL.
Proc. VLDB Endow., July, 2024

LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes.
Proc. VLDB Endow., April, 2024

Unicorn: A Unified Multi-Tasking Matching Model.
SIGMOD Rec., March, 2024

MisDetect: Iterative Mislabel Detection using Early Loss.
Proc. VLDB Endow., February, 2024

Controllable Tabular Data Synthesis Using Diffusion Models.
Proc. ACM Manag. Data, February, 2024

Tabular data synthesis with generative adversarial networks: design space and optimizations.
VLDB J., 2024

A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?
CoRR, 2024

Are Large Language Models a Good Replacement of Taxonomies?
CoRR, 2024

Are Large Language Models Good Statisticians?
CoRR, 2024

CRAG - Comprehensive RAG Benchmark.
CoRR, 2024

The Dawn of Natural Language to SQL: Are We Fully Ready?
CoRR, 2024

IDE: A System for Iterative Mislabel Detection.
Proceedings of the Companion of the 2024 International Conference on Management of Data, 2024

ChatPipe: Orchestrating Data Preparation Pipelines by Optimizing Human-ChatGPT Interactions.
Proceedings of the Companion of the 2024 International Conference on Management of Data, 2024

Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration.
Proceedings of the 40th IEEE International Conference on Data Engineering, 2024

Mitigating Data Scarcity in Supervised Machine Learning Through Reinforcement Learning Guided Data Generation.
Proceedings of the 40th IEEE International Conference on Data Engineering, 2024

MAR: Matching-Augmented Reasoning for Enhancing Visual-based Entity Question Answering.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

VerifAI: Verified Generative AI.
Proceedings of the 14th Conference on Innovative Data Systems Research, 2024

2023
HOFD: An Outdated Fact Detector for Knowledge Bases.
IEEE Trans. Knowl. Data Eng., October, 2023

Road-Aware Indexing for Trajectory Range Queries.
IEEE Trans. Knowl. Data Eng., August, 2023

Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration.
Proc. ACM Manag. Data, 2023

Learned Data-aware Image Representations of Line Charts for Similarity Search.
Proc. ACM Manag. Data, 2023

Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning.
Proc. ACM Manag. Data, 2023

HAIPipe: Combining Human-generated and Machine-generated Pipelines for Data Preparation.
Proc. ACM Manag. Data, 2023

GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data.
Proc. ACM Manag. Data, 2023

SEED: Simple, Efficient, and Effective Data Management via Large Language Models.
CoRR, 2023

VerifAI: Verified Generative AI.
CoRR, 2023

Interleaving Pre-Trained Language Models and Large Language Models for Zero-Shot NL2SQL Generation.
CoRR, 2023

ChatPipe: Orchestrating Data Preparation Program by Optimizing Human-ChatGPT Interactions.
CoRR, 2023

RetClean: Retrieval-Based Data Cleaning Using Foundation Models and Data Lakes.
CoRR, 2023

Pay "Attention" to Chart Images for What You Read on Text.
Proceedings of the Companion of the 2023 International Conference on Management of Data, 2023

Demystifying Artificial Intelligence for Data Preparation.
Proceedings of the Companion of the 2023 International Conference on Management of Data, 2023

Efficient Coreset Selection with Cluster-based Methods.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

Symphony: Towards Natural Language Query Answering over Multi-modal Data Lakes.
Proceedings of the 13th Conference on Innovative Data Systems Research, 2023

2022
Interactively discovering and ranking desired tuples by data exploration.
VLDB J., 2022

Natural Language to Visualization by Neural Machine Translation.
IEEE Trans. Vis. Comput. Graph., 2022

Steerable Self-Driving Data Visualization.
IEEE Trans. Knowl. Data Eng., 2022

Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning.
Proc. VLDB Endow., 2022

DADER: Hands-Off Entity Resolution with Domain Adaptation.
Proc. VLDB Endow., 2022

Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks.
Proc. VLDB Endow., 2022

Selective Data Acquisition in the Wild for Model Charging.
Proc. VLDB Endow., 2022

Preface.
J. Comput. Sci. Technol., 2022

AlphaQO: Robust Learned Query Optimizer.
Int. J. Softw. Informatics, 2022

Domain Adaptation for Deep Entity Resolution.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

Synthesizing Privacy Preserving Entity Resolution Datasets.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

Feature Augmentation with Reinforcement Learning.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
Mis-categorized entities detection.
VLDB J., 2021

Deep Learning for Blocking in Entity Matching: A Design Space Exploration.
Proc. VLDB Endow., 2021

RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation.
Proc. VLDB Endow., 2021

Learned Cardinality Estimation: A Design Space Exploration and A Comparative Evaluation.
Proc. VLDB Endow., 2021

Automatic Data Acquisition for Deep Learning.
Proc. VLDB Endow., 2021

Adaptive Data Augmentation for Supervised Learning over Missing Data.
Proc. VLDB Endow., 2021

Learned Cardinality Estimation for Similarity Queries.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

Ranking Desired Tuples by Database Exploration.
Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

2020
Making data visualization more efficient and effective: a survey.
VLDB J., 2020

Debugging Large-Scale Data Science Pipelines using Dagger.
Proc. VLDB Endow., 2020

Pattern Functional Dependencies for Data Cleaning.
Proc. VLDB Endow., 2020

DeepTrack: Monitoring and Exploring Spatio-Temporal Data - A Case of Tracking COVID-19 -.
Proc. VLDB Endow., 2020

VisClean: Interactive Cleaning for Progressive Visualization.
Proc. VLDB Endow., 2020

Deductive optimization of relational data storage.
Proc. ACM Program. Lang., 2020

DeepEye: A Data Science System for Monitoring and Exploring COVID-19 Data.
IEEE Data Eng. Bull., 2020

Relational Pretrained Transformers towards Democratizing Data Preparation [Vision].
CoRR, 2020

Interactively Discovering and Ranking Desired Tuples without Writing SQL Queries.
Proceedings of the 2020 International Conference on Management of Data, 2020

CoClean: Collaborative Data Cleaning.
Proceedings of the 2020 International Conference on Management of Data, 2020

Reinforcement Learning with Tree-LSTM for Join Order Selection.
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

Interactive Cleaning for Progressive Visualization through Composite Questions.
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

Outdated Fact Detection in Knowledge Bases.
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

Data Curation with Deep Learning.
Proceedings of the 23rd International Conference on Extending Database Technology, 2020

Dagger: A Data (not code) Debugger.
Proceedings of the 10th Conference on Innovative Data Systems Research, 2020

2019
Efficient Algorithms for Approximate Single-Source Personalized PageRank Queries.
ACM Trans. Database Syst., 2019

Querying Shortest Paths on Time Dependent Road Networks.
Proc. VLDB Endow., 2019

Data Civilizer 2.0: A Holistic Framework for Data Preparation and Analytics.
Proc. VLDB Endow., 2019

Dataset-On-Demand: Automatic View Search and Presentation for Data Discovery.
CoRR, 2019

Technical Report: Optimizing Human Involvement for Entity Matching and Consolidation.
CoRR, 2019

Explaining Entity Resolution Predictions: Where are we and What needs to be done?
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2019

Towards Democratizing Relational Data Visualization.
Proceedings of the 2019 International Conference on Management of Data, 2019

ANMAT: Automatic Knowledge Discovery and Error Detection through Pattern Functional Dependencies.
Proceedings of the 2019 International Conference on Management of Data, 2019

Raha: A Configuration-Free Error Detection System.
Proceedings of the 2019 International Conference on Management of Data, 2019

Unsupervised String Transformation Learning for Entity Consolidation.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

Data civilizer: end-to-end support for data discovery, integration, and cleaning.
Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker, 2019

2018
Distilling relations using knowledge bases.
VLDB J., 2018

Distributed Representations of Tuples for Entity Resolution.
Proc. VLDB Endow., 2018

RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With You! -.
Proc. VLDB Endow., 2018

Reuse and Adaptation for Entity Resolution through Transfer Learning.
CoRR, 2018

Data Curation with Deep Learning [Vision]: Towards Self Driving Data Curation.
CoRR, 2018

DeepEye: An automatic big data visualization framework.
Big Data Min. Anal., 2018

DeepEye: Creating Good Data Visualizations by Keyword Search.
Proceedings of the 2018 International Conference on Management of Data, 2018

FAHES: A Robust Disguised Missing Values Detector.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

FAHES: Detecting Disguised Missing Values.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

Building Data Civilizer Pipelines with an Advanced Workflow Engine.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

DeepEye: Towards Automatic Data Visualization.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

Cleaning Your Wrong Google Scholar Entries.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

Discovering Mis-Categorized Entities.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

Seeping Semantics: Linking Datasets Using Word Embeddings for Data Discovery.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

DeepEye: Visualizing Your Data by Keyword Search.
Proceedings of the 21st International Conference on Extending Database Technology, 2018

2017
Fast and scalable inequality joins.
VLDB J., 2017

A Novel Cost-Based Model for Data Repairing.
IEEE Trans. Knowl. Data Eng., 2017

Synthesizing Entity Matching Rules by Examples.
Proc. VLDB Endow., 2017

Errata for "Lightning Fast and Space Efficient Inequality Joins" (PVLDB 8(13): 2074-2085).
Proc. VLDB Endow., 2017

Dependable Data Repairing with Fixing Rules.
ACM J. Data Inf. Qual., 2017

DeepER - Deep Entity Resolution.
CoRR, 2017

Entity Consolidation: The Golden Record Problem.
CoRR, 2017

UGuide: User-Guided Discovery of FD-Detectable Errors.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

A Demo of the Data Civilizer System.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Generating Concise Entity Matching Rules.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Interactive Data Repairing: the FALCON Dive.
Proceedings of the 25th Italian Symposium on Advanced Database Systems, 2017

Cleaning Relations Using Knowledge Bases.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

The Data Civilizer System.
Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017

2016
Detecting Data Errors: Where are we and what needs to be done?
Proc. VLDB Endow., 2016

Interactive and Deterministic Data Cleaning.
Proceedings of the 2016 International Conference on Management of Data, 2016

Graph Stream Summarization: From Big Bang to Big Crunch.
Proceedings of the 2016 International Conference on Management of Data, 2016

Road to Freedom in Big Data Analytics.
Proceedings of the 19th International Conference on Extending Database Technology, 2016

2015
Lightning Fast and Space Efficient Inequality Joins.
Proc. VLDB Endow., 2015

KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing.
Proc. VLDB Endow., 2015

On Summarizing Graph Streams.
CoRR, 2015

BigDansing: A System for Big Data Cleansing.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

Big RDF data cleaning.
Proceedings of the 31st IEEE International Conference on Data Engineering Workshops, 2015

Proof positive and negative in data cleaning.
Proceedings of the 31st IEEE International Conference on Data Engineering, 2015

2014
Incremental Detection of Inconsistencies in Distributed Data.
IEEE Trans. Knowl. Data Eng., 2014

Interaction between Record Matching and Data Repairing.
ACM J. Data Inf. Qual., 2014

Conflict resolution with data currency and consistency.
ACM J. Data Inf. Qual., 2014

Towards dependable data repairing with fixing rules.
Proceedings of the International Conference on Management of Data, 2014

NADEEF/ER: generic and interactive entity resolution.
Proceedings of the International Conference on Management of Data, 2014

Big Data Cleaning.
Proceedings of the Web Technologies and Applications - 16th Asia-Pacific Web Conference, 2014

2013
NADEEF: A Generalized Data Cleaning System.
Proc. VLDB Endow., 2013

NADEEF: a commodity data cleaning system.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

Inferring data currency and consistency for conflict resolution.
Proceedings of the 29th IEEE International Conference on Data Engineering, 2013

Data Quality Problems beyond Consistency and Deduplication.
In Search of Elegance in the Theory and Practice of Computation, 2013

2012
The data analytics group at the Qatar Computing Research Institute.
SIGMOD Rec., 2012

Adding regular expressions to graph reachability and pattern queries.
Frontiers Comput. Sci., 2012

2011
CerFix: A System for Cleaning Data with Certain Fixes.
Proc. VLDB Endow., 2011

Interaction between record matching and data repairing.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2011

2010
Projective Distribution of XQuery with Updates.
IEEE Trans. Knowl. Data Eng., 2010

Towards Certain Fixes with Editing Rules and Master Data.
Proc. VLDB Endow., 2010

Graph Pattern Matching: From Intractable to Polynomial Time.
Proc. VLDB Endow., 2010

2009
Efficient Distribution of Full-Fledged XQuery.
Proceedings of the 25th International Conference on Data Engineering, 2009

Materialized View Selection in XML Databases.
Proceedings of the Database Systems for Advanced Applications, 2009

Space-economical partial gram indices for exact substring matching.
Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009

2008
Fast XML Structural Join Algorithms by Partitioning.
J. Res. Pract. Inf. Technol., 2008

Hierarchical Indexing Approach to Support XPath Queries.
Proceedings of the 24th International Conference on Data Engineering, 2008

Multiple Materialized View Selection for XPath Query Rewriting.
Proceedings of the 24th International Conference on Data Engineering, 2008

2007
Efficient XPath query processing in native XML databases.
PhD thesis, 2007

2006
Answering XML Queries Using Path-Based Indexes: A Survey.
World Wide Web, 2006

Fast Structural Join with a Location Function.
Proceedings of the Database Systems for Advanced Applications, 2006

Fast Reachability Query Processing.
Proceedings of the Database Systems for Advanced Applications, 2006

2005
WIN: An Efficient Data Placement Strategy for Parallel XML Databases.
Proceedings of the 11th International Conference on Parallel and Distributed Systems, 2005

Accelerating XML Structural Join by Partitioning.
Proceedings of the Database and Expert Systems Applications, 16th International Conference, 2005

2004
Answering XML Twig Queries with Automata.
Proceedings of the Advanced Web Technologies and Applications, 2004

2003
Data Placement and Query Processing Based on RPE Parallelisms.
Proceedings of the 27th International Computer Software and Applications Conference (COMPSAC 2003): Design and Assessment of Trustworthy Software-Based Systems, 2003
