Jiannan Wang

Affiliations:
  • Simon Fraser University


According to our database, Jiannan Wang authored at least 63 papers between 2009 and 2024.

Bibliography

2024
CleanAgent: Automating Data Standardization with LLM-based Agents.
CoRR, 2024

FeatAug: Automatic Feature Augmentation From One-to-Many Relationship Tables.
Proceedings of the 40th IEEE International Conference on Data Engineering, 2024

Auto-FP: An Experimental Study of Automated Feature Preprocessing for Tabular Data.
Proceedings of the 27th International Conference on Extending Database Technology, 2024

2023
Web Connector: A Unified API Wrapper to Simplify Web Data Collection.
Proc. VLDB Endow., 2023

2022
ConnectorX: Accelerating Data Loading From Databases to Dataframes.
Proc. VLDB Endow., 2022

User Interfaces for Exploratory Data Analysis: A Survey of Open-Source and Commercial Tools.
IEEE Data Eng. Bull., 2022

One Size Does Not Fit All: A Bandit-Based Sampler Combination Framework with Theoretical Guarantees.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

Complaint-Driven Training Data Debugging at Interactive Speeds.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

How I stopped worrying about training data bugs and started complaining.
Proceedings of the DEEM '22: Sixth Workshop on Data Management for End-To-End Machine Learning, Philadelphia, 2022

2021
Are We Ready For Learned Cardinality Estimation?
Proc. VLDB Endow., 2021

Explaining Inference Queries with Bayesian Optimization.
Proc. VLDB Endow., 2021

Enabling SQL-based Training Data Debugging for Federated Learning.
Proc. VLDB Endow., 2021

DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

Automating Entity Matching Model Development.
Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

2020
ActiveDeeper: A Model-based Active Data Enrichment System.
Proc. VLDB Endow., 2020

SCODED: Statistical Constraint Oriented Data Error Detection.
Proceedings of the 2020 International Conference on Management of Data, 2020

Complaint-driven Training Data Debugging for Query 2.0.
Proceedings of the 2020 International Conference on Management of Data, 2020

Towards Extracting Highlights From Recorded Live Videos: An Implicit Crowdsourcing Approach.
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

2019
Detecting Data Errors with Statistical Constraints.
CoRR, 2019

Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment.
Proceedings of the 2019 International Conference on Management of Data, 2019

Crowdsourcing Database Systems: Overview and Challenges.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

2018
Cleaning Crowdsourced Labels Using Oracles For Statistical Classification.
Proc. VLDB Endow., 2018

Crowd-Powered Data Mining.
CoRR, 2018

Deeper: A Data Enrichment System Powered by Deep Web.
Proceedings of the 2018 International Conference on Management of Data, 2018

AQP++: Connecting Approximate Query Processing With Aggregate Precomputation for Interactive Analytics.
Proceedings of the 2018 International Conference on Management of Data, 2018

2017
Dependable Data Repairing with Fixing Rules.
ACM J. Data Inf. Qual., 2017

PreCog: Improving Crowdsourced Data Quality Before Acquisition.
CoRR, 2017

Preference-driven similarity join.
Proceedings of the International Conference on Web Intelligence, 2017

Crowdsourced Data Management: Overview and Challenges.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Schemaless Join for Result Set Preferences.
Proceedings of the 2017 IEEE International Conference on Information Reuse and Integration, 2017

2016
Crowdsourced Data Management: A Survey.
IEEE Trans. Knowl. Data Eng., 2016

Skipping-oriented Partitioning for Columnar Layouts.
Proc. VLDB Endow., 2016

ActiveClean: Interactive Data Cleaning For Statistical Modeling.
Proc. VLDB Endow., 2016

ActiveClean: Interactive Data Cleaning While Learning Convex Loss Models.
CoRR, 2016

PrivateClean: Data Cleaning and Differential Privacy.
Proceedings of the 2016 International Conference on Management of Data, 2016

ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning.
Proceedings of the 2016 International Conference on Management of Data, 2016

Data Cleaning: Overview and Emerging Challenges.
Proceedings of the 2016 International Conference on Management of Data, 2016

Finding Gangs in War from Signed Networks.
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016

2015
Stale View Cleaning: Getting Fresh Answers from Stale Materialized Views.
Proc. VLDB Endow., 2015

CLAMShell: Speeding up Crowds for Low-latency Data Labeling.
Proc. VLDB Endow., 2015

Wisteria: Nurturing Scalable Data Cleaning Infrastructure.
Proc. VLDB Endow., 2015

SampleClean: Fast and Reliable Analytics on Dirty Data.
IEEE Data Eng. Bull., 2015

QASCA: A Quality-Aware Task Assignment System for Crowdsourcing Applications.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

2014
Extending string similarity join to tolerant fuzzy token matching.
ACM Trans. Database Syst., 2014

The Expected Optimal Labeling Order Problem for Crowdsourced Joins and Entity Resolution.
CoRR, 2014

Towards dependable data repairing with fixing rules.
Proceedings of the International Conference on Management of Data, 2014

A sample-and-clean framework for fast and accurate query processing on dirty data.
Proceedings of the International Conference on Management of Data, 2014

MassJoin: A mapreduce-based method for scalable string similarity joins.
Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, 2014

Learning accurate kinematic control of cable-driven surgical robots using data cleaning and Gaussian Process Regression.
Proceedings of the 2014 IEEE International Conference on Automation Science and Engineering, 2014

2013
Leveraging transitive relations for crowdsourced joins.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

Efficient parallel partition-based algorithms for similarity search and join with edit distance constraints.
Proceedings of the Joint 2013 EDBT/ICDT Conferences, 2013

2012
Trie-join: a trie-based method for efficient string similarity joins.
VLDB J., 2012

CrowdER: Crowdsourcing Entity Resolution.
Proc. VLDB Endow., 2012

Can we beat the prefix filtering?: an adaptive framework for similarity join and search.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

Supporting efficient top-k queries in type-ahead search.
Proceedings of the 35th International ACM SIGIR conference on research and development in Information Retrieval, 2012

2011
Entity Matching: How Similar Is Similar.
Proc. VLDB Endow., 2011

PASS-JOIN: A Partition-based Method for Similarity Joins.
Proc. VLDB Endow., 2011

Fast-join: An efficient method for fuzzy token matching based string similarity join.
Proceedings of the 27th International Conference on Data Engineering, 2011

DBease: Making Databases User-Friendly and Easily Accessible.
Proceedings of the Fifth Biennial Conference on Innovative Data Systems Research, 2011

2010
Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints.
Proc. VLDB Endow., 2010

Interactive and fuzzy search: a dynamic way to explore MEDLINE.
Bioinform., 2010

Efficient fuzzy type-ahead search in TASTIER.
Proceedings of the 26th International Conference on Data Engineering, 2010

2009
Automatic URL completion and prediction using fuzzy type-ahead search.
Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2009

