Eugene Wu

ORCID: 0000-0003-4254-6688

Affiliations:
  • Columbia University, New York City, USA
  • Massachusetts Institute of Technology, Cambridge, MA, USA (PhD)


According to our database, Eugene Wu authored at least 128 papers between 2004 and 2024.

Bibliography

2024
SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines.
Proc. VLDB Endow., August, 2024

Data Cleaning Using Large Language Models.
CoRR, 2024

DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing.
CoRR, 2024

Design-Specific Transformations in Visualization.
CoRR, 2024

Towards Accurate and Efficient Document Analytics with Large Language Models.
CoRR, 2024

SPADE: Synthesizing Assertions for Large Language Model Pipelines.
CoRR, 2024

Transform Table to Database Using Large Language Models.
Proceedings of Workshops at the 50th International Conference on Very Large Data Bases, 2024

Accelerating Deletion Interventions on OLAP Workload.
Proceedings of the 40th IEEE International Conference on Data Engineering, 2024

Relationalizing Tables with Large Language Models: The Promise and Challenges.
Proceedings of the 40th International Conference on Data Engineering, ICDE 2024, 2024

Cocoon: Semantic Table Profiling Using Large Language Models.
Proceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics, 2024

SET: Searching Effective Supervised Learning Augmentations in Large Tabular Data Repositories.
Proceedings of the Conference on Governance, 2024

The Fast and the Private: Task-based Dataset Search.
Proceedings of the 14th Conference on Innovative Data Systems Research, 2024

2023
Lightweight Materialization for Fast Dashboards Over Joins.
Proc. ACM Manag. Data, December, 2023

Pollock: A Data Loading Benchmark.
Proc. VLDB Endow., 2023

JoinBoost: Grow Trees Over Normalized Data Using Only SQL.
Proc. VLDB Endow., 2023

Saibot: A Differentially Private Data Search Platform.
Proc. VLDB Endow., 2023

OM3: An Ordered Multi-level Min-Max Representation for Interactive Progressive Visualization of Time Series.
Proc. ACM Manag. Data, 2023

Flood Event Extraction from News Media to Support Satellite-Based Flood Insurance.
CoRR, 2023

Data Ambiguity Strikes Back: How Documentation Improves GPT's Text-to-SQL.
CoRR, 2023

Kitana: Efficient Data Augmentation Search for AutoML.
CoRR, 2023

SmokedDuck Demonstration: SQLStepper.
Proceedings of the Companion of the 2023 International Conference on Management of Data, 2023

Anemoi: A Low-cost Sensorless Indoor Drone System for Automatic Mapping of 3D Airflow Fields.
Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, 2023

Aggregation Consistency Errors in Semantic Layers and How to Avoid Them.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2023

DIG: The Data Interface Grammar.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2023

Teaching Data Science by Visualizing Data Table Transformations: Pandas Tutor for Python, Tidy Data Tutor for R, and SQL Tutor.
Proceedings of the 2nd International Workshop on Data Systems Education: Bridging education practice with education research, 2023

Random Forests over normalized data in CPU-GPU DBMSes.
Proceedings of the 19th International Workshop on Data Management on New Hardware, 2023

2022
Impact of Cognitive Biases on Progressive Visualization.
Dataset, May, 2022

DIEL: Interactive Visualization Beyond the Here and Now.
IEEE Trans. Vis. Comput. Graph., 2022

View Composition Algebra for Ad Hoc Comparison.
IEEE Trans. Vis. Comput. Graph., 2022

Impact of Cognitive Biases on Progressive Visualization.
IEEE Trans. Vis. Comput. Graph., 2022

ConnectorX: Accelerating Data Loading From Databases to Dataframes.
Proc. VLDB Endow., 2022

Calibration: A Simple Trick for Wide-table Delta Analytics.
CoRR, 2022

NL2INTERFACE: Interactive Visualization Interface Generation from Natural Language Queries.
CoRR, 2022

Extending the View Composition Algebra to Hierarchical Data.
CoRR, 2022

How Do Captions Affect Visualization Reading?
CoRR, 2022

A Grammar for Hypothesis-Driven Visual Analysis.
CoRR, 2022

Demonstration of PI2: Interactive Visualization Interface Generation for SQL Analysis in Notebook.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

Reptile: Aggregation-level Explanations for Hierarchical Data.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

Complaint-Driven Training Data Debugging at Interactive Speeds.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

PI2: End-to-end Interactive Visualization Interface Generation from Queries.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

A sensorless drone-based system for mapping indoor 3D airflow gradients: demo abstract.
Proceedings of the MobiSys '22: The 20th Annual International Conference on Mobile Systems, Applications and Services, Portland, Oregon, 27 June 2022, 2022

How I stopped worrying about training data bugs and started complaining.
Proceedings of DEEM '22: the Sixth Workshop on Data Management for End-To-End Machine Learning, Philadelphia, 2022

2021
Explaining Inference Queries with Bayesian Optimization.
Proc. VLDB Endow., 2021

Enabling SQL-based Training Data Debugging for Federated Learning.
Proc. VLDB Endow., 2021

From Cleaning before ML to Cleaning for ML.
IEEE Data Eng. Bull., 2021

A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More.
CoRR, 2021

PI2: Generating Visual Analysis Interfaces From Queries.
CoRR, 2021

Automatic Y-axis Rescaling in Dynamic Visualizations.
Proceedings of the 2021 IEEE Visualization Conference, 2021

PopFactor: Live-Streamer Behavior and Popularity.
Proceedings of the Fifteenth International AAAI Conference on Web and Social Media, 2021

2020
ActiveDeeper: A Model-based Active Data Enrichment System.
Proc. VLDB Endow., 2020

Continuous Prefetch for Interactive Data Applications.
Proc. VLDB Endow., 2020

Monte Carlo Tree Search for Generating Interactive Data Analysis Interfaces.
CoRR, 2020

Facilitating Exploration with Interaction Snapshots under High Latency.
Proceedings of the 31st IEEE Visualization Conference, 2020

Complaint-driven Training Data Debugging for Query 2.0.
Proceedings of the 2020 International Conference on Management of Data, 2020

Physical Visualization Design.
Proceedings of the 2020 International Conference on Management of Data, 2020

2019
At a Glance: Pixel Approximate Entropy as a Measure of Line Chart Complexity.
IEEE Trans. Vis. Comput. Graph., 2019

Selective Wander Join: Fast Progressive Visualizations for Data Joins.
Informatics, 2019

Programming with Timespans in Interactive Visualizations.
CoRR, 2019

DIEL: Transparent Scaling for Interactive Visualization.
CoRR, 2019

AlphaClean: Automatic Generation of Data Cleaning Pipelines.
CoRR, 2019

Mining Precision Interfaces From Query Logs.
Proceedings of the 2019 International Conference on Management of Data, 2019

Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment.
Proceedings of the 2019 International Conference on Management of Data, 2019

Towards Democratizing Relational Data Visualization.
Proceedings of the 2019 International Conference on Management of Data, 2019

DeepBase: Deep Inspection of Neural Networks.
Proceedings of the 2019 International Conference on Management of Data, 2019

Acorn: Aggressive Result Caching in Distributed Data Processing Frameworks.
Proceedings of the ACM Symposium on Cloud Computing, SoCC 2019, 2019

Crazy Idea! Databases ⨝ Reinforcement-learning Research (CIDR2).
Proceedings of the 9th Biennial Conference on Innovative Data Systems Research, 2019

Cross-platform Interactions and Popularity in the Live-streaming Community.
Proceedings of the Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, 2019

2018
Smoke: Fine-grained Lineage at Interactive Speed.
Proc. VLDB Endow., 2018

Ten Years of WebTables.
Proc. VLDB Endow., 2018

Making Sense of Asynchrony in Interactive Data Visualizations.
CoRR, 2018

Precision Interfaces for Different Modalities.
Proceedings of the 2018 International Conference on Management of Data, 2018

Deeper: A Data Enrichment System Powered by Deep Web.
Proceedings of the 2018 International Conference on Management of Data, 2018

Provenance for Interactive Visualizations.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2018

Demonstration of Smoke: A Deep Breath of Data-Intensive Lineage Applications.
Proceedings of the 2018 International Conference on Management of Data, 2018

Leveraging Quality Prediction Models for Automatic Writing Feedback.
Proceedings of the Twelfth International Conference on Web and Social Media, 2018

2017
Mining Precision Interfaces From Query Logs.
CoRR, 2017

BoostClean: Automated Error Detection and Repair for Machine Learning.
CoRR, 2017

PreCog: Improving Crowdsourced Data Quality Before Acquisition.
CoRR, 2017

Dialectic: Enhancing Text Input Fields with Automatic Feedback to Improve Social Content Writing Quality.
CoRR, 2017

Precision Interfaces.
Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics, 2017

QFix: Diagnosing Errors through Query Histories.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

PALM: Machine Learning Explanations For Iterative Debugging.
Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics, 2017

Small Data.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

Combining Design and Performance in a Data Visualization Management System.
Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017

CIDR: Chat-oriented Innovations in Database Research.
Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017

2016
Skipping-oriented Partitioning for Columnar Layouts.
Proc. VLDB Endow., 2016

ActiveClean: Interactive Data Cleaning For Statistical Modeling.
Proc. VLDB Endow., 2016

Graphical Perception in Animated Bar Charts.
CoRR, 2016

ActiveClean: Interactive Data Cleaning While Learning Convex Loss Models.
CoRR, 2016

A DeVIL-ish approach to inconsistency in interactive visualizations.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2016

QFix: Demonstrating Error Diagnosis in Query Histories.
Proceedings of the 2016 International Conference on Management of Data, 2016

Towards reliable interactive data cleaning: a user survey and recommendations.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2016

ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning.
Proceedings of the 2016 International Conference on Management of Data, 2016

TrendQuery: a system for interactive exploration of trends.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2016

PFunk-H: approximate query processing using perceptual models.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2016

2015
Explaining data in visual analytic systems.
PhD thesis, 2015

CLAMShell: Speeding up Crowds for Low-latency Data Labeling.
Proc. VLDB Endow., 2015

Wisteria: Nurturing Scalable Data Cleaning Infrastructure.
Proc. VLDB Endow., 2015

Collaborative Data Analytics with DataHub.
Proc. VLDB Endow., 2015

SampleClean: Fast and Reliable Analytics on Dirty Data.
IEEE Data Eng. Bull., 2015

Automated Metadata Construction to Support Portable Building Applications.
Proceedings of the 2nd ACM International Conference on Embedded Systems for Energy-Efficient Built Environments, 2015

Data Visualization Management Systems.
Proceedings of the Seventh Biennial Conference on Innovative Data Systems Research, 2015

2014
The Case for Data Visualization Management Systems.
Proc. VLDB Endow., 2014

VERTEXICA: Your Relational Friend for Graph Analytics!
Proc. VLDB Endow., 2014

Indexing Cost Sensitive Prediction.
CoRR, 2014

2013
Scorpion: Explaining Away Outliers in Aggregate Queries.
Proc. VLDB Endow., 2013

SubZero: A fine-grained lineage system for scientific databases.
Proceedings of the 29th IEEE International Conference on Data Engineering, 2013

Data in Context: Aiding News Consumers while Taming Dataspaces.
Proceedings of the First VLDB Workshop on Databases and Crowdsourcing, 2013

Mobile applications need targeted micro-updates.
Proceedings of the Asia-Pacific Workshop on Systems, 2013

2012
Sorting it All Out with Humans in the Loop.
Adv. Math. Commun., 2012

A Demonstration of DBWipes: Clean as You Query.
Proc. VLDB Endow., 2012

2011
Human-powered Sorts and Joins.
Proc. VLDB Endow., 2011

Demonstration of Qurk: a query processor for human operators.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2011

Partitioning techniques for fine-grained indexing.
Proceedings of the 27th International Conference on Data Engineering, 2011

No bits left behind.
Proceedings of the Fifth Biennial Conference on Innovative Data Systems Research, 2011

Crowdsourced Databases: Query Processing with People.
Proceedings of the Fifth Biennial Conference on Innovative Data Systems Research, 2011

Relational Cloud: a Database Service for the cloud.
Proceedings of the Fifth Biennial Conference on Innovative Data Systems Research, 2011

2010
TrajStore: An adaptive storage system for very large trajectory data sets.
Proceedings of the 26th International Conference on Data Engineering, 2010

2009
Demonstration of the TrajStore System.
Proc. VLDB Endow., 2009

The Case for RodentStore: An Adaptive, Declarative Storage System.
Proceedings of the Fourth Biennial Conference on Innovative Data Systems Research, 2009

2008
WebTables: exploring the power of tables on the web.
Proc. VLDB Endow., 2008

Uncovering the Relational Web.
Proceedings of the 11th International Workshop on the Web and Databases, 2008

2007
SASE: Complex Event Processing over Streams (Demo).
Proceedings of the Third Biennial Conference on Innovative Data Systems Research, 2007

2006
Probabilistic Data Management for Pervasive Computing: The Data Furnace Project.
IEEE Data Eng. Bull., 2006

SASE: Complex Event Processing over Streams.
CoRR, 2006

High-performance complex event processing over streams.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2006

2005
Design Considerations for High Fan-In Systems: The HiFi Approach.
Proceedings of the Second Biennial Conference on Innovative Data Systems Research, 2005

2004
HiFi: A Unified Architecture for High Fan-in Systems.
(e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, Toronto, Canada, August 31, 2004

