Wentao Wu

ORCID: 0009-0006-2454-7109

Affiliations:
  • Microsoft Research, Redmond, WA, USA
  • University of Wisconsin-Madison, USA (former)


According to our database, Wentao Wu authored at least 69 papers between 2008 and 2024.

Bibliography

2024
Stochastic gradient descent without full data shuffle: with applications to in-database machine learning and deep learning systems.
VLDB J., September, 2024

Budget-aware Query Tuning: An AutoML Perspective.
SIGMOD Rec., September, 2024

How good are machine learning clouds? Benchmarking two snapshots over 5 years.
VLDB J., May, 2024

Wred: Workload Reduction for Scalable Index Tuning.
Proc. ACM Manag. Data, February, 2024

A systematic evaluation of machine learning on serverless infrastructure.
VLDB J., 2024

Wii: Dynamic Budget Reallocation In Index Tuning.
Proc. ACM Manag. Data, 2024

TablePuppet: A Generic Framework for Relational Federated Learning.
CoRR, 2024

Data Debugging with Shapley Importance over Machine Learning Pipelines.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
ML-Powered Index Tuning: An Overview of Recent Progress and Open Challenges.
SIGMOD Rec., December, 2023

Automatic Feasibility Study via Data Quality Analysis for ML: A Case-Study on Label Noise.
Proceedings of the 39th IEEE International Conference on Data Engineering, 2023

2022
Data Science Through the Looking Glass: Analysis of Millions of GitHub Notebooks and ML.NET Pipelines.
SIGMOD Rec., 2022

DISTILL: Low-Overhead Data-Driven Techniques for Filtering and Costing Indexes for Scalable Index Tuning.
Proc. VLDB Endow., 2022

Stochastic Gradient Descent without Full Data Shuffle.
CoRR, 2022

Data Debugging with Shapley Importance over End-to-End Machine Learning Pipelines.
CoRR, 2022

In-Database Machine Learning with CorgiPile: Stochastic Gradient Descent without Full Data Shuffle.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index Tuning.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

Budget-aware Index Tuning with Reinforcement Learning.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

Factor Windows: Cost-based Query Rewriting for Optimizing Correlated Window Aggregates.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

2021
Model averaging in distributed machine learning: a case study with Apache Spark.
VLDB J., 2021

Hyperspace: The Indexing Subsystem of Azure Synapse.
Proc. VLDB Endow., 2021

VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition.
Proc. VLDB Endow., 2021

Optimization of Threshold Functions over Streams.
Proc. VLDB Endow., 2021

A Data Quality-Driven View of MLOps.
IEEE Data Eng. Bull., 2021

Towards Demystifying Serverless Machine Learning Training.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

Towards understanding end-to-end learning in the context of data: machine learning dancing over semirings & Codd's table.
Proceedings of the Fifth Workshop on Data Management for End-To-End Machine Learning, 2021

OpenBox: A Generalized Black-box Optimization Service.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Ease.ML: A Lifecycle Management System for Machine Learning.
Proceedings of the 11th Conference on Innovative Data Systems Research, 2021

Magpie: Python at Speed and Scale using Cloud Backends.
Proceedings of the 11th Conference on Innovative Data Systems Research, 2021

2020
Ease.ml/snoopy in Action: Towards Automatic Feasibility Analysis for Machine Learning Application Development.
Proc. VLDB Endow., 2020

Helios: Hyperscale Indexing for the Cloud & Edge.
Proc. VLDB Endow., 2020

Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions.
Proc. VLDB Endow., 2020

On Automatic Feasibility Study for Machine Learning Application Development with ease.ml/snoopy.
CoRR, 2020

Cost-based Query Rewriting Techniques for Optimizing Aggregates Over Correlated Windows.
CoRR, 2020

A Note On Operator-Level Query Execution Cost Modeling.
CoRR, 2020

Building Continuous Integration Services for Machine Learning.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

ColumnSGD: A Column-oriented Framework for Distributed Stochastic Gradient Descent.
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

2019
Ease.ml/ci and Ease.ml/meter in Action: Towards Data Management for Statistical Generalization.
Proc. VLDB Endow., 2019

Data Science through the looking glass and what we found there.
CoRR, 2019

Quantitative Overfitting Management for Human-in-the-loop ML Application Development with ease.ml/meter.
CoRR, 2019

AI Meets AI: Leveraging Query Executions to Improve Index Recommendations.
Proceedings of the 2019 International Conference on Management of Data, 2019

Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment.
Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019

MLlib*: Fast Training of GLMs Using Spark MLlib.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

Serverless Event-Stream Processing over Virtual Actors.
Proceedings of the 9th Biennial Conference on Innovative Data Systems Research, 2019

2018
MLBench: Benchmarking Machine Learning Services Against Human Experts.
Proc. VLDB Endow., 2018

Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads.
Proc. VLDB Endow., 2018

Ease.ml in Action: Towards Multi-tenant Declarative Learning Services.
Proc. VLDB Endow., 2018

Plan Stitch: Harnessing the Best of Many Plans.
Proc. VLDB Endow., 2018

2017
Semantic Bootstrapping: A Theoretical Perspective.
IEEE Trans. Knowl. Data Eng., 2017

MLog: Towards Declarative In-Database Machine Learning.
Proc. VLDB Endow., 2017

How Good Are Machine Learning Clouds for Binary Classification with Good Features?
CoRR, 2017

An Overreaction to the Broken Machine Learning Abstraction: The ease.ml Vision.
Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics, 2017

Towards Interactive Debugging of Rule-based Entity Matching.
Proceedings of the 20th International Conference on Extending Database Technology, 2017

How good are machine learning clouds for binary classification with good features?: extended abstract.
Proceedings of the 2017 Symposium on Cloud Computing, SoCC 2017, Santa Clara, CA, USA, 2017

2016
Sampling-Based Query Re-Optimization.
Proceedings of the 2016 International Conference on Management of Data, 2016

2015
Revisiting Differentially Private Regression: Lessons From Learning Theory and their Consequences.
CoRR, 2015

On Debugging Non-Answers in Keyword Search Systems.
Proceedings of the 18th International Conference on Extending Database Technology, 2015

2014
Uncertainty Aware Query Execution Time Prediction.
Proc. VLDB Endow., 2014

2013
Towards Predicting Query Execution Time for Concurrent and Dynamic Database Workloads.
Proc. VLDB Endow., 2013

Predicting query execution time: Are optimizer cost models really unusable?
Proceedings of the 29th IEEE International Conference on Data Engineering, 2013

2012
Probase: a probabilistic taxonomy for text understanding.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

Context-aware Search for Personal Information Management Systems.
Proceedings of the Twelfth SIAM International Conference on Data Mining, 2012

2011
iMecho: a context-aware desktop search system.
Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011

2010
k-symmetry model for identity anonymization in social networks.
Proceedings of the International Conference on Extending Database Technology (EDBT 2010), 2010

2009
Search your memory! - an associative memory based desktop search system.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2009

Efficiently indexing shortest paths by exploiting symmetry in graphs.
Proceedings of the International Conference on Extending Database Technology (EDBT 2009), 2009

Personalization as a service: the architecture and a case study.
Proceedings of the First International CIKM Workshop on Cloud Data Management, 2009

iMecho: an associative memory based desktop search system.
Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009

2008
Structure-based graph distance measures of high degree of precision.
Pattern Recognit., 2008

Efficient Algorithms for Node Disjoint Subgraph Homeomorphism Determination.
Proceedings of the International Conference on Database Systems for Advanced Applications, 2008
