Tim Kraska

ORCID: 0009-0003-2414-2759

Affiliations:
  • MIT, Cambridge, MA, USA
  • Brown University, Providence, RI, USA
  • ETH Zurich, Switzerland


According to our database, Tim Kraska authored at least 201 papers between 2006 and 2024.

Bibliography

2024
Databases Unbound: Querying All of the World's Bytes with AI.
Proc. VLDB Endow., August, 2024

Resource Management in Aurora Serverless.
Proc. VLDB Endow., August, 2024

Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD.
Proc. VLDB Endow., July, 2024

Why TPC Is Not Enough: An Analysis of the Amazon Redshift Fleet.
Proc. VLDB Endow., July, 2024

Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD - Extended Version.
CoRR, 2024

A Declarative System for Optimizing AI Workloads.
CoRR, 2024

PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design.
CoRR, 2024

Stage: Query Execution Time Prediction in Amazon Redshift.
Proceedings of the Companion of the 2024 International Conference on Management of Data, 2024

Predicate Caching: Query-Driven Secondary Indexing for Cloud Data Warehouses.
Proceedings of the Companion of the 2024 International Conference on Management of Data, 2024

Intelligent Scaling in Amazon Redshift.
Proceedings of the Companion of the 2024 International Conference on Management of Data, 2024

Automated Multidimensional Data Layouts in Amazon Redshift.
Proceedings of the Companion of the 2024 International Conference on Management of Data, 2024

Panda: Performance Debugging for Databases using LLM Agents.
Proceedings of the 14th Conference on Innovative Data Systems Research, 2024

Mallet: SQL Dialect Translation with LLM Rule Generation.
Proceedings of the Seventh International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, 2024

2023
Technical Perspective for Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory.
SIGMOD Rec., 2023

The Case for Learned In-Memory Joins.
Proc. VLDB Endow., 2023

Robust Query Driven Cardinality Estimation under Changing Workloads.
Proc. VLDB Endow., 2023

Check Out the Big Brain on BRAD: Simplifying Cloud Data Processing with Learned Automated Data Meshes.
Proc. VLDB Endow., 2023

Extract-Transform-Load for Video Streams.
Proc. VLDB Endow., 2023

FactorJoin: A New Cardinality Estimation Framework for Join Queries.
Proc. ACM Manag. Data, 2023

SEED: Simple, Efficient, and Effective Data Management via Large Language Models.
CoRR, 2023

Parallel External Sorting of ASCII Records Using Learned Models.
CoRR, 2023

Hyperspecialized Compilation for Serverless Data Analytics.
Joint Proceedings of Workshops at the 49th International Conference on Very Large Data Bases (VLDB 2023), Vancouver, Canada, August 28, 2023

CorBit: Leveraging Correlations for Compressing Bitmap Indexes.
Joint Proceedings of Workshops at the 49th International Conference on Very Large Data Bases (VLDB 2023), Vancouver, Canada, August 28, 2023

Auto-WLM: Machine Learning Enhanced Workload Management in Amazon Redshift.
Proceedings of the Companion of the 2023 International Conference on Management of Data, 2023

Unshackling Database Benchmarking from Synthetic Workloads.
Proceedings of the 39th IEEE International Conference on Data Engineering, 2023

2022
Bao: Making Learned Query Optimization Practical.
SIGMOD Rec., 2022

TreeLine: An Update-In-Place Key-Value Store for Modern Storage.
Proc. VLDB Endow., 2022

SNARF: A Learning-Enhanced Range Filter.
Proc. VLDB Endow., 2022

Can Learned Models Replace Hash Functions?
Proc. VLDB Endow., 2022

SageDB: An Instance-Optimized Data Analytics System.
Proc. VLDB Endow., 2022

LSched: A Workload-Aware Learned Query Scheduler for Analytical Database Systems.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

LSI: a learned secondary index structure.
Proceedings of the aiDM '22: Fifth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, 2022

ExSample: Efficient Searches on Video Repositories through Adaptive Sampling.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

Self-Organizing Data Containers.
Proceedings of the 12th Conference on Innovative Data Systems Research, 2022

2021
Chiller: Contention-centric Transaction Execution and Data Partitioning for Modern Networks.
SIGMOD Rec., 2021

DBOS: A DBMS-oriented Operating System.
Proc. VLDB Endow., 2021

Davos: A System for Interactive Data-Driven Decision Making.
Proc. VLDB Endow., 2021

Flow-Loss: Learning Cardinality Estimates That Matter.
Proc. VLDB Endow., 2021

Towards instance-optimized data systems.
Proc. VLDB Endow., 2021

ML-In-Databases: Assessment and Prognosis.
IEEE Data Eng. Bull., 2021

Bounding the Last Mile: Efficient Learned String Indexing.
CoRR, 2021

PLEX: Towards Practical Learned Indexing.
CoRR, 2021

Defeating duplicates: A re-design of the LearnedSort algorithm.
CoRR, 2021

When Are Learned Models Better Than Hash Functions?
CoRR, 2021

SkyQuery: An Aerial Drone Video Sensing Platform.
CoRR, 2021

TagMe: GPS-Assisted Automatic Object Annotation in Videos.
CoRR, 2021

Living in a Candy Store - from being a PhD Student to Working as a Faculty Member on ML for Systems.
Proceedings of the VLDB 2021 PhD Workshop co-located with the 47th International Conference on Very Large Databases (VLDB 2021), 2021

Tuplex: Data Science in Python at Native Code Speed.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

Steering Query Optimizers: A Practical Take on Big Data Workloads.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

Instance-Optimized Data Layouts for Cloud Analytics Workloads.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

LEA: A Learned Encoding Advisor for Column Stores.
Proceedings of the aiDM '21: Fourth Workshop in Exploiting AI Techniques for Data Management, 2021

Partitioned Learned Bloom Filters.
Proceedings of the 9th International Conference on Learning Representations, 2021

Towards a Benchmark for Learned Systems.
Proceedings of the 37th IEEE International Conference on Data Engineering Workshops, 2021

2020
Automated Data Slicing for Model Validation: A Big Data - AI Integration Approach.
IEEE Trans. Knowl. Data Eng., 2020

Poly'19 Workshop Summary: GDPR.
SIGMOD Rec., 2020

Benchmarking Learned Indexes.
Proc. VLDB Endow., 2020

Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads.
Proc. VLDB Endow., 2020

ARDA: Automatic Relational Data Augmentation for Machine Learning.
Proc. VLDB Endow., 2020

Learned Indexes for a Google-scale Disk-based Database.
CoRR, 2020

Cortex: Harnessing Correlations to Boost Query Performance.
CoRR, 2020

DBOS: A Proposal for a Data-Centric Operating System.
CoRR, 2020

MISIM: An End-to-End Neural Code Similarity System.
CoRR, 2020

Partitioned Learned Bloom Filter.
CoRR, 2020

Fast Mapping onto Census Blocks.
CoRR, 2020

Bao: Learning to Steer Query Optimizers.
CoRR, 2020

Context-Aware Parse Trees.
CoRR, 2020

Learning Multi-Dimensional Indexes.
Proceedings of the 2020 International Conference on Management of Data, 2020

CDFShop: Exploring and Optimizing Learned Index Structures.
Proceedings of the 2020 International Conference on Management of Data, 2020

The Case for a Learned Sorting Algorithm.
Proceedings of the 2020 International Conference on Management of Data, 2020

RadixSpline: a single-pass learned index.
Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, 2020

DB4ML - An In-Memory Database Kernel with Machine Learning Support.
Proceedings of the 2020 International Conference on Management of Data, 2020

IDEBench: A Benchmark for Interactive Data Exploration.
Proceedings of the 2020 International Conference on Management of Data, 2020

ALEX: An Updatable Adaptive Learned Index.
Proceedings of the 2020 International Conference on Management of Data, 2020

MIRIS: Fast Object Track Queries in Video.
Proceedings of the 2020 International Conference on Management of Data, 2020

Learned garbage collection.
Proceedings of the 4th ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, 2020

BeeCluster: drone orchestration via predictive optimization.
Proceedings of the MobiSys '20: The 18th Annual International Conference on Mobile Systems, 2020

Cost-Guided Cardinality Estimation: Focus Where it Matters.
Proceedings of the 36th IEEE International Conference on Data Engineering Workshops, 2020

Getting Swole: Generating Access-Aware Code with Predicate Pullups.
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

A Polystore Based Database Operating System (DBOS).
Proceedings of the Heterogeneous Data Management, Polystores, and Analytics for Healthcare, 2020

2019
The SIGMOD 2019 Research Track Reviewing System.
SIGMOD Rec., 2019

The Seattle Report on Database Research.
SIGMOD Rec., 2019

Rethinking Database High Availability with RDMA Networks.
Proc. VLDB Endow., 2019

Choosing A Cloud DBMS: Architectures and Tradeoffs.
Proc. VLDB Endow., 2019

Tuplex: Robust, Efficient Analytics When Python Rules.
Proc. VLDB Endow., 2019

Neo: A Learned Query Optimizer.
Proc. VLDB Endow., 2019

SOSD: A Benchmark for Learned Indexes.
CoRR, 2019

LISA: Towards Learned DNA Sequence Search.
CoRR, 2019

SysML: The New Frontier of Machine Learning Systems.
CoRR, 2019

Custodes: Auditable Hypothesis Testing.
CoRR, 2019

SchengenDB: A Data Protection Database Proposal.
Proceedings of the Heterogeneous Data Management, Polystores, and Analytics for Healthcare, 2019

Democratizing Data Science through Interactive Curation of ML Pipelines.
Proceedings of the 2019 International Conference on Management of Data, 2019

From Auto-tuning One Size Fits All to Self-designed and Learned Data-intensive Systems.
Proceedings of the 2019 International Conference on Management of Data, 2019

FITing-Tree: A Data-aware Index Structure.
Proceedings of the 2019 International Conference on Management of Data, 2019

Designing Distributed Tree-based Index Structures for Fast RDMA-capable Networks.
Proceedings of the 2019 International Conference on Management of Data, 2019

Park: An Open Platform for Learning-Augmented Computer Systems.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Sherlock: A Deep Learning Approach to Semantic Data Type Detection.
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019

How I Learned to Stop Worrying and Love Re-optimization.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

Slice Finder: Automated Data Slicing for Model Validation.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

VizCertify: A Framework for Secure Visual Data Exploration.
Proceedings of the 2019 IEEE International Conference on Data Science and Advanced Analytics, 2019

SageDB: A Learned Database System.
Proceedings of the 9th Biennial Conference on Innovative Data Systems Research, 2019

VizNet: Towards A Large-Scale Visualization Learning and Benchmarking Repository.
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019

VizML: A Machine Learning Approach to Visualization Recommendation.
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019

2018
Distributed Machine Learning.
Encyclopedia of Database Systems, Second Edition, 2018

Estimating the Impact of Unknown Unknowns on Aggregate Query Results.
ACM Trans. Database Syst., 2018

Northstar: An Interactive Data Science System.
Proc. VLDB Endow., 2018

Towards Quantifying Uncertainty in Data Analysis & Exploration.
IEEE Data Eng. Bull., 2018

Chiller: Contention-centric Transaction Execution and Data Partitioning for Fast Networks.
CoRR, 2018

VizRec: A framework for secure data exploration via visual representation.
CoRR, 2018

Unknown Examples & Machine Learning Model Generalization.
CoRR, 2018

Slice Finder: Automated Data Slicing for Model Validation.
CoRR, 2018

Smallify: Learning Network Size while Training.
CoRR, 2018

A-Tree: A Bounded Approximate Index Structure.
CoRR, 2018

FastDAWG: Improving Data Migration in the BigDAWG Polystore System.
Proceedings of the Heterogeneous Data Management, Polystores, and Analytics for Healthcare, 2018

The Case for Learned Index Structures.
Proceedings of the 2018 International Conference on Management of Data, 2018

Towards Interactive Curation & Automatic Tuning of ML Pipelines.
Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning, 2018

Superneurons: dynamic GPU memory management for training deep neural networks.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Investigating the Effect of the Multiple Comparisons Problem in Visual Analysis.
Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018

2017
How Progressive Visualizations Affect Exploratory Analysis.
IEEE Trans. Vis. Comput. Graph., 2017

The End of a Myth: Distributed Transaction Can Scale.
Proc. VLDB Endow., 2017

Revisiting Reuse for Approximate Query Processing.
Proc. VLDB Endow., 2017

A Data Quality Metric (DQM): How to Estimate the Number of Undetected Errors in Data Sets.
Proc. VLDB Endow., 2017

Rethinking Distributed Query Execution on High-Speed Networks.
IEEE Data Eng. Bull., 2017

Letter from the Special Issue Editor.
IEEE Data Eng. Bull., 2017

Safe Visual Data Exploration.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Controlling False Discoveries During Interactive Data Exploration.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Approximate Query Processing for Interactive Data Science.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

What you see is not what you get!: Detecting Simpson's Paradoxes during Data Exploration.
Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics, 2017

Revisiting Reuse in Main Memory Database Systems.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Data Science Education: We're Missing the Boat, Again.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

Toward Sustainable Insights, or Why Polygamy is Bad for You.
Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017

IncMap: A Journey towards Ontology-based Data Integration.
Proceedings of the Datenbanksysteme für Business, 2017

Spotlytics: How to Use Cloud Market Places for Analytics?
Proceedings of the Datenbanksysteme für Business, 2017

2016
The End of Slow Networks: It's Time for a Redesign.
Proc. VLDB Endow., 2016

Towards a Benchmark for Interactive Data Exploration.
IEEE Data Eng. Bull., 2016

The End of a Myth: Distributed Transactions Can Scale.
CoRR, 2016

Answering enumeration queries with the crowd.
Commun. ACM, 2016

Making the Case for Query-by-Voice with EchoQuery.
Proceedings of the 2016 International Conference on Management of Data, 2016

PrivateClean: Data Cleaning and Differential Privacy.
Proceedings of the 2016 International Conference on Management of Data, 2016

VisTrees: fast indexes for interactive data exploration.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2016

The case for interactive data exploration accelerators (IDEAs).
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2016

Dark Data: Are we solving the right problems?
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

2015
Crowdsourcing Enumeration Queries: Estimators and Interfaces.
IEEE Trans. Knowl. Data Eng., 2015

S-Store: Streaming Meets Transaction Processing.
Proc. VLDB Endow., 2015

Stale View Cleaning: Getting Fresh Answers from Stale Materialized Views.
Proc. VLDB Endow., 2015

A Demonstration of the BigDAWG Polystore System.
Proc. VLDB Endow., 2015

Vizdom: Interactive Analytics through Pen and Touch.
Proc. VLDB Endow., 2015

An Architecture for Compiling UDF-centric Workflows.
Proc. VLDB Endow., 2015

SampleClean: Fast and Reliable Analytics on Dirty Data.
IEEE Data Eng. Bull., 2015

TuPAQ: An Efficient Planner for Large-scale Predictive Analytic Queries.
CoRR, 2015

Fault-Tolerant Entity Resolution with the Crowd.
CoRR, 2015

The End of Slow Networks: It's Time for a Redesign.
CoRR, 2015

Cost-based Fault-tolerance for Parallel Data Processing.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

Machine Learning and Databases: The Sound of Things to Come or a Cacophony of Hype?
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

SpotADAPT: Spot-Aware (re-)Deployment of Analytical Processing Tasks on Amazon EC2.
Proceedings of the ACM Eighteenth International Workshop on Data Warehousing and OLAP, 2015

Automating model search for large scale machine learning.
Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015

Tupleware: "Big" Data, Big Analytics, Small Clusters.
Proceedings of the Seventh Biennial Conference on Innovative Data Systems Research, 2015

2014
S-Store: A Streaming NewSQL System for Big Velocity Applications.
Proc. VLDB Endow., 2014

Putting Analytics on the Spot: Or How to Lower the Cost for Analytics.
IEEE Internet Comput., 2014

Tupleware: Distributed Machine Learning on Small Clusters.
IEEE Data Eng. Bull., 2014

The Expected Optimal Labeling Order Problem for Crowdsourced Joins and Entity Resolution.
CoRR, 2014

Tupleware: Redefining Modern Analytics.
CoRR, 2014

A sample-and-clean framework for fast and accurate query processing on dirty data.
Proceedings of the International Conference on Management of Data, 2014

PLANET: making progress with commit processing in unpredictable environments.
Proceedings of the International Conference on Management of Data, 2014

Should we all be teaching "intro to data science" instead of "intro to databases"?
Proceedings of the International Conference on Management of Data, 2014

2013
The New Database Architectures.
IEEE Internet Comput., 2013

Finding the Needle in the Big Data Systems Haystack.
IEEE Internet Comput., 2013

Leveraging transitive relations for crowdsourced joins.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

RTP: robust tenant placement for elastic in-memory database clusters.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

Generalized scale independence through incremental precomputation.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

MLI: An API for Distributed Machine Learning.
Proceedings of the 2013 IEEE 13th International Conference on Data Mining, 2013

Crowdsourced enumeration queries.
Proceedings of the 29th IEEE International Conference on Data Engineering, 2013

A Framework for Adaptive Crowd Query Processing.
Proceedings of the Human Computation and Crowdsourcing: Works in Progress and Demonstration Abstracts, 2013

CASTLE: Crowd-Assisted System for Text Labeling and Extraction.
Proceedings of the First AAAI Conference on Human Computation and Crowdsourcing, 2013

MDCC: multi-data center consistency.
Proceedings of the Eighth Eurosys Conference 2013, 2013

MLbase: A Distributed Machine-learning System.
Proceedings of the Sixth Biennial Conference on Innovative Data Systems Research, 2013

CrowdQ: Crowdsourced Query Understanding.
Proceedings of the Sixth Biennial Conference on Innovative Data Systems Research, 2013

2012
CrowdER: Crowdsourcing Entity Resolution.
Proc. VLDB Endow., 2012

MDCC: Multi-Data Center Consistency.
CoRR, 2012

Getting It All from the Crowd.
CoRR, 2012

Stormy: an elastic and highly available streaming service in the cloud.
Proceedings of the 2012 Joint EDBT/ICDT Workshops, Berlin, Germany, March 30, 2012

2011
Repeatability and workability evaluation of SIGMOD 2011.
SIGMOD Rec., 2011

CrowdDB: Query Processing with the VLDB Crowd.
Proc. VLDB Endow., 2011

Crowdsourcing Applications and Platforms: A Data Management Perspective.
Proc. VLDB Endow., 2011

PIQL: Success-Tolerant Query Processing in the Cloud.
Proc. VLDB Endow., 2011

CrowdDB: answering queries with crowdsourcing.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2011

2010
Building Database Applications in the Cloud.
PhD thesis, 2010

Cloudy: A Modular Cloud Storage System.
Proc. VLDB Endow., 2010

Data Management in the Cloud: Promises, State-of-the-art, and Open Questions.
Datenbank-Spektrum, 2010

An evaluation of alternative architectures for transaction processing in the cloud.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

2009
Consistency Rationing in the Cloud: Pay only when it matters.
Proc. VLDB Endow., 2009

XQuery Reloaded.
Proc. VLDB Endow., 2009

XQuery in the browser.
Proceedings of the 18th International Conference on World Wide Web, 2009

How is the weather tomorrow?: towards a benchmark for the cloud.
Proceedings of the 2nd International Workshop on Testing Database Systems, 2009

2008
XQuery in the browser.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2008

Building a database on S3.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2008

2007
Extending XQuery with Window Functions.
Proceedings of the 33rd International Conference on Very Large Data Bases, 2007

2006
Genea: Schema-Aware Mapping of Ontologies into Relational Databases.
Proceedings of the 13th International Conference on Management of Data, 2006

PathBank: Web-Based Querying and Visualization of an Integrated Biological Pathway Database.
Proceedings of the Third International Conference on Computer Graphics, 2006

