Sebastian Schelter

Orcid: 0000-0003-4722-5840

Affiliations:
  • University of Amsterdam, The Netherlands


According to our database, Sebastian Schelter authored at least 95 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Snapcase - Regain Control over Your Predictions with Low-Latency Machine Unlearning.
Proc. VLDB Endow., August, 2024

A Flexible Forecasting Stack.
Proc. VLDB Endow., August, 2024

Assisted design of data science pipelines.
VLDB J., July, 2024

Domain Generalization in Time Series Forecasting.
ACM Trans. Knowl. Discov. Data, June, 2024

SchemaPile: A Large Collection of Relational Database Schemas.
Proc. ACM Manag. Data, 2024

Red Onions, Soft Cheese and Data: From Food Safety to Data Traceability for Responsible AI.
IEEE Data Eng. Bull., 2024

Messy Code Makes Managing ML Pipelines Difficult? Just Let LLMs Rewrite the Code!
CoRR, 2024

AnyMatch - Efficient Zero-Shot Entity Matching with a Small Language Model.
CoRR, 2024

Data Debugging with Shapley Importance over Machine Learning Pipelines.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Directions Towards Efficient and Automated Data Wrangling with Large Language Models.
Proceedings of the 40th International Conference on Data Engineering, ICDE 2024, 2024

Etude - Evaluating the Inference Latency of Session-Based Recommendation Models at Scale.
Proceedings of the 40th IEEE International Conference on Data Engineering, 2024

Towards Interactively Improving ML Data Preparation Code via "Shadow Pipelines".
Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning, 2024

2023
shubhaguha/mlwhatif-demo: Demo for VLDB 2023.
Dataset, July, 2023

MLWHATIF: What If You Could Stop Re-Implementing Your Machine Learning Pipeline Analyses Over and Over?
Proc. VLDB Endow., 2023

Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines.
Proc. ACM Manag. Data, 2023

Hierarchical Forecasting at Scale.
CoRR, 2023

Improving Retrieval-Augmented Large Language Models via Data Importance Learning.
CoRR, 2023

Provenance Tracking for End-to-End Machine Learning Pipelines.
Proceedings of the Companion Proceedings of the ACM Web Conference 2023, 2023

A Personalized Neighborhood-based Model for Within-basket Recommendation in Grocery Shopping.
Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 2023

Proactively Screening Machine Learning Pipelines with ARGUSEYES.
Proceedings of the Companion of the 2023 International Conference on Management of Data, 2023

Forget Me Now: Fast and Exact Unlearning in Neighborhood-based Recommendation.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

On the Impact of Outlier Bias on User Clicks.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Automated Data Cleaning Can Hurt Fairness in Machine Learning-based Decision Making.
Proceedings of the 39th IEEE International Conference on Data Engineering, 2023

Reconstructing and Querying ML Pipeline Intermediates.
Proceedings of the 13th Conference on Innovative Data Systems Research, 2023

How to Make an Outlier? Studying the Effect of Presentational Features on the Outlierness of Items in Product Search Results.
Proceedings of the 2023 Conference on Human Information Interaction and Retrieval, 2023

2022
Data distribution debugging in machine learning pipelines.
VLDB J., 2022

DORIAN in action: Assisted Design of Data Science Pipelines.
Proc. VLDB Endow., 2022

Letter from the Special Issue Editor.
IEEE Data Eng. Bull., 2022

Data Debugging with Shapley Importance over End-to-End Machine Learning Pipelines.
CoRR, 2022

Efficiently Maintaining Next Basket Recommendations under Additions and Deletions of Baskets and Items.
CoRR, 2022

Responsible data management.
Commun. ACM, 2022

Understanding Financial Information Seeking Behavior from User Interactions with Company Filings.
Proceedings of the Companion of The Web Conference 2022, Virtual Event / Lyon, France, April 25, 2022

Understanding and Mitigating the Effect of Outliers in Fair Ranking.
Proceedings of the WSDM '22: The Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event / Tempe, AZ, USA, February 21, 2022

Serenade - Low-Latency Session-Based Recommendation in e-Commerce at Scale.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

ReCANet: A Repeat Consumption-Aware Neural Network for Next Basket Recommendation in Grocery Shopping.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

GitSchemas: A Dataset for Automating Relational Data Preparation Tasks.
Proceedings of the 38th IEEE International Conference on Data Engineering Workshops, 2022

Towards data-centric what-if analysis for native machine learning pipelines.
Proceedings of the DEEM '22: Proceedings of the Sixth Workshop on Data Management for End-To-End Machine Learning Philadelphia, 2022

Screening Native Machine Learning Pipelines with ArgusEyes.
Proceedings of the 12th Conference on Innovative Data Systems Research, 2022

2021
Parameter Efficient Deep Probabilistic Forecasting.
CoRR, 2021

HedgeCut: Maintaining Randomised Trees for Low-Latency Machine Unlearning.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

MLINSPECT: A Data Distribution Debugger for Machine Learning Pipelines.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Learnings from a Retail Recommendation System on Billions of Interactions at bol.com.
Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

JENGA - A Framework to Study the Impact of Data Errors on the Predictions of Machine Learning Models.
Proceedings of the 24th International Conference on Extending Database Technology, 2021

Automating Data Quality Validation for Dynamic Data Ingestion.
Proceedings of the 24th International Conference on Extending Database Technology, 2021

Understanding Multi-channel Customer Behavior in Retail.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

Lightweight Inspection of Data Preprocessing in Native Machine Learning Pipelines.
Proceedings of the 11th Conference on Innovative Data Systems Research, 2021

2020
Technical Perspective: Query Optimization for Faster Deep CNN Explanations.
SIGMOD Rec., 2020

Apache Mahout: Machine Learning on Distributed Dataflow Systems.
J. Mach. Learn. Res., 2020

Taming Technical Bias in Machine Learning Pipelines.
IEEE Data Eng. Bull., 2020

Analyzing and Predicting Purchase Intent in E-commerce: Anonymous vs. Identified Customers.
CoRR, 2020

A Comparison of Supervised Learning to Match Methods for Product Search.
CoRR, 2020

HDDse: Enabling High-Dimensional Disk State Embedding for Generic Failure Detection System of Heterogeneous Disks in Large Data Centers.
Proceedings of the 2020 USENIX Annual Technical Conference, 2020

Learning to Validate the Predictions of Black Box Classifiers on Unseen Data.
Proceedings of the 2020 International Conference on Management of Data, 2020


Three Challenges in Building Industrial-Scale Recommender Systems.
Proceedings of the 3rd Workshop on Online Recommender Systems and User Modeling co-located with the 14th ACM Conference on Recommender Systems (RecSys 2020), 2020

Demand Forecasting in the Presence of Privileged Information.
Proceedings of the Advanced Analytics and Learning on Temporal Data, 2020

FairPrep: Promoting Data to a First-Class Citizen in Studies on Fairness-Enhancing Interventions.
Proceedings of the 23rd International Conference on Extending Database Technology, 2020

Towards Unsupervised Data Quality Validation on Dynamic Data.
Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference, 2020

Zooming Out on an Evolving Graph.
Proceedings of the 23rd International Conference on Extending Database Technology, 2020

Tier-Scrubbing: An Adaptive and Tiered Disk Scrubbing Scheme with Improved MTTD and Reduced Cost.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

"Amnesia" - Machine Learning Models That Can Forget User Data Very Fast.
Proceedings of the 10th Conference on Innovative Data Systems Research, 2020

2019
An Intermediate Representation for Optimizing Machine Learning Pipelines.
Proc. VLDB Endow., 2019

DataWig: Missing Value Imputation for Tables.
J. Mach. Learn. Res., 2019

ADABench - Towards an Industry Standard Benchmark for Advanced Analytics.
Proceedings of the Performance Evaluation and Benchmarking for the Era of Cloud(s), 2019

Efficient Incremental Cooccurrence Analysis for Item-Based Collaborative Filtering.
Proceedings of the 31st International Conference on Scientific and Statistical Database Management, 2019

DEEM 2019: Workshop on Data Management for End-to-End Machine Learning.
Proceedings of the 2019 International Conference on Management of Data, 2019

Unit Testing Data with Deequ.
Proceedings of the 2019 International Conference on Management of Data, 2019

Learning to Validate the Predictions of Black Box Machine Learning Models on Unseen Data.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2019

Differential Data Quality Verification on Partitioned Data.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

2018
Automating Large-Scale Data Quality Verification.
Proc. VLDB Endow., 2018

On the Ubiquity of Web Tracking: Insights from a Billion-Page Web Crawl.
J. Web Sci., 2018

On Challenges in Machine Learning Model Management.
IEEE Data Eng. Bull., 2018

Benchmarking Distributed Data Processing Systems for Machine Learning Workloads.
Proceedings of the Performance Evaluation and Benchmarking for the Era of Artificial Intelligence, 2018

"Deep" Learning for Missing Value Imputation in Tables with Non-Numerical Data.
Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018

2017
BlockJoin: Efficient Matrix Partitioning Through Joins.
Proc. VLDB Endow., 2017

Probabilistic Demand Forecasting at Scale.
Proc. VLDB Endow., 2017

'Dark Germany': Temporal Characteristics and Connectivity Patterns in Online Far-Right Protests Against Refugee Housing.
Proceedings of the 2017 ACM on Web Science Conference, 2017

'Dark Germany': Hidden Patterns of Participation in Online Far-Right Protests Against Refugee Housing.
Proceedings of the Social Informatics, 2017

Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Systems.
Proceedings of the Datenbanksysteme für Business, 2017

2016
Scaling data mining in massively parallel dataflow systems.
PhD thesis, 2016

Doubly stochastic large scale kernel learning with the empirical kernel map.
CoRR, 2016

Tracking the Trackers: A Large-Scale Analysis of Embedded Web Trackers.
Proceedings of the Tenth International Conference on Web and Social Media, 2016

Structural Patterns in the Rise of Germany's New Right on Facebook.
Proceedings of the IEEE International Conference on Data Mining Workshops, 2016

Apache Flink: Stream Analytics at Scale.
Proceedings of the 2016 IEEE International Conference on Cloud Engineering Workshop, 2016

2015
Optimistic Recovery for Iterative Dataflows in Action.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

Efficient sample generation for scalable meta learning.
Proceedings of the 31st IEEE International Conference on Data Engineering, 2015

2014
The Stratosphere platform for big data analytics.
VLDB J., 2014

Factorbird - a Parameter Server Approach to Distributed Matrix Factorization.
CoRR, 2014

Scaling data mining in massively parallel dataflow systems.
Proceedings of the International Conference on Management of Data, 2014

2013
Iterative parallel data processing with stratosphere: an inside look.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

Distributed matrix factorization with mapreduce using a series of broadcast-joins.
Proceedings of the Seventh ACM Conference on Recommender Systems, 2013

"All roads lead to Rome": optimistic recovery for distributed iterative data processing.
Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, 2013

2012
Scalable similarity-based neighborhood methods with MapReduce.
Proceedings of the Sixth ACM Conference on Recommender Systems, 2012

