Arun Kumar

Affiliations:
  • University of California, San Diego, USA
  • University of Wisconsin-Madison, WI, USA (PhD 2016)


According to our database, Arun Kumar authored at least 68 papers between 2011 and 2024.

Bibliography

2024
How do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses.
Proc. VLDB Endow., February, 2024

Generating Cross-model Analytics Workloads Using LLMs.
Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024

2023
Saturn: An Optimized Data System for Multi-Large-Model Deep Learning Workloads.
Proc. VLDB Endow., December, 2023

Lotan: Bridging the Gap between GNNs and Scalable Graph Analytics Engines.
Proc. VLDB Endow., 2023

Saturn: Efficient Multi-Large-Model Deep Learning.
CoRR, 2023

Saturn: An Optimized Data System for Large Model Deep Learning Workloads.
CoRR, 2023

An Optimized Tri-store System for Multi-model Data Analytics.
CoRR, 2023

Database-Aware ASR Error Correction for Speech-to-SQL Parsing.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
VLDB Scalable Data Science Category: The Inaugural Year.
SIGMOD Rec., 2022

Database Education at UC San Diego.
SIGMOD Rec., 2022

Structured Data Representation in Natural Language Interfaces.
IEEE Data Eng. Bull., 2022

Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

2021
VLDB Panel Summary: "The Future of Data(base) Education: Is the Cow Book Dead?".
SIGMOD Rec., 2021

Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches.
Proc. VLDB Endow., 2021

Errata for "Cerebro: A Data System for Optimized Deep Learning Model Selection".
Proc. VLDB Endow., 2021

Intermittent Human-in-the-Loop Model Selection using Cerebro: A Demonstration.
Proc. VLDB Endow., 2021

Towards an Optimized GROUP BY Abstraction for Large-Scale Machine Learning.
Proc. VLDB Endow., 2021

Front Matter.
Proc. VLDB Endow., 2021

Towards A Polyglot Framework for Factorized ML.
Proc. VLDB Endow., 2021

Letter from the Rising Star Award Winner.
IEEE Data Eng. Bull., 2021

Processing Analytical Queries in the AWESOME Polystore [Information Systems Architectures].
CoRR, 2021

Hydra: A System for Large Multi-Model Deep Learning.
CoRR, 2021

Towards Benchmarking Feature Type Inference for AutoML Platforms.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

Automation of Data Prep, ML, and Data Science: New Cure or Snake Oil?
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

Cerebro: A Layered Data Platform for Scalable Deep Learning.
Proceedings of the 11th Conference on Innovative Data Systems Research, 2021

2020
Incremental and Approximate Computations for Accelerating Deep CNN Inference.
ACM Trans. Database Syst., 2020

Query Optimization for Faster Deep CNN Explanations.
SIGMOD Rec., 2020

Understanding and Benchmarking the Impact of GDPR on Database Systems.
Proc. VLDB Endow., 2020

Cerebro: A Data System for Optimized Deep Learning Model Selection.
Proc. VLDB Endow., 2020

SpeakQL: Towards Speech-driven Multimodal Querying of Structured Data.
Proceedings of the 2020 International Conference on Management of Data, 2020

Vista: Optimized System for Declarative Feature Transfer from Deep CNNs at Scale.
Proceedings of the 2020 International Conference on Management of Data, 2020

2019
Data Management in Machine Learning Systems.
Synthesis Lectures on Data Management, Morgan & Claypool Publishers, ISBN: 978-3-031-01869-5, 2019

Guest Editors' Introduction to the Special Section on the 33rd International Conference on Data Engineering (ICDE 2017).
IEEE Trans. Knowl. Data Eng., 2019

Panorama: A Data System for Unbounded Vocabulary Querying over Video.
Proc. VLDB Endow., 2019

Demonstration of Krypton: Optimized CNN Inference for Occlusion-based Deep CNN Explanations.
Proc. VLDB Endow., 2019

Predicting Eating Events in Free Living Individuals - A Technical Report.
CoRR, 2019

SysML: The New Frontier of Machine Learning Systems.
CoRR, 2019

Demonstration of SpeakQL: Speech-driven Multimodal Querying of Structured Data.
Proceedings of the 2019 International Conference on Management of Data, 2019

The ML Data Prep Zoo: Towards Semi-Automatic Data Preparation for ML.
Proceedings of the 3rd International Workshop on Data Management for End-to-End Machine Learning, 2019

Cerebro: Efficient and Reproducible Model Selection on Deep Learning Systems.
Proceedings of the 3rd International Workshop on Data Management for End-to-End Machine Learning, 2019

Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations.
Proceedings of the 2019 International Conference on Management of Data, 2019

Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent.
Proceedings of the 2019 International Conference on Management of Data, 2019

Enabling and Optimizing Non-linear Feature Interactions in Factorized Linear Algebra.
Proceedings of the 2019 International Conference on Management of Data, 2019

Demonstration of Nimbus: Model-based Pricing for Machine Learning in a Data Marketplace.
Proceedings of the 2019 International Conference on Management of Data, 2019

Towards Model-based Pricing for Machine Learning in a Data Marketplace.
Proceedings of the 2019 International Conference on Management of Data, 2019

Predicting Eating Events in Free Living Individuals.
Proceedings of the 15th International Conference on eScience, 2019

2018
In-RDBMS Hardware Acceleration of Advanced Analytics.
Proc. VLDB Endow., 2018

Model-based Pricing for Machine Learning in a Data Marketplace.
CoRR, 2018

2017
Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers?
Proc. VLDB Endow., 2017

Towards Linear Algebra over Normalized Data.
Proc. VLDB Endow., 2017

Stop That Join! Discarding Dimension Tables when Learning High Capacity Classifiers.
CoRR, 2017

When Lempel-Ziv-Welch Meets Machine Learning: A Case Study of Accelerating Machine Learning using Coding.
CoRR, 2017

Data Management in Machine Learning: Challenges, Techniques, and Systems.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Model-based Pricing: Do Not Pay for More than What You Learn!
Proceedings of the 1st Workshop on Data Management for End-to-End Machine Learning, 2017

SpeakQL: Towards Speech-driven Multi-modal Querying.
Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics, 2017

Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

2016
Materialization Optimizations for Feature Selection Workloads.
ACM Trans. Database Syst., 2016

Differentially Private Stochastic Gradient Descent for in-RDBMS Analytics.
CoRR, 2016

To Join or Not to Join?: Thinking Twice about Joins before Feature Selection.
Proceedings of the 2016 International Conference on Management of Data, 2016

2015
Model Selection Management Systems: The Next Frontier of Advanced Analytics.
SIGMOD Rec., 2015

Demonstration of Santoku: Optimizing Machine Learning over Normalized Data.
Proc. VLDB Endow., 2015

Learning Generalized Linear Models Over Normalized Data.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

2013
Hazy: Making it Easier to Build and Maintain Big-data Analytics.
ACM Queue, 2013

Feature Selection in Enterprise Analytics: A Demonstration using an R-based Data Analytics System.
Proc. VLDB Endow., 2013

Brainwash: A Data System for Feature Engineering.
Proceedings of the Sixth Biennial Conference on Innovative Data Systems Research, 2013

2012
The MADlib Analytics Library or MAD Skills, the SQL.
Proc. VLDB Endow., 2012

Towards a unified architecture for in-RDBMS analytics.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

2011
Probabilistic Management of OCR Data using an RDBMS.
Proc. VLDB Endow., 2011

