Lei Cao

ORCID: 0000-0001-9909-8607

Affiliations:
  • University of Arizona, AZ, USA
  • Massachusetts Institute of Technology, Cambridge, MA, USA (former)


According to our database, Lei Cao authored at least 62 papers between 2013 and 2024.

Bibliography

2024
Pluto: Sample Selection for Robust Anomaly Detection on Polluted Log Data.
Proc. ACM Manag. Data, September, 2024

LakeCompass: An End-to-End System for Table Maintenance, Search and Analysis in Data Lakes.
Proc. VLDB Endow., August, 2024

Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL.
Proc. VLDB Endow., July, 2024

LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes.
Proc. VLDB Endow., April, 2024

Outlier Summarization via Human Interpretable Rules.
Proc. VLDB Endow., March, 2024

MetaStore: Analyzing Deep Learning Meta-Data at Scale.
Proc. VLDB Endow., February, 2024

MisDetect: Iterative Mislabel Detection using Early Loss.
Proc. VLDB Endow., February, 2024

RITA: Group Attention is All You Need for Timeseries Analytics.
Proc. ACM Manag. Data, February, 2024

Harnessing Diversity for Important Data Selection in Pretraining Large Language Models.
CoRR, 2024

CascadeServe: Unlocking Model Cascades for Inference Serving.
CoRR, 2024

A Declarative System for Optimizing AI Workloads.
CoRR, 2024

IDE: A System for Iterative Mislabel Detection.
Proceedings of the Companion of the 2024 International Conference on Management of Data, 2024

MoTTo: Scalable Motif Counting with Time-aware Topology Constraint for Large-scale Temporal Graphs.
Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024

VerifAI: Verified Generative AI.
Proceedings of the 14th Conference on Innovative Data Systems Research, 2024

2023
Extract-Transform-Load for Video Streams.
Proc. VLDB Endow., 2023

Lingua Manga: A Generic Large Language Model Centric System for Data Curation.
Proc. VLDB Endow., 2023

Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning.
Proc. ACM Manag. Data, 2023

AutoOD: Automatic Outlier Detection.
Proc. ACM Manag. Data, 2023

SEED: Simple, Efficient, and Effective Data Management via Large Language Models.
CoRR, 2023

VerifAI: Verified Generative AI.
CoRR, 2023

RoTaR: Efficient Row-Based Table Representation Learning via Teacher-Student Training.
CoRR, 2023

Interleaving Pre-Trained Language Models and Large Language Models for Zero-Shot NL2SQL Generation.
CoRR, 2023

Interpretable Outlier Summarization.
CoRR, 2023

Symphony: Towards Natural Language Query Answering over Multi-modal Data Lakes.
Proceedings of the 13th Conference on Innovative Data Systems Research, 2023

2022
A Demonstration of AutoOD: A Self-tuning Anomaly Detection System.
Proc. VLDB Endow., 2022

Online Discovery of Evolving Groups over Massive-Scale Trajectory Streams.
CoRR, 2022

Scalable Motif Counting for Large-scale Temporal Graphs.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

2021
LANCET: Labeling Complex Data at Scale.
Proc. VLDB Endow., 2021

Epoch-based Commit and Replication in Distributed OLTP Databases.
Proc. VLDB Endow., 2021

ATLANTIC: Making Database Differentially Private and Faster with Accuracy Guarantee.
Proc. VLDB Endow., 2021

Machine Learning for Databases.
Proc. VLDB Endow., 2021

AI Meets Database: AI4DB and DB4AI.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

ELITE: Robust Deep Anomaly Detection with Meta Gradient.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

2020
Aria: A Fast and Practical Deterministic OLTP Database.
Proc. VLDB Endow., 2020

Find you if you drive: Inferring home locations for vehicles with surveillance camera data.
Knowl. Based Syst., 2020

Continuously Adaptive Similarity Search.
Proceedings of the 2020 International Conference on Management of Data, 2020

Human-in-the-loop Outlier Detection.
Proceedings of the 2020 International Conference on Management of Data, 2020

Dagger: A Data (not code) Debugger.
Proceedings of the 10th Conference on Innovative Data Systems Research, 2020

2019
Data Civilizer 2.0: A Holistic Framework for Data Preparation and Analytics.
Proc. VLDB Endow., 2019

Efficient Discovery of Sequence Outlier Patterns.
Proc. VLDB Endow., 2019

Smile: A System to Support Machine Learning on EEG Data at Scale.
Proc. VLDB Endow., 2019

Scalable Kernel Density Estimation-based Local Outlier Detection over Large Data Streams.
Proceedings of the Advances in Database Technology, 2019

2018
SWIFT: Mining Representative Patterns from Large Event Streams.
Proc. VLDB Endow., 2018

2017
Outlier Detection over Massive-Scale Trajectory Streams.
ACM Trans. Database Syst., 2017

How to foster innovation: A data-driven approach to measuring economic competitiveness.
IBM J. Res. Dev., 2017

Pivot-Based Distributed K-Nearest Neighbor Mining.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2017

Discovering Evolving Moving Object Groups from Massive-Scale Trajectory Streams.
Proceedings of the 18th IEEE International Conference on Mobile Data Management, 2017

Scalable Top-n Local Outlier Detection.
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13, 2017

Distributed Local Outlier Detection in Big Data.
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13, 2017

MARAS: Signaling Multi-Drug Adverse Reactions.
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13, 2017

Multi-Tactic Distance-Based Outlier Detection.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

Interactive Analytics System for Exploring Outliers.
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017

Distributed Top-N local outlier detection in big data.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

2016
Sharing-Aware Outlier Analytics over High-Volume Data Streams.
Proceedings of the 2016 International Conference on Management of Data, 2016

Multi-query outlier detection over data streams: poster.
Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems, 2016

2015
Exploiting Sharing Opportunities for Real-time Complex Event Analytics.
IEEE Data Eng. Bull., 2015

Online Outlier Exploration Over Large Datasets.
Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015

2014
Interactive Outlier Exploration in Big Data Streams.
Proc. VLDB Endow., 2014

Complex event analytics: online aggregation of stream sequence patterns.
Proceedings of the International Conference on Management of Data, 2014

Detecting moving object outliers in massive-scale trajectory streams.
Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014

Scalable distance-based outlier detection over high-volume data streams.
Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, 2014

2013
High Performance Stream Query Processing With Correlation-Aware Partitioning.
Proc. VLDB Endow., 2013
