Evaluating Text-to-SQL Model Failures on Real-World Data.
Proceedings of the 40th IEEE International Conference on Data Engineering, 2024
Metadata Shaping: A Simple Approach for Knowledge-Enhanced Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022
Metadata Shaping: Natural Language Annotations for the Tail.
CoRR, 2021
Observational Supervision for Medical Image Classification Using Gaze Data.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021
Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021
Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation.
Proceedings of the 11th Conference on Innovative Data Systems Research, 2021
Automating knowledge distillation and representation from richly formatted data.
PhD thesis, 2020
Creating Hardware Component Knowledge Bases with Training Data Generation and Multi-task Learning.
ACM Trans. Embed. Comput. Syst., 2020
Sharp Bias-variance Tradeoffs of Hard Parameter Sharing in High-dimensional Linear Regression.
CoRR, 2020
Train and You'll Miss It: Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings.
CoRR, 2020
Understanding the Downstream Instability of Word Embeddings.
Proceedings of the Third Conference on Machine Learning and Systems, 2020
On the Generalization Effects of Linear Transformations in Data Augmentation.
Proceedings of the 37th International Conference on Machine Learning, 2020
Understanding and Improving Information Transfer in Multi-Task Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020
Ivy: Instrumental Variable Synthesis for Causal Inference.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020
Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices.
CoRR, 2019
Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019
Automating the generation of hardware component knowledge bases.
Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, 2019
Fonduer: Knowledge Base Construction from Richly Formatted Data.
Proceedings of the 2018 International Conference on Management of Data, 2018
Incremental knowledge base construction using DeepDive.
VLDB J., 2017
Snorkel: Rapid Training Data Creation with Weak Supervision.
Proc. VLDB Endow., 2017
SwellShark: A Generative Model for Biomedical Named Entity Recognition without Labeled Data.
CoRR, 2017
DeepDive: declarative knowledge base construction.
Commun. ACM, 2017
Snorkel: A System for Lightweight Extraction.
Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017
DeepDive: Declarative Knowledge Base Construction.
SIGMOD Rec., 2016
Data Programming: Creating Large Training Sets, Quickly.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016
Incremental Knowledge Base Construction Using DeepDive.
Proc. VLDB Endow., 2015
Incremental Knowledge Base Construction Using DeepDive.
CoRR, 2015
Feature Engineering for Knowledge Base Construction.
IEEE Data Eng. Bull., 2014