Susan B. Davidson

Orcid: 0009-0003-9259-9662

  • University of Pennsylvania, Philadelphia, PA, USA

According to our database, Susan B. Davidson authored at least 151 papers between 1984 and 2024.

ACM Fellow

ACM Fellow 2001, "For seminal contributions to distributed databases, real-time systems, heterogeneous database integration, warehousing, semi-structured data and for application of database research in bioinformatics."



Learning Approximation Sets for Exploratory Queries.
CoRR, 2024

ASQP-RL Demo: Learning Approximation Sets for Exploratory Queries.
Proceedings of the Companion of the 2024 International Conference on Management of Data, 2024

Selecting Sub-tables for Data Exploration.
Proceedings of the 39th IEEE International Conference on Data Engineering, 2023

Efficiently Archiving Photos under Storage Constraints.
Proceedings of the 26th International Conference on Extending Database Technology, 2023

PHOcus: Efficiently Archiving Photos.
Proc. VLDB Endow., 2022

Credit distribution in relational scientific databases.
Inf. Syst., 2022

Provenance-based Model Maintenance: Implications for Privacy.
IEEE Data Eng. Bull., 2022

Disposal by Design.
IEEE Data Eng. Bull., 2022

SubTab: Data Exploration with Informative Sub-Tables.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

ShapGraph: An Holistic View of Explanations through Provenance Graphs and Shapley Values.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

CHEF: A Cheap and Fast Pipeline for Iteratively Cleaning Label Uncertainties.
Proc. VLDB Endow., 2021

It's not just Cookies and Tea.
Proc. VLDB Endow., 2021

Solon: Communication-efficient Byzantine-resilient Distributed Training via Redundant Gradients.
CoRR, 2021

CHEF: A Cheap and Fast Pipeline for Iteratively Cleaning Label Uncertainties (Technical Report).
CoRR, 2021

Dynamic Gaussian Mixture based Deep Generative Model For Robust Forecasting on Sparse Multivariate Time Series.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Susan Davidson Speaks Out on Collaborating with Other Research Areas and Balancing Work and Family.
SIGMOD Rec., 2020

Data Provenance for Attributes: Attribute Lineage.
Proceedings of the 12th International Workshop on Theory and Practice of Provenance, 2020

PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models.
Proceedings of the 2020 International Conference on Management of Data, 2020

DeltaGrad: Rapid retraining of machine learning models.
Proceedings of the 37th International Conference on Machine Learning, 2020

Automating Software Citation using GitCite.
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

Provenance for Probabilistic Logic Programs.
Proceedings of the 23rd International Conference on Extending Database Technology, 2020

ProvCite: Provenance-based Data Citation.
Proc. VLDB Endow., 2019

Provenance: Privacy and Security.
Proceedings of the Encyclopedia of Database Systems, Second Edition, 2018

Data Citation: A New Provenance Challenge.
IEEE Data Eng. Bull., 2018

Realizing the potential of data science.
Commun. ACM, 2018

Data Citation: Giving Credit Where Credit is Due.
Proceedings of the 2018 International Conference on Management of Data, 2018

Discovering Similar Workflows via Provenance Clustering: A Case Study.
Proceedings of the Provenance and Annotation of Data and Processes, 2018

Automating Data Citation in CiteDB.
Proc. VLDB Endow., 2017

Generation CS: the challenges of and responses to the enrollment surge.
Inroads, 2017

Generation CS: the mixed news on diversity and the enrollment surge.
Inroads, 2017

Generation CS: the growth of computer science.
Inroads, 2017

Letter from the 2017 IEEE TCDE Impact Award Winner.
IEEE Data Eng. Bull., 2017

A Model for Fine-Grained Data Citation.
Proceedings of the 25th Italian Symposium on Advanced Database Systems, 2017

Data Citation: A Computational Challenge.
Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2017

Automating Data Citation: The eagle-i Experience.
Proceedings of the 2017 ACM/IEEE Joint Conference on Digital Libraries, 2017

Effective and efficient similarity search in scientific workflow repositories.
Future Gener. Comput. Syst., 2016

Why data citation is a computational problem.
Commun. ACM, 2016

PROX: Approximated Summarization of Data Provenance.
Proceedings of the 19th International Conference on Extending Database Technology, 2016

Answering regular path queries on workflow provenance.
Proceedings of the 31st IEEE International Conference on Data Engineering, 2015

Approximated Summarization of Data Provenance.
Proceedings of the 24th ACM International Conference on Information and Knowledge Management, 2015

Managing General and Individual Knowledge in Crowd Mining Applications.
Proceedings of the Seventh Biennial Conference on Innovative Data Systems Research, 2015

Top-k and Clustering with Noisy Comparisons.
ACM Trans. Database Syst., 2014

Ontology Assisted Crowd Mining.
Proc. VLDB Endow., 2014

Approximated Provenance for Complex Applications.
Proceedings of the 6th Workshop on the Theory and Practice of Provenance, 2014

OASSIS: query driven crowd mining.
Proceedings of the International Conference on Management of Data, 2014

Layer Decomposition: An Effective Structure-Based Approach for Scientific Workflow Similarity.
Proceedings of the 10th IEEE International Conference on e-Science, 2014

Learning to explore scientific workflow repositories.
Proceedings of the Conference on Scientific and Statistical Database Management, 2013

Search and result presentation in scientific workflow repositories.
Proceedings of the Conference on Scientific and Statistical Database Management, 2013

Education and career paths for data scientists.
Proceedings of the Conference on Scientific and Statistical Database Management, 2013

A cascading mentoring pedagogy in a CS service learning course to broaden participation and perceptions.
Proceedings of the 44th ACM Technical Symposium on Computer Science Education, 2013

A propagation model for provenance views of public/private workflows.
Proceedings of the Joint 2013 EDBT/ICDT Conferences, 2013

Using the crowd for top-k and group-by queries.
Proceedings of the Joint 2013 EDBT/ICDT Conferences, 2013

Understanding Local Structure in Ranked Datasets.
Proceedings of the Sixth Biennial Conference on Innovative Data Systems Research, 2013

To Show or Not to Show in Workflow Provenance.
Proceedings of the In Search of Elegance in the Theory and Practice of Computation, 2013

Labeling Workflow Views with Fine-Grained Dependencies.
Proc. VLDB Endow., 2012

The reflective mentor: charting undergraduates' responses to computer science service learning (abstract only).
Proceedings of the 43rd ACM technical symposium on Computer science education, 2012

Generating sound workflow views for correct provenance analysis.
ACM Trans. Database Syst., 2011

Putting Lipstick on Pig: Enabling Database-style Workflow Provenance.
Proc. VLDB Endow., 2011

A Fine-Grained Workflow Model with Provenance-Aware Security Views.
Proceedings of the 3rd Workshop on the Theory and Practice of Provenance, 2011

Labeling recursive workflow executions on-the-fly.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2011

Provenance views for module privacy.
Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2011

On provenance and privacy.
Proceedings of the Database Theory, 2011

Deriving probabilistic databases with inference ensembles.
Proceedings of the 27th International Conference on Data Engineering, 2011

Hiding Data and Structure in Workflow Provenance.
Proceedings of the Databases in Networked Information Systems - 7th International Workshop, 2011

Enabling Privacy in Provenance-Aware Workflow Systems.
Proceedings of the Fifth Biennial Conference on Innovative Data Systems Research, 2011

Keyword Search in Workflow Repositories with Access Control.
Proceedings of the 5th Alberto Mendelzon International Workshop on Foundations of Data Management, 2011

A bi-labeling based XPath processing system.
Inf. Syst., 2010

Preserving Module Privacy in Workflow Provenance.
CoRR, 2010

An optimal labeling scheme for workflow provenance using skeleton labels.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

On Provenance and Privacy.
Proceedings of the Provenance and Annotation of Data and Processes, 2010

WOLVES: Achieving Correct Provenance Analysis by Detecting and Resolving Unsound Workflow Views.
Proc. VLDB Endow., 2009

PDiffView: Viewing the Difference in Provenance of Workflow Results.
Proc. VLDB Endow., 2009

Detecting and resolving unsound workflow views for correct provenance analysis.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2009

On User Views in Scientific Workflow Systems.
Proceedings of the First International Workshop on the role of Semantic Web in Provenance Management (SWPM 2009), 2009

Optimizing user views for workflows.
Proceedings of the Database Theory, 2009

Differencing Provenance in Scientific Workflows.
Proceedings of the 25th International Conference on Data Engineering, 2009

Erratum to "Propagating XML constraints to relations" [JCSS 73 (2007) 316-361].
J. Comput. Syst. Sci., 2008

Special Issue: The First Provenance Challenge.
Concurr. Comput. Pract. Exp., 2008

Addressing the provenance challenge using ZOOM.
Concurr. Comput. Pract. Exp., 2008

Provenance and scientific workflows: challenges and opportunities.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2008

Scientific Data Management: An Orphan in the Database Community?
Proceedings of the 24th International Conference on Data Engineering, 2008

Querying and Managing Provenance through User Views in Scientific Workflows.
Proceedings of the 24th International Conference on Data Engineering, 2008

Propagating XML constraints to relations.
J. Comput. Syst. Sci., 2007

Provenance in Scientific Workflow Systems.
IEEE Data Eng. Bull., 2007

BioGuideSRS: querying multiple sources with a user-centric perspective.
Bioinform., 2007

Zoom*UserViews: Querying Relevant Provenance in Workflow Systems.
Proceedings of the 33rd International Conference on Very Large Data Bases, 2007

PATAXÓ: A framework to allow updates through XML views.
ACM Trans. Database Syst., 2006

Path-based Systems to Guide Scientists in the Maze of Biological Data Sources.
J. Bioinform. Comput. Biol., 2006

Crimson: A Data Management System to Support Evaluating Phylogenetic Tree Reconstruction Algorithms.
Proceedings of the 32nd International Conference on Very Large Data Bases, 2006

A Model for User-Oriented Data Provenance in Pipelined Scientific Workflows.
Proceedings of the Provenance and Annotation of Data, 2006

An Efficient XPath Query Processor for XML Streams.
Proceedings of the 22nd International Conference on Data Engineering, 2006

Designing and Evaluating an XPath Dialect for Linguistic Queries.
Proceedings of the 22nd International Conference on Data Engineering, 2006

Towards a Model of Provenance and User Views in Scientific Workflows.
Proceedings of the Data Integration in the Life Sciences, Third International Workshop, 2006

Digital library information-technology infrastructures.
Int. J. Digit. Libr., 2005

Efficiently Supporting Structure Queries on Phylogenetic Trees.
Proceedings of the 17th International Conference on Scientific and Statistical Database Management, 2005

ViteX: A Streaming XPath Processing System.
Proceedings of the 21st International Conference on Data Engineering, 2005

A User-Centric Framework for Accessing Biological Sources and Tools.
Proceedings of the Data Integration in the Life Sciences, Second International Workshop, 2005

Active XML and Data Activation.
Proceedings of the 12th International Workshop on Abstract State Machines, 2005

Biological Data Management: Research, Practice and Opportunities.
(e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, Toronto, Canada, August 31, 2004

From XML View Updates to Relational View Updates: old solutions to a new problem.
(e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, Toronto, Canada, August 31, 2004

BLAS: An Efficient XPath Processing System.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2004

EXPedite: a system for encoded XML processing.
Proceedings of the 2004 ACM CIKM International Conference on Information and Knowledge Management, 2004

Sharing Biomedical Data with Impunity and Ease.
OMICS, 2003

Reasoning about keys for XML.
Inf. Syst., 2003

On the updatability of XML views over relational databases.
Proceedings of the International Workshop on Web and Databases, 2003

RRXF: Redundancy reducing XML storage in relations.
Proceedings of 29th International Conference on Very Large Data Bases, 2003

UXQuery: Building Updatable XML Views over Relational Databases.
Proceedings of the XVIII Simpósio Brasileiro de Bancos de Dados, 2003

Propagating XML Constraints to Relations.
Proceedings of the 19th International Conference on Data Engineering, 2003

The Information Integration System K2.
Proceedings of the Bioinformatics, 2003

In memory of Vadim Aleksandrovich Ratner.
Silico Biol., 2002

Keys for XML.
Comput. Networks, 2002

Constraints preserving schema mapping from XML to relations.
Proceedings of the Fifth International Workshop on the Web and Databases, 2002

Tale of Two Cultures: Are There Database Research Issues in Bioinformatics?
Proceedings of the 14th International Conference on Scientific and Statistical Database Management, 2002

XKvalidator: a constraint validator for XML.
Proceedings of the 2002 ACM CIKM International Conference on Information and Knowledge Management, 2002

K2/Kleisli and GUS: Experiments in integrated access to genomic data sources.
IBM Syst. J., 2001

Constraints for XML.
Proceedings of the XVI Simpósio Brasileiro de Banco de Dados, 2001

In Memory of Chris Overton.
Silico Biol., 2000

View Maintenance for Hierarchical Semistructured Data.
Proceedings of the Data Warehousing and Knowledge Discovery, 2000

Specifying Database Transformations in WOL.
IEEE Data Eng. Bull., 1999

Specifying Updates in Biomedical Databases.
Proceedings of the 11th International Conference on Scientific and Statistical Database Management, 1999

Reasoning about Nested Functional Dependencies.
Proceedings of the Eighteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, May 31, 1999

BioKleisli: A Digital Library for Biomedical Researchers.
Int. J. Digit. Libr., 1997

Adding Structure to Unstructured Data.
Proceedings of the Database Theory, 1997

WOL: A Language for Database Transformations and Constraints.
Proceedings of the Thirteenth International Conference on Data Engineering, 1997

A Query Language and Optimization Techniques for Unstructured Data.
Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, 1996

Challenges in Integrating Biological Data Sources.
J. Comput. Biol., 1995

A Data Transformation System for Biological Data Sources.
Proceedings of the VLDB'95, 1995

Semantics of Database Transformations.
Proceedings of the Semantics in Databases, 1995

Programming Constructs for Unstructured Data.
Proceedings of the Database Programming Languages (DBPL-5), 1995

Facilitating Transformations in a Human Genome Project Database.
Proceedings of the Third International Conference on Information and Knowledge Management (CIKM'94), Gaithersburg, Maryland, USA, November 29, 1994

RTC: Language Support for Real-Time Concurrency.
Real Time Syst., 1993

Deadlock Prevention in Concurrent Real-Time Systems.
Real Time Syst., 1993

Deadlock Prevention in the RTC Programming System for Distributed Real-Time Applications.
Proceedings of the 13th International Conference on Distributed Computing Systems, 1993

CCSR 92: Calculus for Communicating Shared Resources with Dynamic Priorities.
Proceedings of the NAPAW 92, 1992

Theoretical Aspects of Schema Merging.
Proceedings of the Advances in Database Technology, 1992

Timed Atomic Commitment.
IEEE Trans. Computers, 1991

A Semantics for Complex Objects and Approximate Answers.
J. Comput. Syst. Sci., 1991

Semi-Materialization: A Technique for Optimizing Frequently Executed Queries.
Data Knowl. Eng., 1991

A Performance Analysis of Timed Synchronous Communication Primitives.
IEEE Trans. Computers, 1990

Querying independent databases.
Inf. Sci., 1990

A protocol for timed atomic commitment.
Proceedings of the 9th International Conference on Distributed Computing Systems, 1989

Language constructs for timed atomic commitment.
Proceedings of the Nineteenth International Symposium on Fault-Tolerant Computing, 1989

A Semantics for Complex Objects and Approximate Queries.
Proceedings of the Seventh ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 1988

Adding Time to Synchronous Process Communications.
IEEE Trans. Computers, 1987

A Performance Comparison of Optimistic versus Conservative Strategies during Partition Failures in Distributed Databases.
J. Manag. Inf. Syst., 1987

Generalized I/O with Timing Constraints.
Proceedings of the 7th International Conference on Distributed Computing Systems, 1987

Applications of Byzantine Agreement in Database Systems.
ACM Trans. Database Syst., 1986

Protocols for Timed Synchronous Process Communications.
Proceedings of the 7th IEEE Real-Time Systems Symposium (RTSS '86), 1986

Consistency in Partitioned Networks.
ACM Comput. Surv., 1985

Optimism and Consistency In Partitioned Distributed Database Systems.
ACM Trans. Database Syst., 1984

Is Byzantine Agreement Useful in a Distributed Database?
Proceedings of the Third ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, 1984
