Peter J. Haas

ORCID: 0000-0001-5694-3065

  • University of Massachusetts, Amherst, MA, USA
  • Thomas J. Watson Research Center, Yorktown Heights, USA (former)

According to our database, Peter J. Haas authored at least 128 papers between 1985 and 2024.

Scaling Package Queries to a Billion Tuples via Hierarchical Partitioning and Customized Optimization.
Proc. VLDB Endow., January, 2024

Exact PPS sampling with bounded sample size.
Inf. Process. Lett., August, 2023

NIM: Generative Neural Networks for Automated Modeling and Generation of Simulation Inputs.
ACM Trans. Model. Comput. Simul., July, 2023

Efficient Hybrid Simulation Optimization via Graph Neural Network Metamodeling.
Proceedings of the Winter Simulation Conference, 2023

Causal Dynamic Bayesian Networks for Simulation Metamodeling.
Proceedings of the Winter Simulation Conference, 2023

Piloting an Interactive Ethics and Responsible Computing Learning Environment in Undergraduate CS Courses.
Proceedings of the 54th ACM Technical Symposium on Computer Science Education, Volume 1, 2023

In-Database Decision Support: Opportunities and Challenges.
IEEE Data Eng. Bull., 2022

Predictive and Prescriptive Analytics in Business Decision Making: Needs and Concerns.
CoRR, 2022

Enhanced Simulation Metamodeling via Graph and Generative Neural Networks.
Proceedings of the Winter Simulation Conference, 2022

Augmenting Decision Making via Interactive What-If Analysis.
Proceedings of the 12th Conference on Innovative Data Systems Research, 2022

Introduction to the Special Issue for Towards an Ecosystem of Simulation Models and Data.
ACM Trans. Model. Comput. Simul., 2020

SuDocu: Summarizing Documents by Example.
Proc. VLDB Endow., 2020

sPaQLTooLs: A Stochastic Package Query Interface for Scalable Constrained Optimization.
Proc. VLDB Endow., 2020

NIM: Modeling and Generation of Simulation Inputs Via Generative Neural Networks.
Proceedings of the Winter Simulation Conference, 2020

Stochastic Package Queries in Probabilistic Databases.
Proceedings of the 2020 International Conference on Management of Data, 2020

General Temporally Biased Sampling Schemes for Online Model Management.
ACM Trans. Database Syst., 2019

Online Model Management via Temporally Biased Sampling.
SIGMOD Rec., 2019

Temporally-Biased Sampling Schemes for Online Model Management.
CoRR, 2019

Compressed linear algebra for declarative large-scale machine learning.
Commun. ACM, 2019

MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions.
Proceedings of the 2019 International Conference on Management of Data, 2019

NIM: generative neural networks for modeling and generation of simulation inputs.
Proceedings of the 2019 Summer Simulation Conference, 2019

Monte Carlo Methods for Uncertain Data.
Encyclopedia of Database Systems, Second Edition, 2018

Karp-Luby Sampling.
Encyclopedia of Database Systems, Second Edition, 2018

Unknown Examples & Machine Learning Model Generalization.
CoRR, 2018

Temporally-Biased Sampling for Online Model Management.
Proceedings of the 21st International Conference on Extending Database Technology, 2018

Scaling Machine Learning via Compressed Linear Algebra.
SIGMOD Rec., 2017

Foresight: Recommending Visual Insights.
Proc. VLDB Endow., 2017

Foresight: Rapid Data Exploration Through Guideposts.
CoRR, 2017

Sampling for Scalable Visual Analytics.
IEEE Computer Graphics and Applications, 2017

Compressed Linear Algebra for Large-Scale Machine Learning.
Proc. VLDB Endow., 2016

Data-Stream Sampling: Basic Techniques and Results.
Data Stream Management - Processing High-Speed Data Streams, 2016

On Transience and Recurrence in Irreducible Finite-State Stochastic Systems.
ACM Trans. Model. Comput. Simul., 2015

Guest Editors' Introduction to Special Issue Honoring Donald L. Iglehart.
ACM Trans. Model. Comput. Simul., 2015

Shared-memory and shared-nothing stochastic gradient descent algorithms for matrix completion.
Knowl. Inf. Syst., 2015

Dynamic interaction graphs with probabilistic edge decay.
Proceedings of the 31st IEEE International Conference on Data Engineering, 2015

Groupwise analytics via adaptive MapReduce.
Proceedings of the 31st IEEE International Conference on Data Engineering, 2015

Guest editors' introduction to special issue on the third INFORMS simulation society research workshop.
ACM Trans. Model. Comput. Simul., 2014

Improving the efficiency of stochastic composite simulation models via result caching.
Proceedings of the 2014 Winter Simulation Conference, 2014

Model-data Ecosystems: challenges, tools, and trends.
Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2014

Automated hypothesis generation based on mining scientific literature.
Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014

Non-uniformity issues and workarounds in bounded-size sampling.
VLDB J., 2013

Panel: Are we effectively preparing our students to be certified analytics professionals?
Proceedings of the Winter Simulation Conference: Simulation Making Decisions in a Complex World, 2013

Simulation of database-valued Markov chains using SimSQL.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

Eagle-eyed elephant: split-oriented indexing in Hadoop.
Proceedings of the Joint 2013 EDBT/ICDT Conferences, 2013

Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches.
Found. Trends Databases, 2012

On simulation of non-Markovian stochastic Petri nets with heavy-tailed firing times.
Proceedings of the Winter Simulation Conference, 2012

Splash: a platform for analysis and simulation of health.
Proceedings of the ACM International Health Informatics Symposium, 2012

Topic Models over Spoken Language.
Proceedings of the 12th IEEE International Conference on Data Mining, 2012

Splash: Simulation optimization in complex systems of systems.
Proceedings of the 50th Annual Allerton Conference on Communication, 2012

The Monte Carlo database system: Stochastic analysis close to the data.
ACM Trans. Database Syst., 2011

Data is Dead... Without What-If Models.
Proc. VLDB Endow., 2011

Information technology for healthcare transformation.
IBM J. Res. Dev., 2011

Sketches get sketchier.
Commun. ACM, 2011

Large-scale matrix factorization with distributed stochastic gradient descent.
Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011

MCDB-R: Risk Analysis in the Database.
Proc. VLDB Endow., 2010

Ricardo: integrating R and Hadoop.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

Social Factors in Creating an Integrated Capability for Health System Modeling and Simulation.
Advances in Social Computing, 2010

From MUD to MIRE: Managing Inherent Risk in the Enterprise.
Proceedings of the Fourth International VLDB workshop on Management of Uncertain Data (MUD 2010) in conjunction with VLDB 2010, 2010

Special issue on uncertain and probabilistic databases.
VLDB J., 2009

Discovering and Exploiting Statistical Properties for Query Optimization in Relational Databases: A Survey.
Stat. Anal. Data Min., 2009

Distinct-value synopses for multiset operations.
Commun. ACM, 2009

E = MC³: managing uncertain enterprise data in a cluster-computing environment.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2009

Uncertainty management in rule-based information extraction systems.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2009

Resolution-Aware Query Answering for Business Intelligence.
Proceedings of the 25th International Conference on Data Engineering, 2009

Maintaining bounded-size sample synopses of evolving datasets.
VLDB J., 2008

Main-memory scan sharing for multi-core CPUs.
Proc. VLDB Endow., 2008

MCDB: a Monte Carlo approach to managing uncertain data.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2008

08421 Working Group: Classification, Representation and Modeling.
Proceedings of the Uncertainty Management in Information Systems, 12.10. - 17.10.2008, 2008

08421 Working Group: Report of the Probabilistic Databases Benchmarking.
Proceedings of the Uncertainty Management in Information Systems, 12.10. - 17.10.2008, 2008

Consistent selectivity estimation via maximum entropy.
VLDB J., 2007

On reservoir sampling with deletions.
Monde des Util. Anal. Données, 2007

Detecting Attribute Dependencies from Query Feedback.
Proceedings of the 33rd International Conference on Very Large Data Bases, 2007

On synopses for distinct-value estimation under multiset operations.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2007

Maintaining Bernoulli samples over evolving multisets.
Proceedings of the Twenty-Sixth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 2007

Integrating Query-Feedback Based Statistics into Informix Dynamic Server.
Proceedings of the Datenbanksysteme in Business, 2007

Making DB2 Products Self-Managing: Strategies and Experiences.
IEEE Data Eng. Bull., 2006

GORDIAN: Efficient and Scalable Discovery of Composite Keys.
Proceedings of the 32nd International Conference on Very Large Data Bases, 2006

A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets.
Proceedings of the 32nd International Conference on Very Large Data Bases, 2006

MAXENT: consistent cardinality estimation in action.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2006

ISOMER: Consistent Histogram Construction Using Query Feedback.
Proceedings of the 22nd International Conference on Data Engineering, 2006

Techniques for Warehousing of Sample Data.
Proceedings of the 22nd International Conference on Data Engineering, 2006

Integrating a Maximum-Entropy Cardinality Estimator into DB2 UDB.
Advances in Database Technology, 2006

Statistical Learning Techniques for Costing XML Queries.
Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005

Consistently Estimating the Selectivity of Conjuncts of Predicates.
Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005

Automated statistics collection in action.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2005

Toward Automated Large-Scale Information Integration and Discovery.
Data Management in a Connected World, 2005

Stochastic Petri Nets for Modelling and Simulation.
Proceedings of the 36th Conference on Winter Simulation, 2004

CORDS: Automatic Generation of Correlation Statistics in DB2.
(e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, Toronto, Canada, August 31, 2004

Automated Statistics Collection in DB2 UDB.
(e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, Toronto, Canada, August 31, 2004

CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2004

A Bi-Level Bernoulli Scheme for Database Sampling.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2004

Automatic Relationship Discovery in Self-Managing Database Systems.
Proceedings of the 1st International Conference on Autonomic Computing (ICAC 2004), 2004

Watermarking relational data: framework, algorithms and analysis.
VLDB J., 2003

BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data.
Proceedings of 29th International Conference on Very Large Data Bases, 2003

A System for Watermarking Relational Databases.
Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, 2003

Efficient data reduction with EASE.
Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 24, 2003

On the validity of long-run estimation methods for discrete-event systems.
SIGMETRICS Perform. Evaluation Rev., 2002

A scalable hash ripple join algorithm.
Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, 2002

A new two-phase sampling based algorithm for discovering association rules.
Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002

FAST: A New Sampling-Based Algorithm for Discovering Association Rules.
Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, February 26, 2002

Stochastic Petri nets - modelling, stability, simulation.
Springer series in operations research, Springer, ISBN: 978-0-387-95445-5, 2002

Estimation of delays in non-regenerative discrete-event stochastic systems.
SIGMETRICS Perform. Evaluation Rev., 2001

Online Query Processing.
Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, 2001

Estimation Methods for Nonregenerative Stochastic Petri Nets.
IEEE Trans. Software Eng., 1999

Interactive Data Analysis: The Control Project.
Computer, 1999

Techniques for Online Exploration of Large Object-Relational Datasets.
Proceedings of the 11th International Conference on Scientific and Statistical Database Management, 1999

Ripple Joins for Online Aggregation.
Proceedings of ACM SIGMOD 1999, 1999

The New Jersey Data Reduction Report.
IEEE Data Eng. Bull., 1997

Large-Sample and Deterministic Confidence Intervals for Online Aggregation.
Proceedings of the Ninth International Conference on Scientific and Statistical Database Management, 1997

Online Aggregation.
Proceedings of ACM SIGMOD 1997, 1997

Estimation methods for stochastic Petri nets based on standardized time series.
Proceedings of the Seventh International Workshop on Petri Nets and Performance Models, 1997

Selectivity and Cost Estimation for Joins Based on Random Sampling.
J. Comput. Syst. Sci., 1996

Estimation methods for passage times using one-dependent cycles.
Discret. Event Dyn. Syst., 1996

Improved Histograms for Selectivity Estimation of Range Predicates.
Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, 1996

Perspectives of collaborative supercomputing and networking in European Aerospace research and industry.
Future Gener. Comput. Syst., 1995

Sampling-Based Estimation of the Number of Distinct Values of an Attribute.
Proceedings of VLDB'95, 1995

One-dependent cycles and passage times in stochastic Petri nets.
Proceedings of the Sixth International Workshop on Petri Nets and Performance Models, 1995

Sampling-Based Selectivity Estimation for Joins Using Augmented Frequent Value Statistics.
Proceedings of the Eleventh International Conference on Data Engineering, 1995

On the Relative Cost of Sampling for Join Selectivity Estimation.
Proceedings of the Thirteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 1994

Networking Issues in PAGEIN: The "N" of "HPCN".
Proceedings of the High-Performance Computing and Networking, 1994

Fixed-Precision Estimation of Join Selectivity.
Proceedings of the Twelfth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 1993

Sequential Sampling Procedures for Query Size Estimation.
Proceedings of the 1992 ACM SIGMOD International Conference on Management of Data, 1992

Stochastic Petri Net Representation of Discrete Event Simulations.
IEEE Trans. Software Eng., 1989

Stochastic Petri Nets with Simultaneous Transition Firings.
Proceedings of the Second International Workshop on Petri Nets and Performance Models, 1987

Regenerative Stochastic Petri Nets.
Perform. Evaluation, 1986

Regenerative Simulation Methods for Local Area Computer Networks.
IBM J. Res. Dev., 1985

Regenerative Simulation of Stochastic Petri Nets.
Proceedings of the International Workshop on Timed Petri Nets, 1985
