Robert L. Grossman

ORCID: 0000-0003-3741-5739

According to our database, Robert L. Grossman authored at least 143 papers between 1989 and 2024.

ACM Fellow

ACM Fellow 2016, "For contributions to data science, data intensive computing and data mining".

LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation.
CoRR, 2024

Ten Pillars for Data Meshes.
CoRR, 2024

An Annotated Glossary for Data Commons, Data Meshes, and Other Data Platforms.
CoRR, 2024

Enhancing Instance-Level Image Classification with Set-Level Labels.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Building a collaborative cloud platform to accelerate heart, lung, blood, and sleep research.
J. Am. Medical Informatics Assoc., June, 2023

Towards self-describing and FAIR bulk formats for biomedical data.
PLoS Comput. Biol., March, 2023

Transfer Learning for Mortality Prediction in Non-Small Cell Lung Cancer with Low-Resolution Histopathology Slide Snapshots.
Proceedings of the MEDINFO 2023 - The Future Is Accessible, 2023

Scalable Batch-Mode Deep Bayesian Active Learning via Equivalence Class Annealing.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

CNT: Semi-Automatic Translation from CWL to Nextflow for Genomic Workflows.
Proceedings of the 23rd IEEE International Conference on Bioinformatics and Bioengineering, 2023

The Biomedical Research Hub: a federated platform for patient research data.
J. Am. Medical Informatics Assoc., 2022

Ten Lessons for Data Sharing With a Data Commons.
CoRR, 2022

A Framework for the Interoperability of Cloud Platforms: Towards FAIR Data in SAFE Environments.
CoRR, 2022

BALanCe: Deep Bayesian Active Learning via Equivalence Class Annealing.
CoRR, 2021

Experiences in Managing the Performance and Reliability of a Large-Scale Genomics Cloud Platform.
Proceedings of the 2021 USENIX Annual Technical Conference, 2021

The Veterans Affairs Precision Oncology Data Repository, a Clinical, Genomic, and Imaging Research Database.
Patterns, 2020

HeartBioPortal2.0: new developments and updates for genetic ancestry and cardiometabolic quantitative traits in diverse human populations.
Database J. Biol. Databases Curation, 2020

Evaluating and interpreting caption prediction for histopathology images.
Proceedings of the Machine Learning for Healthcare Conference, 2020

Evaluation of Hyperbolic Attention in Histopathology Images.
Proceedings of the 20th IEEE International Conference on Bioinformatics and Bioengineering, 2020

Machine Learning Methods to Predict Lung Cancer Survival Using the Veterans Affairs Research Precision Oncology Data Commons.
Proceedings of the MEDINFO 2019: Health and Wellbeing e-Networks for All, 2019

The medical science DMZ: a network design pattern for data-intensive medical science.
J. Am. Medical Informatics Assoc., 2018

A framework for evaluating the analytic maturity of an organization.
Int. J. Inf. Manag., 2018

Data Lakes, Clouds and Commons: A Review of Platforms for Analyzing and Sharing Genomic Data.
CoRR, 2018

The Matsu Wheel: a reanalysis framework for Earth satellite imagery in data commons.
Int. J. Data Sci. Anal., 2017

Detecting Spatial Patterns of Disease in Large Collections of Electronic Medical Records Using Neighbor-Based Bootstrapping.
Big Data, 2017

Designing and deploying a bioinformatics software-defined network exchange (SDX): Architecture, services, capabilities, and foundation technologies.
Proceedings of the 20th Conference on Innovations in Clouds, Internet and Networks, 2017

The Medical Science DMZ.
J. Am. Medical Informatics Assoc., 2016

A Case for Data Commons: Toward Data Science as a Service.
Comput. Sci. Eng., 2016

The Matsu Wheel: A Cloud-based Framework for Efficient Analysis and Reanalysis of Earth Satellite Imagery.
CoRR, 2016

A Case for Data Commons: Towards Data Science as a Service.
CoRR, 2016

Deploying Analytics with the Portable Format for Analytics (PFA).
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016

The Matsu Wheel: A Cloud-Based Framework for Efficient Analysis and Reanalysis of Earth Satellite Imagery.
Proceedings of the Second IEEE International Conference on Big Data Computing Service and Applications, 2016

Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets.
J. Am. Medical Informatics Assoc., 2014

Use of the Earth Observing One (EO-1) Satellite for the Namibia SensorWeb Flood Early Warning Pilot.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2013

OpenFlow Enabled Hadoop over Local and Wide Area Clusters.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

The Design of a Community Science Cloud: The Open Science Data Cloud Perspective.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

The Namibia Early Flood Warning System, a CEOS pilot project.
Proceedings of the 2012 IEEE International Geoscience and Remote Sensing Symposium, 2012

Discovering geometric patterns in genomic data.
Proceedings of the ACM International Conference on Bioinformatics, 2012

Toward Efficient and Simplified Distributed Data Intensive Computing.
IEEE Trans. Parallel Distributed Syst., 2011

Sector: A high performance wide area community data storage and sharing system.
Future Gener. Comput. Syst., 2010

Processing massive sized graphs using Sector/Sphere.
Proceedings of the 3rd Workshop on Many-Task Computing on Grids and Supercomputers, 2010

Malstone: towards a benchmark for analytics on large data clouds.
Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010

dMaximalCliques: A Distributed Algorithm for Enumerating All Maximal Cliques and Maximal Clique Distribution.
Proceedings of the ICDMW 2010, 2010

dSimpleGraph: A Novel Distributed Clustering Algorithm for Exploring Very Large Scale Unknown Data Sets.
Proceedings of the ICDMW 2010, 2010

An overview of the Open Science Data Cloud.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

What is analytic infrastructure and why should you care?
SIGKDD Explor., 2009

Open source analytics: an introduction to the special issue.
SIGKDD Explor., 2009

The Case for Cloud Computing.
IT Prof., 2009

Compute and storage clouds using wide area high performance networks.
Future Gener. Comput. Syst., 2009

On the Varieties of Clouds for Data Intensive Computing.
IEEE Data Eng. Bull., 2009

The Open Cloud Testbed: A Wide Area Testbed for Cloud Computing Utilizing High Performance Network Services.
CoRR, 2009

Flynet: a genomic resource for Drosophila melanogaster transcriptional regulatory networks.
Bioinform., 2009

Lessons learned from a year's worth of benchmarks of large data clouds.
Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers, 2009

Why Naive Ensembles Do Not Work in Cloud Computing.
Proceedings of the ICDM Workshops 2009, 2009

The Open Cloud Testbed: Supporting Open Source Cloud Computing Systems Based on Large Scale High Performance, Dynamic Network Services.
Proceedings of the Networks for Grid Applications, 2009

Sector and Sphere: Towards Simplified Storage and Processing of Large Scale Distributed Data.
CoRR, 2008

Exploring data parallelism and locality in wide area networks.
Proceedings of the 2008 Workshop on Many-Task Computing on Grids and Supercomputers, 2008

Data mining using high performance data clouds: experimental studies using sector and sphere.
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008

UDTv4: Improvements in Performance and Usability.
Proceedings of the Networks for Grid Applications, Second International Conference, 2008

Discovering Emergent Behavior from Network Packet Data.
Proceedings of the Next Generation of Data Mining, 2008

UDT: UDP-based data transfer for high-speed wide area networks.
Comput. Networks, 2007

Detecting changes in large data sets of payment card data: a case study.
Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007

An Alert Management Approach To Data Quality: Lessons Learned From The Visa Data Authority Program.
Proceedings of the 12th International Conference on Information Quality, 2007

Outlier Detection with Streaming Dyadic Decomposition.
Proceedings of the Advances in Data Mining. Theoretical Aspects and Applications, 2007

A peer-to-peer infrastructure for distributing large scientific data sets over wide area high-performance networks: experimental studies using wide area layer 2 services.
Proceedings of the 1st International ICST Conference on Networks for Grid Applications, 2007

An Algorithm for Assigning Unique Keys to Metabolic Pathways.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2007

High-Dimensional Visual Analytics: Interactive Exploration Guided by Pairwise Views of Point Distributions.
IEEE Trans. Vis. Comput. Graph., 2006

KDD workshop on data mining standards, services & platforms (DM-SSP) 2006.
SIGKDD Explor., 2006

Data mining middleware for wide-area high-performance networks.
Future Gener. Comput. Syst., 2006

Bandwidth challenge - Transporting sloan digital sky survey data using SECTOR.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Monitoring Data Quality for Very High Volume Transaction Systems.
Proceedings of the 11th International Conference on Information Quality, 2006

A Service Oriented Architecture Supporting Data Interoperability for Payments Card Processing Systems.
Proceedings of the Service-Oriented Computing, 2006

Distributing the Sloan Digital Sky Survey Using UDT and Sector.
Proceedings of the Second International Conference on e-Science and Grid Technologies (e-Science 2006), 2006

Using Term Lists and Inverted Files to Improve Search Speed for Metabolic Pathway Databases.
Proceedings of the Data Integration in the Life Sciences, Third International Workshop, 2006

SDCS: Simplified Data Communications in Parallel/Distributed Applications.
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

Simple Available Bandwidth Utilization Library for High-Speed Wide Area Networks.
J. Supercomput., 2005

Teraflows over Gigabit WANs with UDT.
Future Gener. Comput. Syst., 2005

Differential algebra structures on families of trees.
Adv. Appl. Math., 2005

Visual browsing of remote and distributed data.
Proceedings of the Visualization and Data Analysis 2005, 2005

Supporting Configurable Congestion Control in Data Transport Services.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Real Time Change Detection and Alerts from Highway Traffic Data.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

An event based framework for improving information quality that integrates baseline models, causal models and formal reference models.
Proceedings of the IQIS 2005, 2005

A Methodology for Establishing Information Quality Baselines for Complex, Distributed Systems.
Proceedings of the 2005 International Conference on Information Quality (MIT ICIQ Conference), 2005

Graph-Theoretic Scagnostics.
Proceedings of the IEEE Symposium on Information Visualization (InfoVis 2005), 2005

Assigning Unique Keys to Chemical Compounds for Data Integration: Some Interesting Counter Examples.
Proceedings of the Data Integration in the Life Sciences, Second International Workshop, 2005

Data mining standards, services, and platforms 2004 (DM-SSP 2004).
SIGKDD Explor., 2004

An Empirical Study of the Universal Chemical Key Algorithm for Assigning Unique Keys to Chemical Compounds.
J. Bioinform. Comput. Biol., 2004

Mining Web Pages for Data Records.
IEEE Intell. Syst., 2004

Experimental studies of data transport and data access of earth-science data over networks with high bandwidth delay products.
Comput. Networks, 2004

GenIc: A Single-Pass Generalized Incremental Algorithm for Clustering.
Proceedings of the Fourth SIAM International Conference on Data Mining, 2004

Experiences in Design and Implementation of a High Performance Transport Protocol.
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

Using DataSpace Archives to Support Long-Term Stewardship of Remote and Distributed Data.
Proceedings of the 21st IEEE Conference on Mass Storage Systems and Technologies / 12th NASA Goddard Conference on Mass Storage Systems and Technologies, 2004

Experimental Studies Using Median Polish Procedure to Reduce Alarm Rates in Data Cubes of Intrusion Data.
Proceedings of the Intelligence and Security Informatics, 2004

A Greedy Algorithm for Selecting Models in Ensembles.
Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 2004

KDD-2003 workshop on data mining standards, services and platforms (DM-SSP 03).
SIGKDD Explor., 2003

Data webs for earth science data.
Parallel Comput., 2003

SABUL: A Transport Protocol for Grid Computing.
J. Grid Comput., 2003

TeraScope: distributed visual data mining of terascale data sets over photonic networks.
Future Gener. Comput. Syst., 2003

The Photonic TeraStream: enabling next generation applications through intelligent optical networking at iGRID2002.
Future Gener. Comput. Syst., 2003

Experimental studies using photonic data services at IGrid 2002.
Future Gener. Comput. Syst., 2003

Data integration in a bandwidth-rich world.
Commun. ACM, 2003

Transport protocols for high performance.
Commun. ACM, 2003

A Case for the Global Access to Large Distributed Data Sets Using Data Webs Employing Photonic Data Services.
Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies, 2003

Mining data records in Web pages.
Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003

Experimental Studies of the Universal Chemical Key (UCK) Algorithm on the NCI Database of Chemical Compounds.
Proceedings of the 2nd IEEE Computer Society Bioinformatics Conference, 2003

DataSpace: a data Web for the exploratory analysis and mining of data.
Comput. Sci. Eng., 2002

Data mining standards initiatives.
Commun. ACM, 2002

Merging multiple data streams on common keys over high performance networks.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

An Algebraic Approach to Data Mining: Some Examples.
Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), 2002

PSockets: The Case for Application-level Network Striping for Data Intensive Applications using High Speed Wide Area Networks.
Proceedings of Supercomputing 2000, 2000

Performance of DB2 Enterprise-Extended Edition on NT with Virtual Interface Architecture.
Proceedings of the Advances in Database Technology, 2000

The management and mining of multiple predictive models using the predictive modeling markup language.
Inf. Softw. Technol., 1999

Data Min. Knowl. Discov., 1999

Papyrus: A System for Data Mining over Local and Wide Area Clusters and Super-Clusters.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1999

A High Performance Implementation of the Data Space Transfer Protocol (DSTP).
Proceedings of the Large-Scale Parallel Data Mining, 1999

A Methodology for Supporting Collaborative Exploratory Analysis of Massive Data Sets in Tele-Immersive Environments.
Proceedings of the Eighth IEEE International Symposium on High Performance Distributed Computing, 1999

A Tutorial Introduction to High Performance Data Mining (Abstract).
Proceedings of the Principles of Data Mining and Knowledge Discovery, 1997

Database Mining Challenges for Digital Libraries.
ACM Comput. Surv., 1996

Data Mining Using Light Weight Object Management in Clustered Computing Environments.
Proceedings of the 7th Workshop on Persistent Object Systems, 1996

Data Mining and Tree-Based Optimization.
Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), 1996

Optimization driven data mining and credit scoring.
Proceedings of the IEEE/IAFE 1996 Conference on Computational Intelligence for Financial Engineering, 1996

An Algebraic Approach to Hybrid Systems.
Theor. Comput. Sci., 1995

Clusters, meta-clusters, and digital libraries: digital libraries for scientific, engineering and medical applications.
SIGWEB Newsl., 1995

PTool: A Light Weight Persistent Object Manager.
Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, 1995

An Architecture for a Scalable, High-Performance Digital Library.
Proceedings of the Fourteenth IEEE Symposium on Mass Storage Systems, 1995

Caching and Migration for Multilevel Persistent Object Stores.
Proceedings of the Fourteenth IEEE Symposium on Mass Storage Systems, 1995

The Symbolic Computation of Differential Invariants of Polynomial Vector Field Systems Using Trees.
Proceedings of the 1995 International Symposium on Symbolic and Algebraic Computation, 1995

A Data Intensive Computing Approach to Path Planning and Mode Management for Hybrid Systems.
Proceedings of the Hybrid Systems III: Verification and Control, 1995

Lightweight video service for multi-media digital libraries.
Proceedings of the 1995 Conference of the Centre for Advanced Studies on Collaborative Research, 1995

Visibility with a Moving Point of View.
Algorithmica, 1994

Analyzing High Energy Physics Data Using Databases: A Case Study.
Proceedings of the Seventh International Working Conference on Scientific and Statistical Database Management, 1994

Ptool: A Scalable Persistent Object Manager.
Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, 1994

Managing Physical Folios of Objects Between Nodes.
Proceedings of the Persistent Object Systems, 1994

Hybrid Systems and Quantum Automata: Preliminary Announcement.
Proceedings of the Hybrid Systems II, 1994

Wavelet transforms associated with finite cyclic groups.
IEEE Trans. Inf. Theory, 1993

Requirements for a system to analyze high energy physics events using database computing.
Proceedings of the Twelfth IEEE Symposium on Mass Storage Systems, 1993

A proof-of-concept implementation interfacing an object manager with a hierarchical storage system.
Proceedings of the Twelfth IEEE Symposium on Mass Storage Systems, 1993

Panel: Scientific Databases.
Proceedings of the Foundations of Data Organization and Algorithms, 1993

Symbolic Computation of Derivations Using Labeled Trees.
J. Symb. Comput., 1992

The Explicit Computation of Integration Algorithms and First Integrals for Ordinary Differential Equations with Polynomial Coefficients Using Trees.
Proceedings of the 1992 International Symposium on Symbolic and Algebraic Computation, 1992

Proceedings of the Hybrid Systems, 1992

Some Remarks About Flows in Hybrid Systems.
Proceedings of the Hybrid Systems, 1992

Computations Involving Differential Operators and Their Actions on Functions.
Proceedings of the 1991 International Symposium on Symbolic and Algebraic Computation, 1991

Labeled Trees and the Efficient Computation of Derivations.
Proceedings of the ACM-SIGSAM 1989 International Symposium on Symbolic and Algebraic Computation, 1989
