Marc Snir

Orcid: 0000-0002-3504-2468

According to our database, Marc Snir authored at least 171 papers between 1977 and 2024.

Collaborative distances:
  • Dijkstra number of three.
  • Erdős number of two.


ACM Fellow

ACM Fellow 1999, "For contributions to the theory of parallel computation and the development of scaleable parallel systems architectures."

IEEE Fellow

IEEE Fellow 1996, "For technical leadership in the development of parallel computation and scalable parallel systems architectures."






Formal Definitions and Performance Comparison of Consistency Models for Parallel File Systems.
IEEE Trans. Parallel Distributed Syst., June, 2024

Holistic Performance Analysis for Asynchronous Many-Task Runtimes.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

Exploring the Efficiency of Renewable Energy-based Modular Data Centers at Scale.
Proceedings of the 2024 ACM Symposium on Cloud Computing, 2024

HPX+LCI PAW-ATM23 Artifact Archive.
Dataset, September, 2023

Near-Lossless MPI Tracing and Proxy Application Autogeneration.
IEEE Trans. Parallel Distributed Syst., 2023

Design and Analysis of the Network Software Stack of an Asynchronous Many-task System - The LCI parcelport of HPX.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

Pinpointing crash-consistency bugs in the HPC I/O stack: a cross-layer approach.
Proceedings of the International Conference for High Performance Computing, 2021

Pilgrim: scalable and (near) lossless MPI tracing.
Proceedings of the International Conference for High Performance Computing, 2021

Verifying IO Synchronization from MPI Traces.
Proceedings of the 6th IEEE/ACM International Parallel Data Systems Workshop, 2021

File System Semantics Requirements of HPC Applications.
Proceedings of the HPDC '21: The 30th International Symposium on High-Performance Parallel and Distributed Computing, 2021

Recorder 2.0: Efficient Parallel I/O Tracing and Analysis.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

First IEEE International Workshop on High-Performance Storage (HPS).
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Understanding and Finding Crash-Consistency Bugs in Parallel File Systems.
Proceedings of the 12th USENIX Workshop on Hot Topics in Storage and File Systems, 2020

Exploring Properties and Correlations of Fatal Events in a Large-Scale HPC System.
IEEE Trans. Parallel Distributed Syst., 2019

Optimizing I/O Performance of HPC Applications with Autotuning.
ACM Trans. Parallel Comput., 2019

Automatic generation of benchmarks for I/O-intensive parallel applications.
J. Parallel Distributed Comput., 2019

Guest Editorial: Special Issue on Network and Parallel Computing for Emerging Architectures and Applications.
Int. J. Parallel Program., 2019

Exploring the feasibility of lossy compression for PDE simulations.
Int. J. High Perform. Comput. Appl., 2019

Channel and filter parallelism for large-scale CNN training.
Proceedings of the International Conference for High Performance Computing, 2019

ScaDL 2019 Keynote Talk.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Characterizing and Understanding HPC Job Failures Over The 2K-Day Life of IBM BlueGene/Q System.
Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2019

Gluon-Async: A Bulk-Asynchronous System for Distributed and Heterogeneous Graph Analytics.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

Argobots: A Lightweight Low-Level Threading and Tasking Framework.
IEEE Trans. Parallel Distributed Syst., 2018

Technical perspective: The future of MPI.
Commun. ACM, 2018

Gluon: a communication-optimizing substrate for distributed heterogeneous graph analytics.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

A Lightweight Communication Runtime for Distributed Graph Analytics.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

FULT: Fast User-Level Thread Scheduling Using Bit-Vectors.
Proceedings of the 47th International Conference on Parallel Processing, 2018

The Future of Supercomputing.
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

Neural Network Based Silent Error Detector.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

Eliminating contention bottlenecks in multithreaded MPI.
Parallel Comput., 2017

Predicting HPC parallel program performance based on LLVM compiler.
Clust. Comput., 2017

The informal guide to ACM fellow nominations.
Commun. ACM, 2017

Towards a More Complete Understanding of SDC Propagation.
Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, 2017

LogAider: A tool for mining potential correlations of HPC log events.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

Damaris: Addressing Performance Variability in Data Management for Post-Petascale Simulations.
ACM Trans. Parallel Comput., 2016

Doing Moore with Less - Leapfrogging Moore's Law with Inexactness for Supercomputing.
CoRR, 2016

Overcoming the power wall by exploiting inexactness and emerging COTS architectural features: Trading precision for improving application quality.
Proceedings of the 29th IEEE International System-on-Chip Conference, 2016

Towards millions of communicating threads.
Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

Reducing Waste in Extreme Scale Systems through Introspective Analysis.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Design of a Multithreaded Barnes-Hut Algorithm for Multicore Clusters.
IEEE Trans. Parallel Distributed Syst., 2015

Towards a more fault resilient multigrid solver.
Proceedings of the Symposium on High Performance Computing, 2015

PPL: an abstract runtime system for hybrid parallel programming.
Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware, 2015

Pattern-driven parallel I/O tuning.
Proceedings of the 10th Parallel Data Storage Workshop, 2015

Scheduling the I/O of HPC Applications Under Congestion.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

A General Space-filling Curve Algorithm for Partitioning 2D Meshes.
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

Distributed Monitoring and Management of Exascale Systems in the Argo Project.
Proceedings of the Distributed Applications and Interoperable Systems, 2015

Understanding the Propagation of Error Due to a Silent Data Corruption in a Sparse Matrix Vector Multiply.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Dynamic Model-Driven Parallel I/O Performance Tuning.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Toward Exascale Resilience: 2014 update.
Supercomput. Front. Innov., 2014

Addressing failures in exascale computing.
Int. J. High Perform. Comput. Appl., 2014

Enabling communication concurrency through flexible MPI endpoints.
Int. J. High Perform. Comput. Appl., 2014

Improved MPI collectives for MPI processes in shared address spaces.
Clust. Comput., 2014

Automatic generation of I/O kernels for HPC applications.
Proceedings of the 9th Parallel Data Storage Workshop, 2014

Improving parallel I/O autotuning with performance modeling.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

FlipIt: An LLVM Based Fault Injector for HPC.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

Failure prediction for HPC systems and applications: Current situation and open issues.
Int. J. High Perform. Comput. Appl., 2013

Programming for Exascale Computers.
Comput. Sci. Eng., 2013

Software Abstractions and Methodologies for HPC Simulation Codes on Future Architectures.
CoRR, 2013

Taming parallel I/O complexity with auto-tuning.
Proceedings of the International Conference for High Performance Computing, 2013

Enabling MPI interoperability through flexible communication endpoints.
Proceedings of the 20th European MPI Users' Group Meeting, 2013

Programming models for extreme-scale computing.
Proceedings of the ACM Symposium on Principles of Distributed Computing, 2013

NUMA-aware shared-memory collective communication for MPI.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

Programming Models for High-Performance Computing.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

Fault prediction under the microscope: a closer look into HPC systems.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Automatic datatype generation and optimization.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

HydEE: Failure Containment without Event Logging for Large Scale Send-Deterministic MPI Applications.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Damaris: How to Efficiently Leverage Multicore Parallelism to Achieve Scalable, Jitter-free I/O.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Reduce and Scan.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Distributed-Memory Multiprocessor.
Proceedings of the Encyclopedia of Parallel Computing, 2011

The International Exascale Software Project roadmap.
Int. J. High Perform. Comput. Appl., 2011

Computer and information science and engineering: one discipline, many specialties.
Commun. ACM, 2011

Optimizing the Barnes-Hut algorithm in UPC.
Proceedings of the Conference on High Performance Computing Networking, 2011

Performance modeling for systematic performance tuning.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Writing Parallel Libraries with MPI - Common Practice, Issues, and Extensions.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic MPI Applications.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Transformation for class immutability.
Proceedings of the 33rd International Conference on Software Engineering, 2011

Generic topology mapping strategies for large-scale parallel architectures.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Comparing archival policies for Blue Waters.
Proceedings of the 18th International Conference on High Performance Computing, 2011

Ubiquitous Parallel Computing from Berkeley, Illinois, and Stanford.
IEEE Micro, 2010

Advice to members seeking ACM distinction.
Commun. ACM, 2010

On Communication Determinism in Parallel HPC Applications.
Proceedings of the 19th International Conference on Computer Communications and Networks, 2010

On the Need for a Consortium of Capability Centers.
Int. J. High Perform. Comput. Appl., 2009

Toward Exascale Resilience.
Int. J. High Perform. Comput. Appl., 2009

Universal parallel computing research center at Illinois.
Proceedings of the 2009 IEEE Hot Chips 21 Symposium (HCS), 2009

ESoftCheck: Removal of Non-vital Checks for Fault Tolerance.
Proceedings of the CGO 2009, 2009

Efficient software checking for fault tolerance.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Techniques for Efficient Software Checking.
Proceedings of the Languages and Compilers for Parallel Computing, 2007

Programming Patterns for Architecture-Level Software Optimizations on Frequent Pattern Mining.
Proceedings of the 23rd International Conference on Data Engineering, 2007

Automatic Tuning Matrix Multiplication Performance on Graphics Hardware.
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

A Note on N-Body Computations with Cutoffs.
Theory Comput. Syst., 2004

A Framework for Measuring Supercomputer Productivity.
Int. J. High Perform. Comput. Appl., 2004

Best Papers from the 2002 International Parallel and Distributed Processing Symposium.
J. Parallel Distributed Comput., 2003

Demonstrating the Scalability of a Molecular Dynamics Application on a Petaflops Computer.
Int. J. Parallel Program., 2002

Generalized Communicators in the Message Passing Interface.
IEEE Trans. Parallel Distributed Syst., 2001

What Are the Top Ten Most Influential Parallel and Distributed Processing Concepts of the Past Millenium?
J. Parallel Distributed Comput., 2001

Blue Gene: A vision for protein science using a petaflop supercomputer.
IBM Syst. J., 2001

Demonstrating the scalability of a molecular dynamics application on a Petaflop computer.
Proceedings of the 15th international conference on Supercomputing, 2001

Java programming for high-performance numerical computing.
IBM Syst. J., 2000

From Trace Generation to Visualization: A Performance Framework for Distributed Parallel Systems.
Proceedings of Supercomputing 2000, 2000

SP2 System Architecture.
IBM Syst. J., 1999

Optimizing Array Reference Checking in Java Programs.
IBM Syst. J., 1998

The NYU Ultracomputer - Designing a MIMD, Shared-Memory Parallel Machine.
Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers), 1998

PRISM: An Integrated Architecture for Scalable Shared Memory.
Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998

Message Proxies for Efficient, Protected Communication on SMP Clusters.
Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997

Randomized Routing with Shorter Paths.
IEEE Trans. Parallel Distributed Syst., 1996

A Message Passing Standard for MPP and Workstations.
Commun. ACM, 1996

For a Massive Number of Massively Parallel Machines: What are the Target Applications, Who are the Target Users, and What New R&D is Needed to Hit the Target?
Proceedings of IPPS '96, 1996

MPI-2: Extending the Message-Passing Interface.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

Overview of the MPI-IO Parallel I/O Interface.
Proceedings of the Input/Output in Parallel and Distributed Computer Systems, 1996

CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers.
IEEE Trans. Parallel Distributed Syst., 1995

Parallel I/O: Getting ready for prime time.
IEEE Concurr., 1995

The Communication Software and Parallel Environment of the IBM SP2.
IBM Syst. J., 1995

Parallel File Systems for the IBM SP Computers.
IBM Syst. J., 1995

MPI Programming Environment for IBM SP1/SP2.
Proceedings of the 15th International Conference on Distributed Computing Systems, Vancouver, British Columbia, Canada, May 30, 1995

Calling Names on Nameless Networks.
Inf. Comput., August, 1994

The IBM External User Interface for Scalable Parallel Systems.
Parallel Comput., 1994

Memory versus randomization in on-line algorithms.
IBM J. Res. Dev., 1994

MPI-F: An Efficient Implementation of MPI on IBM-SP1.
Proceedings of the 1994 International Conference on Parallel Processing, 1994

Random Walks on Weighted Graphs and Applications to On-line Algorithms.
J. ACM, 1993

Scalable Parallel Computing: The IBM 9076 Scalable POWERparallel 1.
Proceedings of the 5th Annual ACM Symposium on Parallel Algorithms and Architectures, 1993

Designing Efficient, Scalable, and Portable Collective Communication Libraries.
Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, 1993

Computer Architectures and Programming Models for Scalable Parallel Computing.
Proceedings of the Conference Record of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1993

Issues and Directions in Scalable Parallel Computing.
Proceedings of the Twelfth Annual ACM Symposium on Principles of Distributed Computing, 1993

Using Visualization Tools to Understand Concurrency.
IEEE Softw., 1992

Cost-Performance Tradeoffs for Interconnection Networks.
Discret. Appl. Math., 1992

Scalable Parallel Computers and Scalable Parallel Codes: From Theory to Practice.
Proceedings of the Parallel Architectures and Their Efficient Use, 1992

Size-depth Trade-Offs for Monotone Arithmetic Circuits.
Theor. Comput. Sci., 1991

Better Computing on the Anonymous Ring.
J. Algorithms, 1991

A Complexity Theory of Efficient Parallel Algorithms.
Theor. Comput. Sci., 1990

Communication Complexity of PRAMs.
Theor. Comput. Sci., 1990

Efficient Parallel Algorithms for Graph Problems.
Algorithmica, 1990

Random Walks on Weighted Graphs, and Applications to On-line Algorithms (Preliminary Version).
Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, 1990

Techniques for Parallel Manipulation of Sparse Matrices.
Theor. Comput. Sci., 1989

Cost-Bandwidth Tradeoffs for Communication Networks.
Proceedings of the ACM Symposium on Parallel Algorithms and Architectures, 1989

On Communication Latency in PRAM Computations.
Proceedings of the ACM Symposium on Parallel Algorithms and Architectures, 1989

Memory Versus Randomization in On-line Algorithms (Extended Abstract).
Proceedings of the Automata, Languages and Programming, 16th International Colloquium, 1989

Efficient and Correct Execution of Parallel Programs that Share Memory.
ACM Trans. Program. Lang. Syst., 1988

Efficient Synchronization on Multiprocessors with Shared Memory.
ACM Trans. Program. Lang. Syst., 1988

The Distribution of Waiting Times in Clocked Multistage Interconnection Networks.
IEEE Trans. Computers, 1988

Computing on an anonymous ring.
J. ACM, 1988

A Complexity Theory of Efficient Parallel Algorithms (Extended Abstract).
Proceedings of the Automata, Languages and Programming, 15th International Colloquium, 1988

A Model for Hierarchical Memory.
Proceedings of the 19th Annual ACM Symposium on Theory of Computing, 1987

Hierarchical Memory with Block Transfer.
Proceedings of the 28th Annual Symposium on Foundations of Computer Science, 1987

A Unified Theory of Interconnection Network Structure.
Theor. Comput. Sci., 1986

Depth-Size Trade-Offs for Parallel Prefix Computation.
J. Algorithms, 1986

Exact Balancing is Not Always Good.
Inf. Process. Lett., 1986

Efficient Parallel Algorithms for Graph Models.
Proceedings of the International Conference on Parallel Processing, 1986

Applications of Ramsey's Theorem to Decision Tree Complexity.
J. ACM, October, 1985

Lower Bounds on Probabilistic Linear Decision Trees.
Theor. Comput. Sci., 1985

The Power of Parallel Prefix.
IEEE Trans. Computers, 1985

On Parallel Searching.
SIAM J. Comput., 1985

Issues Related to MIMD Shared-memory Computers: The NYU Ultracomputer Approach.
Proceedings of the 12th Annual Symposium on Computer Architecture, 1985

The Importance of Being Square.
Proceedings of the 11th Annual Symposium on Computer Architecture, 1984

Applications of Ramsey's Theorem to Decision Trees Complexity (Preliminary Version).
Proceedings of the 25th Annual Symposium on Foundations of Computer Science, 1984

The Performance of Multistage Interconnection Networks for Multiprocessors.
IEEE Trans. Computers, 1983

The NYU Ultracomputer - Designing an MIMD Shared Memory Parallel Computer.
IEEE Trans. Computers, 1983

Circuit partitioning with size and connection constraints.
Networks, 1983

Comparisons between Linear Functions can Help.
Theor. Comput. Sci., 1982

Probabilities Over Rich Languages, Testing and Randomness.
J. Symb. Log., 1982

Some Exact Complexity Results for Straight-Line Computations over Semirings.
J. ACM, 1982

On Parallel Searching (Extended Abstract).
Proceedings of the ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, 1982

The NYU Ultracomputer - designing a MIMD, shared-memory parallel machine (Extended Abstract).
Proceedings of the 9th International Symposium on Computer Architecture (ISCA 1982), 1982

On the Complexity of Simplifying Quadratic Forms.
Inf. Process. Lett., 1981

Proving Lower Bounds for Linear Decision Trees.
Proceedings of the Automata, 1981

On the Depth Complexity of Formulas.
Math. Syst. Theory, 1980

On the Size Complexity of Monotone Formulas.
Proceedings of the Automata, 1980

סבוך העומק של נוסחאות (Depth complexity of formulas).
PhD thesis, 1979

The covering problem of complete uniform hypergraphs.
Discret. Math., 1979

A Direct Approach to the Parallel Evaluation of Rational Expressions with a Small Number of Processors.
IEEE Trans. Computers, 1977
