Torsten Hoefler
ORCID: 0000-0002-1333-9797
Affiliations:
- ETH Zürich
According to our database, Torsten Hoefler authored at least 420 papers between 2005 and 2024.
Awards
ACM Fellow
ACM Fellow 2022, "For foundational contributions to High-Performance Computing and the application of HPC techniques to machine learning".
Links
Online presence:
- on orcid.org
- on dl.acm.org
Bibliography
2024
IEEE Trans. Parallel Distributed Syst., August, 2024
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024
Future Gener. Comput. Syst., March, 2024
Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries.
ACM Comput. Surv., February, 2024
IEEE Trans. Computers, January, 2024
Nat. Comput. Sci., 2024
Microprocess. Microsystems, 2024
CoRR, 2024
Fortify Your Foundations: Practical Privacy and Security for Foundation Model Deployments In The Cloud.
CoRR, 2024
CoRR, 2024
CoRR, 2024
Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments.
CoRR, 2024
CoRR, 2024
Understanding Data Movement in Tightly Coupled Heterogeneous Systems: A Case Study with the Grace Hopper Superchip.
CoRR, 2024
REPS: Recycling Entropies for Packet Spraying to Adaptively Explore Paths and Mitigate Failures.
CoRR, 2024
CoRR, 2024
CoRR, 2024
Towards Specialized Supercomputers for Climate Sciences: Computational Requirements of the Icosahedral Nonhydrostatic Weather and Climate Model.
CoRR, 2024
CoRR, 2024
LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming.
CoRR, 2024
SMaRTT-REPS: Sender-based Marked Rapidly-adapting Trimmed & Timed Transport with Recycled Entropies.
CoRR, 2024
CoRR, 2024
XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing.
CoRR, 2024
Proceedings of the 2024 USENIX Annual Technical Conference, 2024
Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, 2024
Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024
A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
LRSCwait: Enabling Scalable and Efficient Synchronization in Manycore Systems Through Polling-Free and Retry-Free Operation.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024
Process-as-a-Service: Unifying Elastic and Stateful Clouds with Serverless Processes.
Proceedings of the 2024 ACM Symposium on Cloud Computing, 2024
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra.
IEEE Trans. Parallel Distributed Syst., December, 2023
Performance Measurement Dataset of the HPC Benchmarks FASTEST, Kripke, LULESH, MiniFE, Quicksilver, and RELeARN for Scalability Studies with Extra-P.
Dataset, November, 2023
Int. J. High Perform. Comput. Appl., July, 2023
IEEE Trans. Parallel Distributed Syst., June, 2023
Commun. ACM, May, 2023
How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark.
CoRR, 2023
RapidChiplet: A Toolchain for Rapid Design Space Exploration of Chiplet Architectures.
CoRR, 2023
Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models.
CoRR, 2023
High-Performance Graph Databases That Are Portable, Programmable, and Scale to Hundreds of Thousands of Cores.
CoRR, 2023
Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization.
CoRR, 2023
AutoDDL: Automatic Distributed Deep Learning with Asymptotically Optimal Communication.
CoRR, 2023
Computer, 2023
Proceedings of the 2023 USENIX Annual Technical Conference, 2023
Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 2023
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
Proceedings of the International Conference for High Performance Computing, 2023
Proceedings of the International Conference for High Performance Computing, 2023
Proceedings of the International Conference for High Performance Computing, 2023
Proceedings of the International Conference for High Performance Computing, 2023
High-Performance and Programmable Attentional Graph Neural Networks with Global Tensor Formulations.
Proceedings of the International Conference for High Performance Computing, 2023
The Graph Database Interface: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of Cores.
Proceedings of the International Conference for High Performance Computing, 2023
PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023
Proceedings of the Learning on Graphs Conference, 27-30 November 2023, Virtual Event, 2023
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
Performance Embeddings: A Similarity-Based Transfer Tuning Approach to Performance Optimization.
Proceedings of the 37th International Conference on Supercomputing, 2023
Proceedings of the 37th International Conference on Supercomputing, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 2023
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
Proceedings of the Algorithms and Complexity - 13th International Conference, 2023
Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023
Proceedings of the IEEE International Conference on Big Data, 2023
2022
Work-Stealing Prefix Scan: Addressing Load Imbalance in Large-Scale Image Registration.
IEEE Trans. Parallel Distributed Syst., 2022
CoRR, 2022
Computer, 2022
Benchmarking Data Science: 12 Ways to Lie With Statistics and Performance on Parallel Computers.
Computer, 2022
Proceedings of the Structural Information and Communication Complexity, 2022
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the SC22: International Conference for High Performance Computing, 2022
ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations.
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the 15th IEEE Conference on Software Testing, Verification and Validation, 2022
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022
Proceedings of the Tenth International Conference on Learning Representations, 2022
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022
Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022
Proceedings of the IEEE/ACM International Workshop on Exascale MPI, 2022
Proceedings of the 25th Euromicro Conference on Digital System Design, 2022
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022
Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022
2021
IEEE Trans. Parallel Distributed Syst., 2021
Breaking (Global) Barriers in Parallel Stochastic Optimization With Wait-Avoiding Group Averaging.
IEEE Trans. Parallel Distributed Syst., 2021
High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks.
IEEE Trans. Parallel Distributed Syst., 2021
Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads.
IEEE Trans. Computers, 2021
Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores.
IEEE Trans. Computers, 2021
Domain-Specific Multi-Level IR Rewriting for GPU: The Open Earth Compiler for GPU-accelerated Climate Simulation.
ACM Trans. Archit. Code Optim., 2021
SIAM J. Sci. Comput., 2021
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra.
Proc. VLDB Endow., 2021
Noise in the Clouds: Influence of Network Performance Variability on Application Scalability.
Proc. ACM Meas. Anal. Comput. Syst., 2021
Proc. ACM Program. Lang., 2021
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks.
J. Mach. Learn. Res., 2021
CoRR, 2021
SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems.
CoRR, 2021
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra.
CoRR, 2021
Proceedings of the 30th USENIX Security Symposium, 2021
Proceedings of the 2021 USENIX Annual Technical Conference, 2021
Proceedings of the 2021 USENIX Annual Technical Conference, 2021
Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs.
Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021
Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021
Proceedings of the International Conference for High Performance Computing, 2021
Proceedings of the International Conference for High Performance Computing, 2021
On the parallel I/O optimality of linear algebra kernels: near-optimal matrix factorizations.
Proceedings of the International Conference for High Performance Computing, 2021
Proceedings of the International Conference for High Performance Computing, 2021
Proceedings of the International Conference for High Performance Computing, 2021
Chimera: efficiently training large-scale neural networks with bidirectional pipelines.
Proceedings of the International Conference for High Performance Computing, 2021
On the parallel I/O optimality of linear algebra kernels: near-optimal LU factorization.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021
Proceedings of the Middleware '21: 22nd International Middleware Conference, Québec City, Canada, December 6, 2021
SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
A RISC-V in-network accelerator for flexible high-performance low-power packet processing.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021
ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations.
Proceedings of the 38th International Conference on Machine Learning, 2021
Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021
StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021
Proceedings of the 12th International Conference on Ambient Systems, 2021
2020
Proceedings of the Software for Exascale Computing - SPPEXA 2016-2019, 2020
ACM Trans. Reconfigurable Technol. Syst., 2020
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020
Dawn: a High-level Domain-Specific Language Compiler Toolchain for Weather and Climate Applications.
Supercomput. Front. Innov., 2020
Proc. ACM Program. Lang., 2020
CoRR, 2020
High-Performance Routing with Multipathing and Path Diversity in Supercomputers and Data Centers.
CoRR, 2020
Shapeshifter Networks: Cross-layer Parameter Sharing for Scalable and Effective Deep Learning.
CoRR, 2020
Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging.
CoRR, 2020
CoRR, 2020
Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads.
CoRR, 2020
sRDMA - Efficient NIC-based Authentication and Encryption for Remote Direct Memory Access.
Proceedings of the 2020 USENIX Annual Technical Conference, 2020
Proceedings of the SPAA '20: 32nd ACM Symposium on Parallelism in Algorithms and Architectures, 2020
Proceedings of the International Conference for High Performance Computing, 2020
Proceedings of the International Conference for High Performance Computing, 2020
Proceedings of the International Conference for High Performance Computing, 2020
Proceedings of the IEEE/ACM International Workshop on HPC User Support Tools and Workshop on Programming and Performance Visualization Tools, 2020
Proceedings of the International Conference for High Performance Computing, 2020
High-performance parallel graph coloring with strong guarantees on work, depth, and quality.
Proceedings of the International Conference for High Performance Computing, 2020
Proceedings of the EuroMPI/USA '20: 27th European MPI Users' Group Meeting, 2020
Taming unbalanced training workloads in deep learning with partial collective operations.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020
Identifying scalability bottlenecks for large-scale parallel programs with graph analysis.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020
ATUNs: Modular and Scalable Support for Atomic Operations in a Shared Memory Multiprocessor.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
2019
Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations.
IEEE Trans. Parallel Distributed Syst., 2019
Strong consistency is not hard to get: Two-Phase Locking and Two-Phase Commit on Thousands of Cores.
Proc. VLDB Endow., 2019
Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis.
ACM Comput. Surv., 2019
Reflecting on the Goal and Baseline for Exascale Computing: A Roadmap Based on Weather and Climate Simulations.
Comput. Sci. Eng., 2019
Practice of Streaming and Dynamic Graphs: Concepts, Models, Systems, and Parallelism.
CoRR, 2019
A Data-Centric Approach to Extreme-Scale Ab initio Dissipative Quantum Transport Simulations.
CoRR, 2019
Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency.
CoRR, 2019
FatPaths: Routing in Supercomputers, Data Centers, and Clouds with Low-Diameter Networks when Shortest Paths Fall Short.
CoRR, 2019
Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs.
CoRR, 2019
Head-of-line blocking avoidance in Slim Fly networks using deadlock-free non-minimal and adaptive routing.
Concurr. Comput. Pract. Exp., 2019
Optimizing the data movement in quantum transport simulations via data-centric parallel programming.
Proceedings of the International Conference for High Performance Computing, 2019
A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations.
Proceedings of the International Conference for High Performance Computing, 2019
Proceedings of the International Conference for High Performance Computing, 2019
Proceedings of the International Conference for High Performance Computing, 2019
Streaming message interface: high-performance distributed memory programming on reconfigurable hardware.
Proceedings of the International Conference for High Performance Computing, 2019
Proceedings of the International Conference for High Performance Computing, 2019
Proceedings of the International Conference for High Performance Computing, 2019
Slim graph: practical lossy graph compression for approximate graph processing, storage, and analytics.
Proceedings of the International Conference for High Performance Computing, 2019
Stateful dataflow multigraphs: a data-centric model for performance portability on heterogeneous architectures.
Proceedings of the International Conference for High Performance Computing, 2019
Proceedings of the 26th European MPI Users' Group Meeting, 2019
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019
Proceedings of the Platform for Advanced Scientific Computing Conference, 2019
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019
Using performance models to understand scalable Krylov solver performance at scale for structured grid problems.
Proceedings of the ACM International Conference on Supercomputing, 2019
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019
Embedding Functions Into Reversible Circuits: A Probabilistic Approach to the Number of Lines.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
Absinthe: Learning an Analytical Performance Model to Fuse and Tile Stencil Codes in One Shot.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019
2018
IEEE Trans. Parallel Distributed Syst., 2018
Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations.
CoRR, 2018
Proceedings of the Verification, Model Checking, and Abstract Interpretation, 2018
Proceedings of the International Conference for High Performance Computing, 2018
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018
Proceedings of the Thirteenth EuroSys Conference, 2018
Proceedings of the IEEE International Conference on Cluster Computing, 2018
Proceedings of the IEEE International Conference on Cluster Computing, 2018
Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018
2017
IEEE Trans. Parallel Distributed Syst., 2017
Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, 2017
Scaling betweenness centrality using communication-efficient sparse matrix multiplication.
Proceedings of the International Conference for High Performance Computing, 2017
Proceedings of the International Conference for High Performance Computing, 2017
Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017
Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017
Model-Driven Choice of Numerical Methods for the Solution of the Linear Advection Equation.
Proceedings of the International Conference on Computational Science, 2017
Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, 2017
To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations.
Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, 2017
An Effective Queuing Scheme to Provide Slim Fly Topologies with HoL Blocking Reduction and Deadlock Freedom for Minimal-Path Routing.
Proceedings of the 3rd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era, 2017
Proceedings of the 25th IEEE Annual Symposium on High-Performance Interconnects, 2017
Proceedings of the 25th IEEE Annual Symposium on High-Performance Interconnects, 2017
Multi-agent Pathfinding with n Agents on Graphs with n Vertices: Combinatorial Classification and Tight Algorithmic Bounds.
Proceedings of the Algorithms and Complexity - 10th International Conference, 2017
2016
Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016
IEEE Trans. Parallel Distributed Syst., 2016
Int. J. High Perform. Comput. Appl., 2016
CoRR, 2016
Proceedings of the International Conference for High Performance Computing, 2016
Proceedings of the International Conference for High Performance Computing, 2016
Proceedings of the International Conference for High Performance Computing, 2016
Proceedings of the International Conference for High Performance Computing, 2016
Selecting Technical Papers for an Interdisciplinary Conference: The PASC Review Process.
Proceedings of the Platform for Advanced Scientific Computing Conference, 2016
Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, 2016
Proceedings of the 2016 International Conference on Supercomputing, 2016
Proceedings of the 25th International Conference on Computer Communication and Networks, 2016
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016
Routing on the Dependency Graph: A New Approach to Deadlock-Free High-Performance Routing.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016
Proceedings of the 24th IEEE Annual Symposium on High-Performance Interconnects, 2016
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016
2015
Proceedings of the International Conference for High Performance Computing, 2015
Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results.
Proceedings of the International Conference for High Performance Computing, 2015
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015
Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015
MODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015
Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015
Accelerating Irregular Computations with Hardware Transactional Memory and Active Messages.
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015
Proceedings of the 15th Workshop on Hot Topics in Operating Systems, 2015
Proceedings of the 10th International Conference on Future Internet, 2015
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015
2014
Energy, Memory, and Runtime Tradeoffs for Implementing Collective Communication Operations.
Supercomput. Front. Innov., 2014
Sci. Program., 2014
Application-oriented ping-pong benchmarking: how to assess the real communication overheads.
Computing, 2014
Clust. Comput., 2014
Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures, 2014
Understanding the Effects of Communication and Coordination on Checkpointing at Scale.
Proceedings of the International Conference for High Performance Computing, 2014
Fail-in-Place Network Design: Interaction Between Topology, Routing Algorithm and Failures.
Proceedings of the International Conference for High Performance Computing, 2014
Proceedings of the International Conference for High Performance Computing, 2014
Proceedings of the 21st European MPI Users' Group Meeting, 2014
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Efficient task placement and routing of nearest neighbor exchanges in dragonfly networks.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014
2013
ACM Trans. Archit. Code Optim., 2013
Int. J. High Perform. Comput. Appl., 2013
MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory.
Computing, 2013
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013
Proceedings of the International Conference for High Performance Computing, 2013
Proceedings of the International Conference for High Performance Computing, 2013
Proceedings of the 20th European MPI Users' Group Meeting, 2013
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013
Proceedings of the Languages and Compilers for Parallel Computing, 2013
Proceedings of the International Conference on Supercomputing, 2013
Proceedings of the 42nd International Conference on Parallel Processing, 2013
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013
Proceedings of the Euro-Par 2013 Parallel Processing, 2013
2012
IEEE Micro, 2012
Abstract: Slack-Conscious Lightweight Loop Scheduling for Improving Scalability of Bulk-synchronous MPI Applications.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Proceedings of the SC Conference on High Performance Computing Networking, 2012
Proceedings of the Recent Advances in the Message Passing Interface, 2012
Proceedings of the Recent Advances in the Message Passing Interface, 2012
Proceedings of the Recent Advances in the Message Passing Interface, 2012
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012
Proceedings of the 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, 2012
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2012
Performance Modeling and Comparative Analysis of the MILC Lattice QCD Application su3_rmd.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2012
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012
2011
Concurr. Comput. Pract. Exp., 2011
Proceedings of the 2011 TeraGrid Conference - Extreme Digital Discovery, 2011
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011
Proceedings of the Recent Advances in the Message Passing Interface, 2011
Proceedings of the Recent Advances in the Message Passing Interface, 2011
Proceedings of the Recent Advances in the Message Passing Interface, 2011
Active pebbles: a programming model for highly parallel fine-grained data-driven computations.
Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011
Proceedings of the Practical Aspects of Declarative Languages, 2011
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011
Kernel-Based Offload of Collective Operations - Implementation, Evaluation and Lessons Learned.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011
2010
Accurately measuring overhead, communication time and progression of blocking and nonblocking collective operations at massive scale.
Int. J. Parallel Emergent Distributed Syst., 2010
Comput. Sci. Eng., 2010
Characterizing the Influence of System Noise on Large-Scale Applications by Simulation.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2010
Toward Performance Models of MPI Implementations for Understanding Application Scaling Issues.
Proceedings of the Recent Advances in the Message Passing Interface, 2010
Parallel Zero-Copy Algorithms for Fast Fourier Transform and Conjugate Gradient Using MPI Datatypes.
Proceedings of the Recent Advances in the Message Passing Interface, 2010
Proceedings of the Recent Advances in the Message Passing Interface, 2010
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010
Proceedings of the IEEE 18th Annual Symposium on High Performance Interconnects, 2010
A space-efficient parallel algorithm for computing betweenness centrality in distributed memory.
Proceedings of the 2010 International Conference on High Performance Computing, 2010
Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010
2009
LogGP in theory and practice - An in-depth analysis of modern interconnection networks and benchmarking methods for collective operations.
Simul. Model. Pract. Theory, 2009
Parallel Process. Lett., 2009
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2009
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
A power-aware, application-based performance study of modern commodity cluster interconnection networks.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Group Operation Assembly Language - A Flexible Way to Express Collective Communication.
Proceedings of the ICPP 2009, 2009
Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009
Proceedings of the 16th International Conference on High Performance Computing, 2009
2008
Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2008), 2008
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008
Multistage switches are not crossbars: Effects of static routing in high-performance networks.
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008
Proceedings of the 9th Workshop on Parallel Systems and Algorithms (PASA) held at the 21st Conference on the Architecture of Computing Systems (ARCS), 2008
2007
Parallel Comput., 2007
Implementation and performance analysis of non-blocking collective operations for MPI.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007
A practically constant-time MPI Broadcast Algorithm for large-scale InfiniBand Clusters with Multicast.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Proceedings of the High Performance Computing and Communications, 2007
2006
Proceedings of the Fifth International Conference on Parallel Computing in Electrical Engineering (PARELEC 2006), 2006
Proceedings of the Fifth International Conference on Parallel Computing in Electrical Engineering (PARELEC 2006), 2006
Proceedings of the Frontiers of High Performance Computing and Networking, 2006
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
Analysis of the Memory Registration Process in the Mellanox InfiniBand Software Stack.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006
Proceedings of the ARCS 2006, 2006
2005
A Practical Approach to the Rating of Barrier Algorithms Using the LogP Model and Open MPI.
Proceedings of the 34th International Conference on Parallel Processing Workshops (ICPP 2005 Workshops), 2005