Samuel Williams
ORCID: 0000-0002-8327-5717
Affiliations:
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- University of California at Berkeley, CA, USA (PhD 2008)
According to our database, Samuel Williams authored at least 105 papers between 2001 and 2024.
Links
Online presence:
- linkedin.com
- orcid.org
- crd.lbl.gov
Bibliography
2024
Concurr. Comput. Pract. Exp., August, 2024
Bricks: A high-performance portability layer for computations on block-structured grids.
Int. J. High Perform. Comput. Appl., 2024
CoRR, 2024
LPSim: Large Scale Multi-GPU Parallel Computing based Regional Scale Traffic Simulation Framework.
CoRR, 2024
FTL: Transfer Learning Nonlinear Plasma Dynamic Transitions in Low Dimensional Embeddings via Deep Neural Networks.
CoRR, 2024
Proceedings of the 53rd International Conference on Parallel Processing, 2024
2023
Performance-Portable GPU Acceleration of the EFIT Tokamak Plasma Equilibrium Reconstruction Code.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters.
Proceedings of the International Conference for High Performance Computing, 2023
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
2022
IEEE Trans. Computers, 2022
Concurr. Comput. Pract. Exp., 2022
Concurr. Comput. Pract. Exp., 2022
A Methodology for Evaluating Tightly-integrated and Disaggregated Accelerated Architectures.
Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022
Proceedings of the IEEE/ACM Workshop on Memory Centric High Performance Computing, 2022
2021
Proceedings of the Intelligent Computing, 2021
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021
Proceedings of the 2021 International Workshop on Performance Modeling, 2021
Experiences Porting the SU3_Bench Microbenchmark to the Intel Arria 10 and Xilinx Alveo U280 FPGAs.
Proceedings of the IWOCL'21: International Workshop on OpenCL, Munich Germany, April, 2021, 2021
Proceedings of the 2021 SIAM Conference on Applied and Computational Discrete Algorithms, 2021
2020
Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC-9 Perlmutter system.
Concurr. Comput. Pract. Exp., 2020
Clust. Comput., 2020
Proceedings of the High Performance Computing - 35th International Conference, 2020
Proceedings of the Fourth IEEE/ACM Workshop on Deep Learning on Supercomputers, 2020
Proceedings of the 2020 SIAM Conference on Parallel Processing for Scientific Computing, 2020
Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020
Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches.
Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020
Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020
Understanding Quantum Control Processor Capabilities and Limitations through Circuit Characterization.
Proceedings of the International Conference on Rebooting Computing, 2020
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020
2019
J. Open Source Softw., 2019
Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers.
Int. J. High Perform. Comput. Appl., 2019
Proceedings of the International Conference for High Performance Computing, 2019
Proceedings of the 2019 IEEE/ACM Performance Modeling, 2019
Performance Analysis of GPU Programming Models Using the Roofline Scaling Trajectories.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2019
2018
A Novel Multi-level Integrated Roofline Model Approach for Performance Characterization.
Proceedings of the High Performance Computing - 33rd International Conference, 2018
Improving MPI Reduction Performance for Manycore Architectures with OpenMP and Data Compression.
Proceedings of the 2018 IEEE/ACM Performance Modeling, 2018
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018
Roofline Scaling Trajectories: A Method for Parallel Application and Architectural Performance Analysis.
Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018
2017
A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations.
IEEE Trans. Parallel Distributed Syst., 2017
Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers.
Parallel Comput., 2017
Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends.
J. Parallel Distributed Comput., 2017
Int. J. High Perform. Comput. Appl., 2017
Proceedings of the High Performance Computing, 2017
Proceedings of the High Performance Computing, 2017
Performance analysis and optimization of the RAMPAGE metal alloy potential generation software.
Proceedings of the 4th ACM SIGPLAN International Workshop on Software Engineering for Parallel Systems, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
2016
An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling.
SIAM J. Sci. Comput., 2016
SIAM J. Sci. Comput., 2016
Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor.
Proceedings of the High Performance Computing, 2016
Proceedings of the International Conference for High Performance Computing, 2016
Proceedings of the 2016 PGAS Applications Workshop, 2016
Proceedings of the 7th International Workshop on Performance Modeling, 2016
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016
2015
J. Parallel Distributed Comput., 2015
Int. J. High Perform. Comput. Appl., 2015
An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling.
CoRR, 2015
Parallel implementation and performance optimization of the configuration-interaction method.
Proceedings of the International Conference for High Performance Computing, 2015
Thread-level parallelization and optimization of NWChem for the Intel MIC architecture.
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015
Comparative Performance Analysis of Coarse Solvers for Algebraic Multigrid on Multicore and Manycore Architectures.
Proceedings of the Parallel Processing and Applied Mathematics, 2015
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
Proceedings of the International Conference on Computational Science, 2015
2014
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Optimizing Sparse Matrix-Multiple Vectors Multiplication for Nuclear Configuration Interaction Calculations.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Proceedings of the 2014 International Conference on Supercomputing, 2014
Proceedings of the 21st International Conference on High Performance Computing, 2014
2013
Analysis and optimization of gyrokinetic toroidal simulations on homogenous and heterogenous platforms.
Int. J. High Perform. Comput. Appl., 2013
Proceedings of the International Conference for High Performance Computing, 2013
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
Compiler generation and autotuning of communication-avoiding operators for geometric multigrid.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013
2012
Optimization of Parallel Particle-to-Grid Interpolation on Leading Multicore Platforms.
IEEE Trans. Parallel Distributed Syst., 2012
Proceedings of the SC Conference on High Performance Computing Networking, 2012
Poster: Advances in Gyrokinetic Particle in Cell Simulation for Fusion Plasmas to Extreme Scale.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Abstract: Advances in Gyrokinetic Particle in Cell Simulation for Fusion Plasmas to Extreme Scale.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012
2011
Parallel Comput., 2011
Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning.
Proceedings of the Conference on High Performance Computing Networking, 2011
Proceedings of the Conference on High Performance Computing Networking, 2011
Proceedings of the Conference on High Performance Computing Networking, 2011
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
2010
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Proceedings of the Scientific Computing with Multicore and Accelerators, 2010
Proceedings of the Scientific Computing with Multicore and Accelerators, 2010
2009
Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors.
SIAM Rev., 2009
Parallel Comput., 2009
Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms.
J. Parallel Distributed Comput., 2009
The impact of IBM Cell technology on the programming paradigm in the context of computer systems for climate and weather models.
Concurr. Comput. Pract. Exp., 2009
Commun. ACM, 2009
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009
Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009
Proceedings of the Architecture of Computing Systems, 2009
2008
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
2007
2006
Proceedings of the Third Conference on Computing Frontiers, 2006
Proceedings of the 2006 workshop on Memory System Performance and Correctness, 2006
2001