2022
A Quantitative Theory of Bottleneck Structures for Data Networks.
CoRR, 2022
2021
Designing data center networks using bottleneck structures.
Proceedings of the ACM SIGCOMM 2021 Conference, Virtual Event, USA, August 23-27, 2021
Boundary Integral Solver Approaches for Particle Accelerator Simulation Problems and Deployment on NERSC Hardware.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021
2020
Computing Bottleneck Structures at Scale for High-Precision Network Performance Analysis.
Proceedings of the IEEE/ACM Innovating the Network for Data-Intensive Science, 2020
Multiscale Data Analysis Using Binning, Tensor Decompositions, and Backtracking.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
Approximate Inverse Chain Preconditioner: Iteration Count Case Study for Spectral Support Solvers.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
2019
On the Bottleneck Structure of Congestion-Controlled Networks.
Proc. ACM Meas. Anal. Comput. Syst., 2019
Enhancing Network Visibility and Security through Tensor Analysis.
Future Gener. Comput. Syst., 2019
PUMA-V: Optimizing Parallel Code Performance Through Interactive Visualization.
IEEE Computer Graphics and Applications, 2019
G2: A Network Optimization Framework for High-Precision Analysis of Bottleneck and Flow Performance.
Proceedings of the 2019 IEEE/ACM Innovating the Network for Data-Intensive Science, 2019
Combinatorial Multigrid: Advanced Preconditioners For Ill-Conditioned Linear Systems.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019
Fast Large-Scale Algorithm for Electromagnetic Wave Propagation in 3D Media.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019
Combining Tensor Decompositions and Graph Analytics to Provide Cyber Situational Awareness at HPC Scale.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019
2018
Algorithms and data structures to accelerate network analysis.
Future Gener. Comput. Syst., 2018
Fast Detection of Elephant Flows with Dirichlet-Categorical Inference.
Proceedings of the 5th IEEE/ACM International Workshop on Innovating the Network for Data-Intensive Science, 2018
Accelerating Dijkstra's Algorithm Using Multiresolution Priority Queues.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018
Computationally Efficient CP Tensor Decomposition Update Framework for Emerging Component Discovery in Streaming Data.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018
All-at-once Decomposition of Coupled Billion-scale Tensors in Apache Spark.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018
2017
Multiresolution Priority Queues.
CoRR, 2017
Report of the HPC Correctness Summit, Jan 25-26, 2017, Washington, DC.
CoRR, 2017
Polyhedral Optimization of TensorFlow Computation Graphs.
Proceedings of the Programming and Performance Visualization Tools, 2017
Memory-efficient parallel tensor decompositions.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017
2016
Efficient Compilation to Event-Driven Task Programs.
CoRR, 2016
A sparse multidimensional FFT for real positive vectors.
CoRR, 2016
Highly Scalable Near Memory Processing with Migrating Threads on the Emu System Architecture.
Proceedings of the 6th Workshop on Irregular Applications: Architecture and Algorithms, 2016
Automatic Code Generation and Data Management for an Asynchronous Task-Based Runtime.
Proceedings of the 5th Workshop on Extreme-Scale Programming Tools, 2016
An Interactive Visual Tool for Code Optimization and Parallelization Based on the Polyhedral Model.
Proceedings of the 45th International Conference on Parallel Processing Workshops, 2016
Scalable Hierarchical Polyhedral Compilation.
Proceedings of the 45th International Conference on Parallel Processing, 2016
High-performance algorithms and data structures to catch elephant flows.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016
Polyhedral compilation for energy efficiency.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016
A sparse multi-dimensional Fast Fourier Transform with stability to noise in the context of image processing and change detection.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016
Accelerated low-rank updates to tensor decompositions.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016
A unified Coq framework for verifying C programs with floating-point computations.
Proceedings of the 5th ACM SIGPLAN Conference on Certified Programs and Proofs, 2016
2015
Polyhedral user mapping and assistant visualizer tool for the R-Stream auto-parallelizing compiler.
Proceedings of the 3rd IEEE Working Conference on Software Visualization, 2015
High-performance many-core networking: design and implementation.
Proceedings of the Second Workshop on Innovating the Network for Data-Intensive Science, 2015
Automatic cluster parallelization and minimizing communication via selective data replication.
Proceedings of the 2015 IEEE High Performance Extreme Computing Conference, 2015
Embedded second-order cone programming with radar applications.
Proceedings of the 2015 IEEE High Performance Extreme Computing Conference, 2015
Optimization of symmetric tensor computations.
Proceedings of the 2015 IEEE High Performance Extreme Computing Conference, 2015
2014
A Tale of Three Runtimes.
CoRR, 2014
Parallelizing and optimizing sparse tensor computations.
Proceedings of the 2014 International Conference on Supercomputing, 2014
Lockless hash tables with low false negatives.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2014
Low-overhead load-balanced scheduling for sparse tensor computations.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2014
2013
Re-Introduction of communication-avoiding FMM-accelerated FFTs with GPU acceleration.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2013
Runnemede: An architecture for Ubiquitous High-Performance Computing.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013
Memory reuse optimizations in the R-Stream compiler.
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, 2013
2012
Scalable Cyber-Security for Terabit Cloud Computing.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Automatic communication optimizations through memory reuse strategies.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012
Efficient and scalable computations with sparse tensors.
Proceedings of the IEEE Conference on High Performance Extreme Computing, 2012
2011
Encyclopedia of Parallel Computing, 2011
2010
A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction.
Proceedings of the 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010
2007
Evaluation of Stream Virtual Machine on Raw Processor.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
2006
Poster reception - Alef parallel SAT solver for HPC hardware.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
2003
1998
Retrospective: the J-machine.
Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers), 1998
1997
1992
The message-driven processor: a multicomputer processing node with efficient mechanisms.
IEEE Micro, 1992
MDP Design Tools and Methods.
Proceedings of the 1992 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1992
The Message Driven Processor: An Integrated Multicomputer Processing Element.
Proceedings of the 1992 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1992
1989
The J-Machine: A Fine-Grain Concurrent Computer.
Information Processing 89, Proceedings of the IFIP 11th World Computer Congress, San Francisco, USA, August 28, 1989