Katherine A. Yelick

Orcid: 0000-0003-0957-701X

Affiliations:
  • University of California, Berkeley, USA


According to our database, Katherine A. Yelick authored at least 178 papers between 1985 and 2024.

Awards

ACM Fellow

ACM Fellow 2012, "For contributions to parallel languages that improve programmer productivity."

Bibliography

2024
On Multilinear Inequalities of Holder-Brascamp-Lieb Type for Torsion-Free Discrete Abelian Groups.
J. Log. Anal., 2024

Exabiome: Advancing Microbial Science through Exascale Computing.
Comput. Sci. Eng., 2024

Distributed Matrix-Based Sampling for Graph Neural Network Training.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

Sparsity-Aware Communication for Distributed Graph Neural Network Training.
Proceedings of the 53rd International Conference on Parallel Processing, 2024

2023
High-Performance Filters for GPUs.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

Designing Efficient SIMD Kernels for High Performance Sequence Alignment.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Singleton Sieving: Overcoming the Memory/Speed Trade-Off in Exascale κ-mer Analysis.
Proceedings of the SIAM Conference on Applied and Computational Discrete Algorithms, 2023

2022
Extreme-Scale Many-against-Many Protein Similarity Search.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Scalable Irregular Parallelism with GPUs: Getting CPUs Out of the Way.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly.
Proceedings of the 51st International Conference on Parallel Processing, 2022

Atos: A Task-Parallel GPU Scheduler for Graph Analytics.
Proceedings of the 51st International Conference on Parallel Processing, 2022

2021
Atos: A Task-Parallel GPU Dynamic Scheduling Framework for Dynamic Irregular Computations.
CoRR, 2021

CloudBank: Managed Services to Simplify Cloud Access for Computer Science Research and Education.
Proceedings of the PEARC '21: Practice and Experience in Advanced Research Computing, 2021

10 Years Later: Cloud Computing is Closing the Performance Gap.
Proceedings of the ICPE '21: ACM/SPEC International Conference on Performance Engineering, 2021

SPAA'21 Panel Paper: Architecture-Friendly Algorithms versus Algorithm-Friendly Architectures.
Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021

Accelerating large scale de novo metagenome assembly using GPUs.
Proceedings of the International Conference for High Performance Computing, 2021

QFAST: Conflating Search and Numerical Optimization for Scalable Quantum Circuit Synthesis.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2021

Asynchrony versus bulk-synchrony for a generalized N-body problem from genomics.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Distributed-Memory k-mer Counting on GPUs.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Scaling Generalized N-Body Problems, A Case Study from Genomics.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper.
Proceedings of the 2021 SIAM Conference on Applied and Computational Discrete Algorithms, 2021

2020
The Road for Recovery: Aligning COVID-19 efforts and building a more resilient future.
IEEE Data Eng. Bull., 2020

PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction.
CoRR, 2020

Opportunities and Challenges for Next Generation Computing.
CoRR, 2020

The Parallelism Motifs of Genomic Data Analysis.
CoRR, 2020

ADEPT: a domain independent sequence alignment strategy for gpu architectures.
BMC Bioinform., 2020

Reducing communication in graph neural network training.
Proceedings of the International Conference for High Performance Computing, 2020

Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches.
Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020

LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

GPU accelerated partial order multiple sequence alignment for long reads self-correction.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Computing and Data Challenges in Climate Change.
Proceedings of the 27th IEEE International Conference on High Performance Computing, 2020

2019
RDMA vs. RPC for Implementing Distributed Data Structures.
Proceedings of the 9th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2019

diBELLA: Distributed Long Read to Long Read Alignment.
Proceedings of the 48th International Conference on Parallel Processing, 2019

BCL: A Cross-Platform Distributed Data Structures Library.
Proceedings of the 48th International Conference on Parallel Processing, 2019

2018
BCL: A Cross-Platform Distributed Container Library.
CoRR, 2018

Extreme scale de novo metagenome assembly.
Proceedings of the International Conference for High Performance Computing, 2018

CHIUW 2018 Keynote.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Indigo: A Domain-Specific Language for Fast, Portable Image Reconstruction.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017
Communication-Avoiding Optimization Methods for Massive-Scale Graphical Model Structure Learning.
CoRR, 2017

Advanced Cyberinfrastructure for Science, Engineering, and Public Policy.
CoRR, 2017

Extreme-Scale De Novo Genome Assembly.
CoRR, 2017

MerBench: PGAS Benchmarks for High Performance Genome Assembly.
Proceedings of PAW@SC 2017: Second Annual PGAS Applications Workshop, 2017

Performance Characterization of De Novo Genome Assembly on Leading Parallel Systems.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

2016
An Asynchronous Task-based Fan-Both Sparse Cholesky Solver.
CoRR, 2016

Accelerating Science: A Computing Research Agenda.
CoRR, 2016

21st Century Computer Architecture.
CoRR, 2016

A Hartree-Fock Application Using UPC++ and the New DArray Library.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015
HipMer: an extreme-scale de novo genome assembler.
Proceedings of the International Conference for High Performance Computing, 2015

merAligner: A Fully Parallel Sequence Aligner.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Parallel Hessian Assembly for Seismic Waveform Inversion Using Global Updates.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

The Endgame for Moore's Law: Architecture, Algorithm, and Application Challenges.
Proceedings of the Federated Computing Research Conference, 2015

2014
A Computation- and Communication-Optimal Parallel Direct 3-Body Algorithm.
Proceedings of the International Conference for High Performance Computing, 2014

Parallel De Bruijn Graph Construction and Traversal for De Novo Genome Assembly.
Proceedings of the International Conference for High Performance Computing, 2014

A Local-View Array Library for Partitioned Global Address Space C++ Programs.
Proceedings of the ARRAY'14: 2014 ACM SIGPLAN International Workshop on Libraries, 2014

Evaluation of PGAS Communication Paradigms with Geometric Multigrid.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

UPC++: A PGAS Extension for C++.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

An Evaluation of One-Sided and Two-Sided Communication Paradigms on Relaxed-Ordering Interconnect.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

On the conditions for efficient interoperability with threads: an experience with PGAS languages using cray communication domains.
Proceedings of the 2014 International Conference on Supercomputing, 2014

2013
Best paper awards: 26th international parallel and distributed processing symposium (IPDPS 2012).
J. Parallel Distributed Comput., 2013

Communication lower bounds and optimal algorithms for programs that reference arrays - Part 1.
CoRR, 2013

Hierarchical Computation in the SPMD Programming Model.
Proceedings of the Languages and Compilers for Parallel Computing, 2013

A Communication-Optimal N-Body Algorithm for Direct Interactions.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2012
Optimization of Parallel Particle-to-Grid Interpolation on Leading Multicore Platforms.
IEEE Trans. Parallel Distributed Syst., 2012

A preliminary evaluation of the hardware acceleration of the Cray Gemini interconnect for PGAS languages and comparison with MPI.
SIGMETRICS Perform. Evaluation Rev., 2012

Communication avoiding and overlapping for numerical linear algebra.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Keynote address: Moving a science workload to exascale computing.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Compiling to avoid communication.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Titanium.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Tuning collective communication for Partitioned Global Address Space programming models.
Parallel Comput., 2011

Yada: Straightforward parallel programming.
Parallel Comput., 2011

The International Exascale Software Project roadmap.
Int. J. High Perform. Comput. Appl., 2011

Exascale opportunities and challenges.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

2010
Hybrid PGAS runtime support for multicore nodes.
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, 2010

Auto-Tuning Stencil Computations on Multicore and Accelerators.
Proceedings of the Scientific Computing with Multicore and Accelerators, 2010

2009
Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors.
SIAM Rev., 2009

Optimization of sparse matrix-vector multiplication on emerging multicore platforms.
Parallel Comput., 2009

Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms.
J. Parallel Distributed Comput., 2009

Technical perspective - Abstraction for parallelism.
Commun. ACM, 2009

A view of the parallel computing landscape.
Commun. ACM, 2009

Minimizing communication in sparse matrix solvers.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Enforcing Textual Alignment of Collectives Using Dynamic Checks.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

Ten ways to waste a parallel computer.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Scaling communication-intensive applications on BlueGene/P using one-sided communication and overlap.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Scheduling dynamic parallelism on accelerators.
Proceedings of the 6th Conference on Computing Frontiers, 2009

Improving Memory Subsystem Performance Using ViVA: Virtual Vector Architecture.
Proceedings of the Architecture of Computing Systems, 2009

2008
DARPA's HPCS Program- History, Models, Tools, Languages.
Adv. Comput., 2008

Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Programming models for petascale to exascale.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Lattice Boltzmann simulation optimization on leading multicore platforms.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Avoiding communication in sparse matrix computations.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Performance portable optimizations for loops containing communication operations.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

2007
Languages for High-Productivity Computing: the DARPA HPCS Language Project.
Parallel Process. Lett., 2007

Scientific Computing Kernels on the Cell Processor.
Int. J. Parallel Program., 2007

Parallel Languages and Compilers: Perspective From the Titanium Experience.
Int. J. High Perform. Comput. Appl., 2007

When cache blocking of sparse matrix vector multiply works and why.
Appl. Algebra Eng. Commun. Comput., 2007

Deadlock-free scheduling of X10 computations with bounded resources.
Proceedings of the SPAA 2007: 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007

An adaptive mesh refinement benchmark for modern parallel programming languages.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Multi-threading and one-sided communication in parallel LU factorization.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Hierarchical Pointer Analysis for Distributed Programs.
Proceedings of the Static Analysis, 14th International Symposium, 2007

Automatic Communication Performance Debugging in PGAS Languages.
Proceedings of the Languages and Compilers for Parallel Computing, 2007

Productivity and performance using partitioned global address space languages.
Proceedings of the Parallel Symbolic Computation, 2007

Automatic nonblocking communication for partitioned global address space programs.
Proceedings of the 21st Annual International Conference on Supercomputing, 2007

2006
Distributed Immersed Boundary Simulation in Titanium.
SIAM J. Sci. Comput., 2006

Particles and continuum - Performance modeling and optimization of a high energy colliding beam simulation code.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Poster reception - Optimized collectives for PGAS languages with one-sided communication.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Performance Advantages of Partitioned Global Address Space Languages.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

Compilation Techniques for Partitioned Global Address Space Languages.
Proceedings of the Languages and Compilers for Parallel Computing, 2006

Optimizing bandwidth limited problems using one-sided communication and overlap.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Performance Analysis of a High Energy Colliding Beam Simulation Code on Four HPC Architectures.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

The potential of the cell processor for scientific computing.
Proceedings of the Third Conference on Computing Frontiers, 2006

Implicit and explicit optimizations for stencil computations.
Proceedings of the 2006 workshop on Memory System Performance and Correctness, 2006

2005
Self-Adapting Linear Algebra Algorithms and Software.
Proc. IEEE, 2005

Making Sequential Consistency Practical in Titanium.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Language innovations for HPCS.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

Concurrency Analysis for Parallel Programs with Textually Aligned Barriers.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

Titanium Performance and Potential: An NPB Experimental Study.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

Automatic Support for Irregular Computations in a High-Level Language.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Communication Optimizations for Fine-Grained UPC Applications.
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

Impact of modern memory subsystems on cache optimizations for stencil computations.
Proceedings of the 2005 workshop on Memory System Performance, 2005

2004
Special Issue on Automatic Performance Tuning.
Int. J. High Perform. Comput. Appl., 2004

Sparsity: Optimization Framework for Sparse Matrix Kernels.
Int. J. High Perform. Comput. Appl., 2004

Performance Tuning of Matrix Triple Products Based on Matrix Structure.
Proceedings of the Applied Parallel Computing, 2004

Array Prefetching for Irregular Array Accesses in Titanium.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Identifying Performance Bottlenecks on Modern Microarchitectures Using an Adaptable Probe.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Evaluating support for global address space languages on the Cray X1.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004

Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

2003
Type Systems for Distributed Data Sharing.
Proceedings of the Static Analysis, 10th International Symposium, 2003

Polynomial-Time Algorithms for Enforcing Sequential Consistency in SPMD Programs with Arrays.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

An Evaluation of Current High-Performance Networks.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

A performance analysis of the Berkeley UPC compiler.
Proceedings of the 17th Annual International Conference on Supercomputing, 2003

Memory Hierarchy Optimizations and Performance Bounds for Sparse A^T Ax.
Proceedings of the Computational Science - ICCS 2003, 2003

2002
ROC-1: Hardware Support for Recovery-Oriented Computing.
IEEE Trans. Computers, 2002

Performance optimizations and bounds for sparse matrix-vector multiply.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

2001
Hardware/compiler codevelopment for an embedded media processor.
Proc. IEEE, 2001

Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY.
Proceedings of the Computational Science - ICCS 2001, 2001

2000
Exploiting On-Chip Memory Bandwidth in the VIRAM Compiler.
Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

Performance Analysis of an H.263 Video Encoder for VIRAM.
Proceedings of the 2000 International Conference on Image Processing, 2000

1999
Optimizing Sparse Matrix Vector Multiplication on SMP.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

Cluster I/O with River: Making the Fast Case Common.
Proceedings of the Sixth Workshop on I/O in Parallel and Distributed Systems, 1999

1998
Titanium: A High-performance Java Dialect.
Concurr. Pract. Exp., 1998

1997
A case for intelligent RAM.
IEEE Micro, 1997

Models and Scheduling Algorithms for Mixed Data and Task Parallel Programs.
J. Parallel Distributed Comput., 1997

Scalable Processors in the Billion-Transistor Era: IRAM.
Computer, 1997

The Energy Efficiency of IRAM Architectures.
Proceedings of the 24th International Symposium on Computer Architecture, 1997

Intelligent RAM (IRAM): The Industrial Setting, Applications and Architectures.
Proceedings of the 1997 International Conference on Computer Design: VLSI in Computers & Processors, 1997

1996
Analyses and Optimizations for Shared Address Space Programs.
J. Parallel Distributed Comput., 1996

Systems Support for Irregular Parallel Applications (Abstract).
Proceedings of the Parallel Algorithms for Irregularly Structured Problems, 1996

Performance Modeling and Composition: A Case Study in Cell Simulation.
Proceedings of IPPS '96, 1996

Evaluation of Architectural Support for Global Address-Based Communication in Large-Scale Parallel Machines.
Proceedings of ASPLOS-VII, 1996

1995
Modeling the Benefits of Mixed Data and Task Parallelism.
Proceedings of the 7th Annual ACM Symposium on Parallel Algorithms and Architectures, 1995

Parallelizing the Phylogeny Problem.
Proceedings of Supercomputing '95, San Diego, CA, USA, December 4-8, 1995

Portable Parallel Irregular Applications.
Proceedings of the Parallel Symbolic Languages and Systems, 1995

Optimizing Parallel Programs with Explicit Synchronization.
Proceedings of the ACM SIGPLAN'95 Conference on Programming Language Design and Implementation (PLDI), 1995

Runtime Support for Portable Distributed Data Structures.
Proceedings of the Languages, 1995

Empirical Evaluation of the CRAY-T3D: A Compiler Perspective.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

Portable Runtime Support for Asynchronous Simulation.
Proceedings of the 1995 International Conference on Parallel Processing, 1995

1994
Distributed Data Structures and Algorithms for Gröbner Basis Computation.
LISP Symb. Comput., 1994

Optimizing Parallel SPMD Programs.
Proceedings of the Languages and Compilers for Parallel Computing, 1994

Connected components on distributed memory machines.
Proceedings of the Parallel Algorithms, 1994

1993
Parallel programming in Split-C.
Proceedings of Supercomputing '93, 1993

On the Correctness of a Distributed Memory Gröbner basis Algorithm.
Proceedings of the Rewriting Techniques and Applications, 5th International Conference, 1993

Implementing an Irregular Application on a Distributed Memory Multiprocessor.
Proceedings of the Fourth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1993

Parallel timing simulation on a distributed memory multiprocessor.
Proceedings of the 1993 IEEE/ACM International Conference on Computer-Aided Design, 1993

1992
Programming Models for Irregular Applications.
Proceedings of the 2nd SIGPLAN Workshop on Languages, Compilers, and Run-Time Environments for Distributed Memory Multiprocessors, Boulder, Colorado, September 30, 1992

A Parallel Completion Procedure for Term Rewriting Systems.
Proceedings of the Automated Deduction, 1992

Using Moded Type Systems to Support Abstraction in Logic Programs.
Proceedings of the Types in Logic Programming, 1992

1990
Parallel Completion.
Proceedings of the Parallelization in Inference Systems, 1990

1989
Moded Type Systems for Logic Programming.
Conference Record of the Sixteenth Annual ACM Symposium on Principles of Programming Languages, 1989

1987
Unification in Combinations of Collapse-Free Regular Theories.
J. Symb. Comput., 1987

1985
Combining Unification Algorithms for Confined Regular Equational Theories.
Proceedings of the Rewriting Techniques and Applications, First International Conference, 1985
