Stephen W. Keckler
ORCID: 0000-0001-6701-6099
Affiliations:
- NVIDIA
- University of Texas at Austin, USA
According to our database, Stephen W. Keckler authored at least 161 papers between 1992 and 2024.
Awards
ACM Fellow
ACM Fellow 2011, "For contributions to computer architectures and technology modeling."
IEEE Fellow
IEEE Fellow 2011, "For contributions to computer architectures and memory systems."
Bibliography
2024
Probabilistic Tracker Management Policies for Low-Cost and Scalable Rowhammer Mitigation.
CoRR, 2024
CoRR, 2024
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
WASP: Exploiting GPU Pipeline Parallelism with Hardware-Accelerated Automatic Warp Specialization.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024
2023
Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing.
ACM Trans. Comput. Syst., 2023
cuCatch: A Debugging Tool for Efficiently Catching Memory Safety Violations in CUDA Applications.
Proc. ACM Program. Lang., 2023
Proceedings of the IEEE Intelligent Vehicles Symposium, 2023
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
2022
IEEE Trans. Dependable Secur. Comput., 2022
ACM Trans. Archit. Code Optim., 2022
Enabling and Accelerating Dynamic Vision Transformer Inference for Real-Time Applications.
CoRR, 2022
Saving PAM4 Bus Energy with SMOREs: Sparse Multi-level Opportunistic Restricted Encodings.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
Exploiting Temporal Data Diversity for Detecting Safety-critical Faults in AV Compute Systems.
Proceedings of the 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2022
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
2021
SNAP: An Efficient Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference.
IEEE J. Solid State Circuits, 2021
Commun. ACM, 2021
Proceedings of the IEEE Intelligent Vehicles Symposium, 2021
Suraksha: A Framework to Analyze the Safety Implications of Perception Design Choices in AVs.
Proceedings of the 32nd IEEE International Symposium on Software Reliability Engineering, 2021
Proceedings of the 32nd IEEE International Symposium on Software Reliability Engineering, 2021
Suraksha: A Quantitative AV Safety Evaluation Framework to Analyze Safety Implications of Perception Design Choices.
Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2021
Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2021
2020
A 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm.
IEEE J. Solid State Circuits, 2020
Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020
Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020
2019
Exposing Memory Access Patterns to Improve Instruction and Memory Efficiency in GPUs.
ACM Trans. Archit. Code Optim., 2019
Kayotee: A Fault Injection-based System to Assess the Safety and Reliability of Autonomous Vehicles to Faults and Errors.
CoRR, 2019
A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019, 2019
SNAP: A 1.67 - 21.55TOPS/W Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference in 16nm CMOS.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019, 2019
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019
Proceedings of the ACM International Conference on Supercomputing, 2019
Proceedings of the International Conference on Computer-Aided Design, 2019
A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology.
Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019
Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2019
ML-Based Fault Injection for Autonomous Vehicles: A Case for Bayesian Fault Injection.
Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2019
Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019
2018
ACM Trans. Archit. Code Optim., 2018
Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training.
CoRR, 2018
Proceedings of the International Conference for High Performance Computing, 2018
SwapCodes: Error Codes for Hardware-Software Cooperative GPU Pipeline Error Detection.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
2017
Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks.
CoRR, 2017
Understanding error propagation in deep learning neural network (DNN) accelerators and applications.
Proceedings of the International Conference for High Performance Computing, 2017
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
SASSIFI: An architecture-level fault injection tool for GPU application resilience evaluation.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017
2016
CoRR, 2016
vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
Proceedings of the Second International Symposium on Memory Systems, 2016
Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016
A real-time energy-efficient superpixel hardware accelerator for mobile computer vision applications.
Proceedings of the 53rd Annual Design Automation Conference, 2016
2015
Proceedings of the 2015 International Symposium on Memory Systems, 2015
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
GPU Computing Pipeline Inefficiencies and Optimization Opportunities in Heterogeneous CPU-GPU Processors.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015
2014
IEEE Trans. Computers, 2014
2014 International Symposium on Computer Architecture Influential Paper Award; 2014 Maurice Wilkes Award Given to Ravi Rajwar.
IEEE Micro, 2014
Commun. ACM, 2014
Proceedings of the International Conference for High Performance Computing, 2014
Exploring the Design Space of SPMD Divergence Management on Data-Parallel Architectures.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014
A comparative analysis of microarchitecture effects on CPU and GPU memory system behavior.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014
Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014
Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications.
Proceedings of the Seventh Workshop on General Purpose Processing Using GPUs, 2014
2013
How to implement effective prediction and forwarding for fusable dynamic multicore architectures.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013
Proceedings of the 50th Annual Design Automation Conference 2013, 2013
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013
2012
A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors.
ACM Trans. Comput. Syst., 2012
Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012
2011
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011
Evaluation and optimization of multicore performance bottlenecks in supercomputing applications.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011
Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011
2010
Proceedings of the Third International Workshop on Network on Chip Architectures, 2010
Proceedings of the Computer Architecture, 2010
2009
Proceedings of the Multicore Processors and Systems, 2009
Proceedings of the Multicore Processors and Systems, 2009
Proceedings of the Second International Workshop on Network on Chip Architectures, 2009
Preemptive virtual clock: a flexible, efficient, and cost-effective QOS scheme for networks-on-chip.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009
Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009
Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009
2008
SIGARCH Comput. Archit. News, 2008
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008
2007
IEEE Trans. Parallel Distributed Syst., 2007
Proceedings of the ACM SIGCOMM 2007 Conference on Applications, 2007
Proceedings of the First International Symposium on Networks-on-Chips, 2007
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007
2006
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006
Proceedings of the 5th International Symposium on Memory Management, 2006
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006
Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006
Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006
2004
ACM Trans. Archit. Code Optim., 2004
SIGMETRICS Perform. Evaluation Rev., 2004
Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004
2003
IEEE Trans. Very Large Scale Integr. Syst., 2003
IEEE Micro, 2003
IEEE Micro, 2003
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003
Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003
Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003
2002
SIGARCH Comput. Archit. News, 2002
Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002
Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic.
Proceedings of the 2002 International Conference on Dependable Systems and Networks (DSN 2002), 2002
Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002
2001
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001
Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001
2000
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000
Proceedings of the High Performance Computing, Third International Symposium, 2000
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000
1999
1998
Fast thread communication and synchronization mechanisms for a scalable single chip multiprocessor.
PhD thesis, 1998
Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998
The effects of explicitly parallel mechanisms on the multi-ALU processor cluster pipeline.
Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors, 1998
1997
1994
Proceedings of the ASPLOS-VI Proceedings, 1994
1992
Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992