Mattan Erez

Michael Orshansky

CoRR, 2023

Harvesting L2 Caches in Server Processors.

Majid Jalili

CoRR, 2023

Predicting Future-System Reliability with a Component-Level DRAM Fault Model.

Jeageun Jung

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

SecDDR: Enabling Low-Cost Secure Memories by Protecting the DDR Interface.

Proceedings of the 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2023

Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training.

Zihao Deng

Benjamin Ghaemmaghami

Proceedings of the Asian Conference on Machine Learning, 2023

2022

Managing Prefetchers With Deep Reinforcement Learning.

Majid Jalili

IEEE Comput. Archit. Lett., 2022

Parla: A Python Orchestration System for Heterogeneous Architectures.

Christopher J. Rossbach

George Biros

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Reducing Load Latency with Cache Level Prediction.

Majid Jalili

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

2021

HeMem: Scalable Tiered Memory Management for Big Data Applications and Real NVM.

Proceedings of the SOSP '21: ACM SIGOPS 28th Symposium on Operating Systems Principles, 2021

Accelerating bandwidth-bound deep learning inference with main-memory accelerators.

Benjamin Y. Cho

Jeageun Jung

Proceedings of the International Conference for High Performance Computing, 2021

Dynamic Generation of Python Bindings for HPC Kernels.

Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering, 2021

2020

Training with Multi-Layer Embeddings for Model Reduction.

Benjamin Ghaemmaghami

CoRR, 2020

FlexSA: Flexible Systolic Array Architecture for Efficient Pruned DNN Model Training.

Sangkug Lym

CoRR, 2020

Runtime-guided ECC protection using online estimation of memory vulnerability.

Proceedings of the International Conference for High Performance Computing, 2020

Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs.

Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Near Data Acceleration with Concurrent Host Access.

Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

WoLFRaM: Enhancing Wear-Leveling and Fault Tolerance in Resistive Memories using Programmable Address Decoders.

Proceedings of the 38th IEEE International Conference on Computer Design, 2020

2019

CHoNDA: Near Data Acceleration with Concurrent Host Access.

CoRR, 2019

PruneTrain: Gradual Structured Pruning from Scratch for Faster Neural Network Training.

CoRR, 2019

PruneTrain: fast neural network training by dynamic sparse model reconfiguration.

Proceedings of the International Conference for High Performance Computing, 2019

Assessing the impact of timing errors on HPC applications.

Chun-Kai Chang

Wenqi Yin

Proceedings of the International Conference for High Performance Computing, 2019

Evaluating Compiler IR-Level Selective Instruction Duplication with Realistic Hardware Errors.

Chun-Kai Chang

Guanpeng Li

Proceedings of the 9th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, 2019

Mini-batch Serialization: CNN Training with Inter-layer Data Reuse.

Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019

DeLTA: GPU Performance Model for Deep Learning Applications with In-Depth Memory System Traffic Analysis.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

GPU snapshot: checkpoint offloading for GPU-dense systems.

Kyushick Lee

Michael B. Sullivan

Siva Kumar Sastry Hari

Timothy Tsai

Stephen W. Keckler

Parthasarathy Ranganathan

Proceedings of the ACM International Conference on Supercomputing, 2019

Kelp: QoS for Accelerated Machine Learning Systems.

Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

On the Trend of Resilience for GPU-Dense Systems.

Kyushick Lee

Michael B. Sullivan

Siva Kumar Sastry Hari

Timothy Tsai

Stephen W. Keckler

Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2019

2018

Do-It-Yourself Virtual Memory Translation.

ACM SIGOPS Oper. Syst. Rev., 2018

Special Issue on FTS.

Wesley Bland

Int. J. High Perform. Comput. Appl., 2018

CompressPoints: An Evaluation Methodology for Compressed Memory Systems.

Esha Choukse

Alaa R. Alameldeen

IEEE Comput. Archit. Lett., 2018

Evaluating and accelerating high-fidelity error injection for HPC.

Proceedings of the International Conference for High Performance Computing, 2018

Compresso: Pragmatic Main Memory Compression.

Esha Choukse

Alaa R. Alameldeen

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection.

Omer Subasi

Chun-Kai Chang

Sriram Krishnamoorthy

Proceedings of the 47th International Conference on Parallel Processing, 2018

SIPT: Speculatively Indexed, Physically Tagged Caches.

Tianhao Zheng

Haishan Zhu

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

ERUCA: Efficient DRAM Resource Utilization and Resource Conflict Avoidance for Memory System Parallelism.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

DUO: Exposing On-Chip Redundancy to Rank-Level ECC for High Reliability.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Hamartia: A Fast and Accurate Error Injection Framework.

Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2018

2017

A Distributed Multi-GPU System for Fast Graph Processing.

Proc. VLDB Endow., 2017

Optimizing Read-Once Data Flow in Big-Data Applications.

IEEE Comput. Archit. Lett., 2017

DRAM Scaling Error Evaluation Model Using Various Retention Time.

Seong-Lyong Gong

Jungrae Kim

Proceedings of the 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2017

2016

Variation-Tolerant Write Completion Circuit for Variable-Energy Write STT-RAM Architecture.

IEEE Trans. Very Large Scale Integr. Syst., 2016

All-Inclusive ECC: Thorough End-to-End Protection for Reliable Computer Memory.

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Bit-Plane Compression: Transforming Data for Better Compression in Many-Core Architectures.

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

RelaxFault Memory Repair.

Dong-Wan Kim

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems.

Haishan Zhu

Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015

Frugal ECC: efficient and versatile memory error protection through fine-grained compression.

Proceedings of the International Conference for High Performance Computing, 2015

CLEAN-ECC: high reliability ECC for adaptive granularity memory system.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Priority-based cache allocation in throughput processors.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Bamboo ECC: Strong, safe, and flexible codes for reliable computer memory.

Jungrae Kim

Michael B. Sullivan

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Balancing reliability, cost, and performance tradeoffs with FreeFault.

Dong-Wan Kim

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Stay Alive, Don't Give Up: DUE and SDC Reduction with Memory Repair.

Dong-Wan Kim

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014

Addressing failures in exascale computing.

Int. J. High Perform. Comput. Appl., 2014

2013

Containment domains: A scalable, efficient and flexible resilience scheme for exascale systems.

Sci. Program., 2013

A locality-aware memory hierarchy for energy-efficient GPU architectures.

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Variable-energy write STT-RAM architecture with bit-wise write-completion monitoring.

Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation.

Minsoo Rhu

Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

The dual-path execution model for efficient GPU control flow.

Minsoo Rhu

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

2012

FREE-p: A Practical End-to-End Nonvolatile Memory Protection Mechanism.

Parthasarathy Ranganathan

Naveen Muralimanohar

Jichuan Chang

Norman P. Jouppi

IEEE Micro, 2012

A 530mV 10-lane SIMD processor with variation resiliency in 45nm SOI.

Nariman Moezzi Madani

Patrick Chiang

Proceedings of the 2012 IEEE International Solid-State Circuits Conference, 2012

The dynamic granularity memory system.

Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

CAPRI: Prediction of compaction-adequacy for handling control-divergence in GPGPU architectures.

Minsoo Rhu

Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Lane decoupling for improving the timing-error resiliency of wide-SIMD architectures.

Evgeni Krimer

Patrick Chiang

Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Balancing DRAM locality and parallelism in shared memory CMP systems.

Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC.

Proceedings of the 49th Annual Design Automation Conference 2012, 2012

2011

Virtualized ECC: Flexible Reliability in Main Memory.

IEEE Micro, 2011

Static timing analysis for modeling QoS in networks-on-chip.

J. Parallel Distributed Comput., 2011

The Power of 1 + α for Memory-Efficient Bloom Filters.

Evgeni Krimer

Internet Math., 2011

Adaptive granularity memory systems: a tradeoff between storage efficiency and throughput.

Min Kyu Jeong

Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

FREE-p: Protecting non-volatile memory against both hard and soft errors.

Parthasarathy Ranganathan

Naveen Muralimanohar

Jichuan Chang

Norman P. Jouppi

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

2010

Synctium: a Near-Threshold Stream Processor for Energy-Constrained Parallel Applications.

IEEE Comput. Archit. Lett., 2010

NBTI-aware DVFS: a new approach to saving energy and increasing processor lifetime.

Mehmet Basoglu

Michael Orshansky

Proceedings of the 2010 International Symposium on Low Power Electronics and Design, 2010

Virtualized and flexible ECC for main memory.

Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

2009

Stream Processors.

Proceedings of the Multicore Processors and Systems, 2009

Express Virtual Channels with Capacitively Driven Global Links.

IEEE Micro, 2009

Flexible cache error protection using an ECC FIFO.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Packet-level static timing analysis for NoCs.

Proceedings of the Third International Symposium on Networks-on-Chips, 2009

Memory mapped ECC: low-cost error protection for last level caches.

Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

2008

NoC with Near-Ideal Express Virtual Channels Using Global-Line Communication.

Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008

2007

Compilation for explicitly managed memory hierarchies.

Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Executing irregular scientific applications on stream architectures.

Proceedings of the 21st Annual International Conference on Supercomputing, 2007

Tradeoff between data-, instruction-, and thread-level parallelism in stream processors.

Jung Ho Ahn

Proceedings of the 21st Annual International Conference on Supercomputing, 2007

Architectural Support for the Stream Execution Model on General-Purpose Processors.

Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006

Sequoia: programming the memory hierarchy.

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

The design space of data-parallel memory systems.

Jung Ho Ahn

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

2005

Fault Tolerance Techniques for the Merrimac Streaming Supercomputer.

Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Scatter-Add in Data Parallel Architectures.

Jung Ho Ahn

Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

2004

Analysis and Performance Results of a Molecular Modeling Application on Merrimac.

Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

Stream architectures - efficiency and programmability.
