Mattan Erez

ORCID: 0000-0002-1567-4097

According to our database, Mattan Erez authored at least 94 papers between 1999 and 2024.

Bibliography

2024
A Deep Dive into Task-Based Parallelism in Python.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

2023
MaxMem: Colocation and Performance for Big Data Applications on Tiered Main Memory Servers.
CoRR, 2023

Artemis: HE-Aware Training for Efficient Privacy-Preserving Machine Learning.
CoRR, 2023

Harvesting L2 Caches in Server Processors.
CoRR, 2023

Predicting Future-System Reliability with a Component-Level DRAM Fault Model.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

SecDDR: Enabling Low-Cost Secure Memories by Protecting the DDR Interface.
Proceedings of the 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2023

Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training.
Proceedings of the Asian Conference on Machine Learning, 2023

2022
Managing Prefetchers With Deep Reinforcement Learning.
IEEE Comput. Archit. Lett., 2022

Parla: A Python Orchestration System for Heterogeneous Architectures.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Reducing Load Latency with Cache Level Prediction.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

2021
HeMem: Scalable Tiered Memory Management for Big Data Applications and Real NVM.
Proceedings of the SOSP '21: ACM SIGOPS 28th Symposium on Operating Systems Principles, 2021

Accelerating bandwidth-bound deep learning inference with main-memory accelerators.
Proceedings of the International Conference for High Performance Computing, 2021

Dynamic Generation of Python Bindings for HPC Kernels.
Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering, 2021

2020
Training with Multi-Layer Embeddings for Model Reduction.
CoRR, 2020

FlexSA: Flexible Systolic Array Architecture for Efficient Pruned DNN Model Training.
CoRR, 2020

Runtime-guided ECC protection using online estimation of memory vulnerability.
Proceedings of the International Conference for High Performance Computing, 2020

Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Near Data Acceleration with Concurrent Host Access.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

WoLFRaM: Enhancing Wear-Leveling and Fault Tolerance in Resistive Memories using Programmable Address Decoders.
Proceedings of the 38th IEEE International Conference on Computer Design, 2020

2019
CHoNDA: Near Data Acceleration with Concurrent Host Access.
CoRR, 2019

PruneTrain: Gradual Structured Pruning from Scratch for Faster Neural Network Training.
CoRR, 2019

PruneTrain: fast neural network training by dynamic sparse model reconfiguration.
Proceedings of the International Conference for High Performance Computing, 2019

Assessing the impact of timing errors on HPC applications.
Proceedings of the International Conference for High Performance Computing, 2019

Evaluating Compiler IR-Level Selective Instruction Duplication with Realistic Hardware Errors.
Proceedings of the 9th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, 2019

Mini-batch Serialization: CNN Training with Inter-layer Data Reuse.
Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019

DeLTA: GPU Performance Model for Deep Learning Applications with In-Depth Memory System Traffic Analysis.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

GPU snapshot: checkpoint offloading for GPU-dense systems.
Proceedings of the ACM International Conference on Supercomputing, 2019

Kelp: QoS for Accelerated Machine Learning Systems.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

On the Trend of Resilience for GPU-Dense Systems.
Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2019

2018
Do-It-Yourself Virtual Memory Translation.
ACM SIGOPS Oper. Syst. Rev., 2018

Special Issue on FTS.
Int. J. High Perform. Comput. Appl., 2018

CompressPoints: An Evaluation Methodology for Compressed Memory Systems.
IEEE Comput. Archit. Lett., 2018

Evaluating and accelerating high-fidelity error injection for HPC.
Proceedings of the International Conference for High Performance Computing, 2018

Compresso: Pragmatic Main Memory Compression.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection.
Proceedings of the 47th International Conference on Parallel Processing, 2018

SIPT: Speculatively Indexed, Physically Tagged Caches.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

ERUCA: Efficient DRAM Resource Utilization and Resource Conflict Avoidance for Memory System Parallelism.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

DUO: Exposing On-Chip Redundancy to Rank-Level ECC for High Reliability.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Hamartia: A Fast and Accurate Error Injection Framework.
Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2018

2017
A Distributed Multi-GPU System for Fast Graph Processing.
Proc. VLDB Endow., 2017

Optimizing Read-Once Data Flow in Big-Data Applications.
IEEE Comput. Archit. Lett., 2017

DRAM Scaling Error Evaluation Model Using Various Retention Time.
Proceedings of the 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2017

2016
Variation-Tolerant Write Completion Circuit for Variable-Energy Write STT-RAM Architecture.
IEEE Trans. Very Large Scale Integr. Syst., 2016

All-Inclusive ECC: Thorough End-to-End Protection for Reliable Computer Memory.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Bit-Plane Compression: Transforming Data for Better Compression in Many-Core Architectures.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

RelaxFault Memory Repair.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015
Frugal ECC: efficient and versatile memory error protection through fine-grained compression.
Proceedings of the International Conference for High Performance Computing, 2015

CLEAN-ECC: high reliability ECC for adaptive granularity memory system.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Priority-based cache allocation in throughput processors.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Bamboo ECC: Strong, safe, and flexible codes for reliable computer memory.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Balancing reliability, cost, and performance tradeoffs with FreeFault.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Stay Alive, Don't Give Up: DUE and SDC Reduction with Memory Repair.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014
Addressing failures in exascale computing.
Int. J. High Perform. Comput. Appl., 2014

2013
Containment domains: A scalable, efficient and flexible resilience scheme for exascale systems.
Sci. Program., 2013

A locality-aware memory hierarchy for energy-efficient GPU architectures.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Variable-energy write STT-RAM architecture with bit-wise write-completion monitoring.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

The dual-path execution model for efficient GPU control flow.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

2012
Free-p: A Practical End-to-End Nonvolatile Memory Protection Mechanism.
IEEE Micro, 2012

A 530mV 10-lane SIMD processor with variation resiliency in 45nm SOI.
Proceedings of the 2012 IEEE International Solid-State Circuits Conference, 2012

The dynamic granularity memory system.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

CAPRI: Prediction of compaction-adequacy for handling control-divergence in GPGPU architectures.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Lane decoupling for improving the timing-error resiliency of wide-SIMD architectures.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Balancing DRAM locality and parallelism in shared memory CMP systems.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

2011
Virtualized ECC: Flexible Reliability in Main Memory.
IEEE Micro, 2011

Static timing analysis for modeling QoS in networks-on-chip.
J. Parallel Distributed Comput., 2011

The Power of 1 + α for Memory-Efficient Bloom Filters.
Internet Math., 2011

Adaptive granularity memory systems: a tradeoff between storage efficiency and throughput.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

FREE-p: Protecting non-volatile memory against both hard and soft errors.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

2010
Synctium: a Near-Threshold Stream Processor for Energy-Constrained Parallel Applications.
IEEE Comput. Archit. Lett., 2010

NBTI-aware DVFS: a new approach to saving energy and increasing processor lifetime.
Proceedings of the 2010 International Symposium on Low Power Electronics and Design, 2010

Virtualized and flexible ECC for main memory.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

2009
Stream Processors.
Proceedings of the Multicore Processors and Systems, 2009

Express Virtual Channels with Capacitively Driven Global Links.
IEEE Micro, 2009

Flexible cache error protection using an ECC FIFO.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Packet-level static timing analysis for NoCs.
Proceedings of the Third International Symposium on Networks-on-Chips, 2009

Memory mapped ECC: low-cost error protection for last level caches.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

2008
NoC with Near-Ideal Express Virtual Channels Using Global-Line Communication.
Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008

2007
Compilation for explicitly managed memory hierarchies.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Executing irregular scientific applications on stream architectures.
Proceedings of the 21st Annual International Conference on Supercomputing, 2007

Tradeoff between data-, instruction-, and thread-level parallelism in stream processors.
Proceedings of the 21st Annual International Conference on Supercomputing, 2007

Architectural Support for the Stream Execution Model on General-Purpose Processors.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
Sequoia: programming the memory hierarchy.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Architecture - The design space of data-parallel memory systems.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

2005
Fault Tolerance Techniques for the Merrimac Streaming Supercomputer.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Scatter-Add in Data Parallel Architectures.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

2004
Analysis and Performance Results of a Molecular Modeling Application on Merrimac.
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

Stream architectures - efficiency and programmability.
Proceedings of the 2004 International Symposium on System-on-Chip, 2004

Stream Register Files with Indexed Access.
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

2003
Merrimac: Supercomputing with Streams.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

2000
eXtended Block Cache.
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

1999
Speculation Techniques for Improving Load Related Instruction Scheduling.
Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999

