Onur Mutlu

Orcid: 0000-0002-0075-2312

Affiliations:
  • ETH Zurich
  • Carnegie Mellon University, Pittsburgh, USA


According to our database1, Onur Mutlu authored at least 554 papers between 2003 and 2024.

Collaborative distances:

Awards

ACM Fellow

ACM Fellow 2017, "For contributions to computer architecture research, especially in memory systems".

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

Online presence:

On csauthors.net:

Bibliography

2024
EcoFlow: Efficient Convolutional Dataflows on Low-Power Neural Network Accelerators.
IEEE Trans. Computers, September, 2024

Sectored DRAM: A Practical Energy-Efficient and High-Performance Fine-Grained DRAM Architecture.
ACM Trans. Archit. Code Optim., September, 2024

Cross-core Data Sharing for Energy-efficient GPUs.
ACM Trans. Archit. Code Optim., September, 2024

GateKeeper-GPU: Fast and Accurate Pre-Alignment Filtering in Short Read Mapping.
IEEE Trans. Computers, May, 2024

SparseACC: A Generalized Linear Model Accelerator for Sparse Datasets.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., March, 2024

ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-efficient Genome Analysis.
ACM Trans. Archit. Code Optim., March, 2024

RowPress Vulnerability in Modern DRAM Chips.
IEEE Micro, 2024

Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments.
CoRR, 2024

Roadmap to Neuromorphic Computing with Emerging Technologies.
CoRR, 2024

Understanding the Security Benefits and Overheads of Emerging Industry Solutions to DRAM Read Disturbance.
CoRR, 2024

Leveraging Adversarial Detection to Enable Scalable and Low Overhead RowHammer Mitigations.
CoRR, 2024

Amplifying Main Memory-Based Timing Covert and Side Channels using Processing-in-Memory Operations.
CoRR, 2024

Analysis of Distributed Optimization Algorithms on a Real Processing-In-Memory System.
CoRR, 2024

Virtuoso: An Open-Source, Comprehensive and Modular Simulation Framework for Virtual Memory Research.
CoRR, 2024

PUMA: Efficient and Low-Cost Memory Allocation and Alignment Support for Processing-Using-Memory Architectures.
CoRR, 2024

MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Processing.
CoRR, 2024

Accelerating Graph Neural Networks on Real Processing-In-Memory Systems.
CoRR, 2024

Rethinking the Producer-Consumer Relationship in Modern DRAM-Based Systems.
CoRR, 2024

Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts.
CoRR, 2024

Material modeling and recent findings in transcatheter aortic valve implantation simulations.
Comput. Methods Programs Biomed., 2024

Address Scaling: Architectural Support for Fine-Grained Thread-Safe Metadata Management.
IEEE Comput. Archit. Lett., 2024

Ramulator 2.0: A Modern, Modular, and Extensible DRAM Simulator.
IEEE Comput. Archit. Lett., 2024

SpyHammer: Understanding and Exploiting RowHammer Under Fine-Grained Temperature Variations.
IEEE Access, 2024

MATSA: An MRAM-Based Energy-Efficient Accelerator for Time Series Analysis.
IEEE Access, 2024

ABACuS: All-Bank Activation Counters for Scalable and Low Overhead RowHammer Mitigation.
Proceedings of the 33rd USENIX Security Symposium, 2024

Self-Managing DRAM: A Low-Cost Framework for Enabling Autonomous and Efficient DRAM Maintenance Operations.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

BreakHammer: Enhancing RowHammer Mitigations by Carefully Throttling Suspect Threads.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024

QUETZAL: Vector Acceleration Framework for Modern Genome Sequence Analysis Algorithms.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

Spatial Variation-Aware Read Disturbance Defenses: Experimental Analysis of Real DRAM Chips and Implications on Future Solutions.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

CoMeT: Count-Min-Sketch-based Row Tracking to Mitigate RowHammer at Low Cost.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

Simultaneous Many-Row Activation in Off-the-Shelf DRAM Chips: Experimental Characterization and Analysis.
Proceedings of the 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2024

Read Disturbance in High Bandwidth Memory: A Detailed Experimental Study on HBM2 DRAM Chips.
Proceedings of the 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2024

An Experimental Characterization of Combined RowHammer and RowPress Read Disturbance in Modern DRAM Chips.
Proceedings of the 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2024

AERO: Adaptive Erase Operation for Improving Lifetime and Performance of Modern NAND Flash-Based SSDs.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

PIM-Opt: Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory System.
Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, 2024

2023
DRAM Bender: An Extensible and Versatile FPGA-Based Infrastructure to Easily Test State-of-the-Art DRAM Chips.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., December, 2023

Run-Time Resource Management in CMPs Handling Multiple Aging Mechanisms.
IEEE Trans. Computers, October, 2023

Scrooge: a fast and memory-frugal genomic sequence aligner for CPUs, GPUs, and ASICs.
Bioinform., May, 2023

A framework for high-throughput sequence alignment using real processing-in-memory systems.
Bioinform., May, 2023

PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM.
ACM Trans. Archit. Code Optim., March, 2023

How does hemodynamics affect rupture tissue mechanics in abdominal aortic aneurysm: Focus on wall shear stress derived parameters, time-averaged wall shear stress, oscillatory shear index, endothelial cell activation potential, and relative residence time.
Comput. Biol. Medicine, March, 2023

ALP: Alleviating CPU-Memory Data Movement Overheads in Memory-Centric Systems.
IEEE Trans. Emerg. Top. Comput., 2023

Using Local Cache Coherence for Disaggregated Memory Systems.
ACM SIGOPS Oper. Syst. Rev., 2023

Fetal ECG extraction from maternal ECG using deeply supervised LinkNet++ model.
Eng. Appl. Artif. Intell., 2023

PULSAR: Simultaneous Many-Row Activation for Reliable and High-Performance Computing in Off-the-Shelf DRAM Chips.
CoRR, 2023

MetaStore: High-Performance Metagenomic Analysis via In-Storage Computing.
CoRR, 2023

MetaFast: Enabling Fast Metagenomic Classification via Seed Counting and Edit Distance Approximation.
CoRR, 2023

SequenceLab: A Comprehensive Benchmark of Computational Methods for Comparing Genomic Sequences.
CoRR, 2023

An In-Memory Architecture for High-Performance Long-Read Pre-Alignment Filtering.
CoRR, 2023

Understanding Read Disturbance in High Bandwidth Memory: An Experimental Analysis of Real HBM2 DRAM Chips.
CoRR, 2023

DaPPA: A Data-Parallel Framework for Processing-in-Memory Architectures.
CoRR, 2023

GateSeeder: Near-memory CPU-FPGA Acceleration of Short and Long Read Mapping.
CoRR, 2023

Retrospective: Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors.
CoRR, 2023

Retrospective: An Experimental Study of Data Retention Behavior in Modern DRAM Devices: Implications for Retention Time Profiling Mechanisms.
CoRR, 2023

Retrospective: RAIDR: Retention-Aware Intelligent DRAM Refresh.
CoRR, 2023

Retrospective: A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing.
CoRR, 2023

Memory-Centric Computing.
CoRR, 2023

Accelerating Genome Analysis via Algorithm-Architecture Co-Design.
CoRR, 2023

TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems.
CoRR, 2023

RedBit: An End-to-End Flexible Framework for Evaluating the Accuracy of Quantized CNNs.
CoRR, 2023

RawHash: enabling fast and accurate real-time analysis of raw nanopore signals for large genomes.
Bioinform., 2023

Extending Memory Capacity in Modern Consumer Systems With Emerging Non-Volatile Memory: Experimental Analysis and Characterization Using the Intel Optane SSD.
IEEE Access, 2023

Casper: Accelerating Stencil Computations Using Near-Cache Processing.
IEEE Access, 2023

High-Performance and Scalable Agent-Based Simulation with BioDynaMo.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

Swordfish: A Framework for Evaluating Deep Neural Network-based Basecalling using Computation-In-Memory with Non-Ideal Memristors.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Victima: Drastically Increasing Address Translation Reach by Leveraging Underutilized Cache Resources.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Utopia: Fast and Efficient Address Translation via Hybrid Restrictive & Flexible Virtual-to-Physical Address Mappings.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

TransPimLib: Efficient Transcendental Functions for Processing-in-Memory Systems.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

Evaluating Machine LearningWorkloads on Memory-Centric Computing Systems.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

Venice: Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

RowPress: Amplifying Read Disturbance in Modern DRAM Chips.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Evaluating Homomorphic Operations on a Real-World Processing-In-Memory System.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023

SPARTA: Spatial Acceleration for Efficient and Scalable Horizontal Diffusion Weather Stencil Computation.
Proceedings of the 37th International Conference on Supercomputing, 2023

An Experimental Analysis of RowHammer in HBM2 DRAM Chips.
Proceedings of the 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2023

Message from the DSN 2023 Program Chairs.
Proceedings of the 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Network, 2023

Invited: Accelerating Genome Analysis via Algorithm-Architecture Co-Design.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Lightning Talk: Memory-Centric Computing.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Fundamentally Understanding and Solving RowHammer.
Proceedings of the 28th Asia and South Pacific Design Automation Conference, 2023

SimplePIM: A Software Framework for Productive and Efficient Processing-in-Memory.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

2022
pLUTo: Enabling Massively Parallel Computation In DRAM via Lookup Tables.
Dataset, July, 2022

Accelerating Weather Prediction Using Near-Memory Reconfigurable Fabric.
ACM Trans. Reconfigurable Technol. Syst., 2022

POCLib: A High-Performance Framework for Enabling Near Orthogonal Processing on Compression.
IEEE Trans. Parallel Distributed Syst., 2022

Exploring Data Analytics Without Decompression on Embedded GPU Systems.
IEEE Trans. Parallel Distributed Syst., 2022

CoPA: Cold Page Awakening to Overcome Retention Failures in STT-MRAM Based I/O Buffers.
IEEE Trans. Parallel Distributed Syst., 2022

MetaSys: A Practical Open-source Metadata Management System to Implement and Evaluate Cross-layer Optimizations.
ACM Trans. Archit. Code Optim., 2022

SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures.
Proc. ACM Meas. Anal. Comput. Syst., 2022

Accelerating Neural Network Inference With Processing-in-DRAM: From the Edge to the Cloud.
IEEE Micro, 2022

Optically connected memory for disaggregated data centers.
J. Parallel Distributed Comput., 2022

Guest Editors' Introduction: Near-Memory and In-Memory Processing.
IEEE Des. Test, 2022

TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering.
CoRR, 2022

Utopia: Efficient Address Translation using Hybrid Virtual-to-Physical Address Mapping.
CoRR, 2022

TuRaN: True Random Number Generation Using Supply Voltage Underscaling in SRAMs.
CoRR, 2022

Taming Large-Scale Genomic Analyses via Sparsified Genomics.
CoRR, 2022

NEON: Enabling Efficient Support for Nonlinear Operations in Resistive RAM-based Neural Network Accelerators.
CoRR, 2022

Accelerating Time Series Analysis via Processing using Non-Volatile Memories.
CoRR, 2022

A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers.
CoRR, 2022

RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory.
CoRR, 2022

SpyHammer: Using RowHammer to Remotely Spy on Temperature.
CoRR, 2022

LEAPER: Modeling Cloud FPGA-based Systems via Transfer Learning.
CoRR, 2022

Sectored DRAM: An Energy-Efficient High-Throughput and Practical Fine-Grained DRAM Architecture.
CoRR, 2022

A Case for Self-Managing DRAM Chips: Improving Performance, Efficiency, Reliability, and Security via Autonomous in-DRAM Maintenance Operations.
CoRR, 2022

An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System.
CoRR, 2022

COVIDHunter: COVID-19 pandemic wave prediction and mitigation via seasonality-aware modeling.
CoRR, 2022

Going From Molecules to Genomic Variations to Scientific Discovery: Intelligent Algorithms and Architectures for Intelligent Genome Analysis.
CoRR, 2022

Enabling High-Performance and Energy-Efficient Hybrid Transactional/Analytical Databases with Hardware/Software Cooperation.
CoRR, 2022

A Case for Transparent Reliability in DRAM Systems.
CoRR, 2022

Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems.
CoRR, 2022

Packaging, containerization, and virtualization of computational omics methods: Advances, challenges, and opportunities.
CoRR, 2022

DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression.
CoRR, 2022

GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis.
CoRR, 2022

SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems.
CoRR, 2022

FastRemap: a tool for quickly remapping reads between genome assemblies.
Bioinform., 2022

BioDynaMo: a modular platform for high-performance agent-based simulation.
Bioinform., 2022

Demeter: A Fast and Energy-Efficient Food Profiler Using Hyperdimensional Computing in Memory.
IEEE Access, 2022

Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System.
IEEE Access, 2022

Chapter Eight - The design of an energy-efficient deflection-based on-chip network.
Adv. Comput., 2022

Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures.
Proceedings of the SIGMETRICS/PERFORMANCE '22: ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, Mumbai, India, June 6, 2022

ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

HiRA: Hidden Row Activation for Reducing Refresh Latency of Off-the-Shelf DRAM Chips.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

AgileWatts: An Energy-Efficient CPU Core Idle-State Architecture for Latency-Sensitive Server Applications.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Morpheus: Extending the Last Level Cache Capacity in GPU Systems Using Idle GPU Core Resources.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Flash-Cosmos: In-Flash Bulk Bitwise Operations Using Inherent Computation Capability of NAND Flash Memory.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Methodologies, Workloads, and Tools for Processing-in-Memory: Enabling the Adoption of Data-Centric Architectures.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

PiDRAM: An FPGA-based Framework for End-to-end Evaluation of Processing-in-DRAM Techniques.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

GenStore: In-Storage Filtering of Genomic Data for High-Performance and Energy-Efficient Genome Analysis.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

Machine Learning Training on a Real Processing-in-Memory System.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

SparseP: Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

Exploiting Near-Data Processing to Accelerate Time Series Analysis.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

SeGraM: a universal hardware accelerator for genomic sequence-to-graph and sequence-to-sequence mapping.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Sibyl: adaptive and extensible data placement in hybrid storage systems using online reinforcement learning.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Algorithmic Improvement and GPU Acceleration of the GenASM Algorithm.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

High-throughput Pairwise Alignment with the Wavefront Algorithm using Processing-in-Memory.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Polynesia: Enabling High-Performance and Energy-Efficient Hybrid Transactional/Analytical Databases with Hardware/Software Co-Design.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

LEAPER: Fast and Accurate FPGA-based System Performance Prediction via Transfer Learning.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

DarkGates: A Hybrid Power-Gating Architecture to Mitigate the Performance Impact of Dark-Silicon in High Performance Processors.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

DR-STRaNGe: End-to-End System Design for DRAM-based True Random Number Generators.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression.
Proceedings of the 20th USENIX Conference on File and Storage Technologies, 2022

Understanding RowHammer Under Reduced Wordline Voltage: An Experimental Study Using Real DRAM Devices.
Proceedings of the 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2022

A Compiler Framework for Optimizing Dynamic Parallelism on GPUs.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

GenStore: a high-performance in-storage processing system for genome sequence analysis.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
TADOC: Text analytics directly on compression.
VLDB J., 2021

ETICA: Efficient Two-Level I/O Caching Architecture for Virtualized Platforms.
IEEE Trans. Parallel Distributed Syst., 2021

Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories.
ACM Trans. Comput. Syst., 2021

Refresh Triggered Computation: Improving the Energy Efficiency of Convolutional Neural Network Accelerators.
ACM Trans. Archit. Code Optim., 2021

GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra.
Proc. VLDB Endow., 2021

FPGA-Based Near-Memory Acceleration of Modern Data-Intensive Applications.
IEEE Micro, 2021

GenShare: Sharing Accurate Differentially-Private Statistics for Genomic Datasets with Dependent Tuples.
CoRR, 2021

Casper: Accelerating Stencil Computation using Near-cache Processing.
CoRR, 2021

Energy-Efficient Deflection-based On-chip Networks: Topology, Routing, Flow Control.
CoRR, 2021

Extending Memory Capacity in Consumer Devices with Emerging Non-Volatile Memory: An Experimental Study.
CoRR, 2021

A Deeper Look into RowHammer's Sensitivities: Experimental Analysis of Real DRAM Chips and Implications on Future Attacks and Defenses.
CoRR, 2021

NERO: Accelerating Weather Prediction using Near-Memory Reconfigurable Fabric.
CoRR, 2021

Security Analysis of the Silver Bullet Technique for RowHammer Prevention.
CoRR, 2021

Near-Optimal Privacy-Utility Tradeoff in Genomic Studies Using Selective SNP Hiding.
CoRR, 2021

SIMDRAM: An End-to-End Framework for Bit-Serial SIMD Computing in DRAM.
CoRR, 2021

Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture.
CoRR, 2021

pLUTo: In-DRAM Lookup Tables to Enable Massively Parallel General-Purpose Computation.
CoRR, 2021

SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems.
CoRR, 2021

BurstLink: Techniques for Energy-Efficient Conventional and Virtual Reality Video Display.
CoRR, 2021

GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra.
CoRR, 2021

Polynesia: Enabling Effective Hybrid Transactional/Analytical Databases with Specialized Hardware/Software Co-Design.
CoRR, 2021

Mitigating Edge Machine Learning Inference Bottlenecks: An Empirical Study on Accelerating Google Edge Models.
CoRR, 2021

COVIDHunter: An Accurate, Flexible, and Environment-Aware Open-Source COVID-19 Outbreak Simulation Model.
CoRR, 2021

SneakySnake: a fast and accurate universal genome pre-alignment filter for CPUs, GPUs and FPGAs.
Bioinform., 2021

DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks.
IEEE Access, 2021

HARP: Practically and Effectively Identifying Uncorrectable Errors in Memory Chips That Use On-Die Error-Correcting Codes.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Uncovering In-DRAM RowHammer Protection Mechanisms: A New Methodology, Custom RowHammer Patterns, and Implications.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

BurstLink: Techniques for Energy-Efficient Video Display for Conventional and Virtual Reality Systems.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

A Deeper Look into RowHammer's Sensitivities: Experimental Analysis of Real DRAM Chipsand Implications on Future Attacks and Defenses.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

QUAC-TRNG: High-Throughput True Random Number Generation Using Quadruple Row Activation in Commodity DRAM Chips.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

IChannels: Exploiting Current Management Mechanisms to Create Covert Channels in Modern Processors.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

CODIC: A Low-Cost Substrate for Enabling Custom In-DRAM Functionalities and Optimizations.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Energy-Efficient Mobile Robot Control via Run-time Monitoring of Environmental Complexity and Computing Workload.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021

G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression.
Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

BlockHammer: Preventing RowHammer at Low Cost by Blacklisting Rapidly-Accessed DRAM Rows.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-In-Memory Hardware.
Proceedings of the 12th International Green and Sustainable Computing Workshops, 2021

Modeling FPGA-Based Systems via Few-Shot Learning.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

Intelligent Architectures for Intelligent Computing Systems.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

Understanding Power Consumption and Reliability of High-Bandwidth Memory with Voltage Underscaling.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

SIMDRAM: a framework for bit-serial SIMD processing using DRAM.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

Rethinking software runtimes for disaggregated memory.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

Reducing solid-state drive read latency by optimizing read-retry.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

Aging-Aware Request Scheduling for Non-Volatile Main Memory.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks.
Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2020
RowHammer: A Retrospective.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Accelerating Genome Analysis: A Primer on an Ongoing Journey.
IEEE Micro, 2020

Rethinking Divide and Conquer - Towards Holistic Interfaces of the Computing Stack.
IEEE Internet Comput., 2020

Guest Editorial: Robust Resource-Constrained Systems for Machine Learning.
IEEE Des. Test, 2020

Robust Machine Learning Systems: Challenges, Current Trends, Perspectives, and the Road Ahead.
IEEE Des. Test, 2020

A Modern Primer on Processing in Memory.
CoRR, 2020

Enabling High-Capacity, Latency-Tolerant, and Highly-Concurrent GPU Register Files via Software/Hardware Cooperation.
CoRR, 2020

Intelligent Management of Mobile Systems through Computational Self-Awareness.
CoRR, 2020

BioDynaMo: an agent-based simulation platform for scalable computational biology research.
CoRR, 2020

Accelerating B-spline interpolation on GPUs: Application to medical image registration.
Comput. Methods Programs Biomed., 2020

NoM: Network-on-Memory for Inter-Bank Data Transfer in Highly-Banked Memories.
IEEE Comput. Archit. Lett., 2020

Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm.
Bioinform., 2020

Intelligent Architectures for Intelligent Machines.
Proceedings of the 2020 International Symposium on VLSI Design, Automation and Test, 2020

TRRespass: Exploiting the Many Sides of Target Row Refresh.
Proceedings of the 2020 IEEE Symposium on Security and Privacy, 2020

Are We Susceptible to Rowhammer? An End-to-End Methodology for Cloud Providers.
Proceedings of the 2020 IEEE Symposium on Security and Privacy, 2020

Optically Connected Memory for Disaggregated Data Centers.
Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

FIGARO: Improving System Performance via Fine-Grained In-DRAM Data Relocation and Caching.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Bit-Exact ECC Recovery (BEER): Determining DRAM On-Die ECC Functions by Exploiting DRAM Data Retention Characteristics.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

FlexWatts: A Power- and Workload-Aware Hybrid Power Delivery Network for Energy-Efficient Microprocessors.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Improving phase change memory performance with data content aware access.
Proceedings of the ISMM '20: 2020 ACM SIGPLAN International Symposium on Memory Management, 2020

CLR-DRAM: A Low-Cost DRAM Architecture Enabling Dynamic Capacity-Latency Trade-Off.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Revisiting RowHammer: An Experimental Analysis of Modern DRAM Devices and Mitigation Techniques.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

SysScale: Exploiting Multi-domain Dynamic Voltage and Frequency Scaling for Energy Efficient Mobile Processors.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

The Non-IID Data Quagmire of Decentralized Machine Learning.
Proceedings of the 37th International Conference on Machine Learning, 2020

Enabling Efficient Random Access to Hierarchically-Compressed Data.
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

WoLFRaM: Enhancing Wear-Leveling and Fault Tolerance in Resistive Memories using Programmable Address Decoders.
Proceedings of the 38th IEEE International Conference on Computer Design, 2020

NATSA: A Near-Data Processing Accelerator for Time Series Analysis.
Proceedings of the 38th IEEE International Conference on Computer Design, 2020

Techniques for Reducing the Connected-Standby Energy Consumption of Mobile Devices.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling.
Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, 2020

Boyi: A Systematic Framework for Automatically Deciding the Right Execution Model of OpenCL Applications on FPGAs.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration.
Proceedings of the 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2020

Evanesco: Architectural Support for Efficient Data Sanitization in Modern Flash-Based Storage Systems.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
Highly Concurrent Latency-tolerant Register Files for GPUs.
ACM Trans. Comput. Syst., 2019

Enabling and Exploiting Partition-Level Parallelism (PALP) in Phase Change Memories.
ACM Trans. Embed. Comput. Syst., 2019

An Analytical Model for Performance and Lifetime Estimation of Hybrid DRAM-NVM Main Memories.
IEEE Trans. Computers, 2019

ITAP: Idle-Time-Aware Power Management for GPU Execution Units.
ACM Trans. Archit. Code Optim., 2019

AVPP: Address-first Value-next Predictor with Value Prefetching for Improving the Efficiency of Load Value Prediction.
ACM Trans. Archit. Code Optim., 2019

Accelerating Generalized Linear Models with MLWeaving: A One-Size-Fits-All System for Any-precision Learning.
Proc. VLDB Endow., 2019

Demystifying Complex Workload-DRAM Interactions: An Experimental Study.
Proc. ACM Meas. Anal. Comput. Syst., 2019

Processing data where it makes sense: Enabling in-memory computation.
Microprocess. Microsystems, 2019

Processing-in-memory: A workload-driven perspective.
IBM J. Res. Dev., 2019

AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes.
CoRR, 2019

A Workload and Programming Ease Driven Perspective of Processing-in-Memory.
CoRR, 2019

In-DRAM Bulk Bitwise Execution Engine.
CoRR, 2019

Accelerating Generalized Linear Models with MLWeaving: A One-Size-Fits-All System for Any-precision Learning (Technical Report).
CoRR, 2019

Understanding the Interactions of Workloads and DRAM Types: A Comprehensive Experimental Study.
CoRR, 2019

Dataplant: In-DRAM Security Mechanisms for Low-Cost Devices.
CoRR, 2019

Shouji: a fast and efficient pre-alignment filter for sequence alignment.
Bioinform., 2019

Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions.
Briefings Bioinform., 2019

Analysis and Modeling of Collaborative Execution Strategies for Heterogeneous CPU-FPGA Architectures.
Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, 2019

Panthera: holistic memory management for big data processing over hybrid memories.
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

Binary Star: Coordinated Reliability in Heterogeneous Memory Systems for High Performance and Scalability.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

EDEN: Enabling Energy-Efficient, High-Performance Deep Neural Network Inference Using Approximate DRAM.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

SMASH: Co-designing Software Compression and Hardware-Accelerated Indexing for Efficient Sparse Matrix Operations.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

DSPatch: Dual Spatial Pattern Prefetcher.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

CROW: a low-cost substrate for improving DRAM performance, energy efficiency, and reliability.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

CoNDA: efficient cache coherence support for near-data accelerators.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

A Scalable Priority-Aware Approach to Managing Data Center Server Power.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

D-RaNGe: Using Commodity DRAM Devices to Generate True Random Numbers with Low Latency and High Throughput.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

Project PBerry: FPGA Acceleration for Remote Memory.
Proceedings of the Workshop on Hot Topics in Operating Systems, 2019

Understanding and Modeling On-Die Error Correction in Modern DRAM: An Experimental Study Using Real Devices.
Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2019

NAPEL: Near-Memory Computing Application Performance Prediction via Ensemble Learning.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Enabling Practical Processing in and near Memory for Data-Intensive Computing.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

RowHammer and Beyond.
Proceedings of the Constructive Side-Channel Analysis and Secure Design, 2019

Automatic Generation of Warp-Level Primitives and Atomic Instructions for Fast and Portable Parallel Reduction on GPUs.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

A Framework for Memory Oversubscription Management in Graphics Processing Units.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

Towards Breaking the Memory Bandwidth Wall Using Approximate Value Prediction.
Proceedings of the Approximate Circuits, Methodologies and CAD., 2019

2018
Mosaic: Enabling Application-Transparent Support for Multiple Page Sizes in Throughput Processors.
ACM SIGOPS Oper. Syst. Rev., 2018

Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights.
Proc. VLDB Endow., 2018

Improving 3D NAND Flash Memory Lifetime by Tolerating Early Retention Loss and Process Variation.
Proc. ACM Meas. Anal. Comput. Syst., 2018

What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study.
Proc. ACM Meas. Anal. Comput. Syst., 2018

ECI-Cache: A High-Endurance and Cost-Efficient I/O Caching Scheme for Virtualized Platforms.
Proc. ACM Meas. Anal. Comput. Syst., 2018

Iterative Modulo Scheduling.
IEEE Micro, 2018

Evaluating Row Buffer Locality in Future Non-Volatile Main Memories.
CoRR, 2018

Enabling Efficient RDMA-based Synchronous Mirroring of Persistent Memory Transactions.
CoRR, 2018

SLIDER: Fast and Efficient Computation of Banded Sequence Alignment.
CoRR, 2018

D-RaNGe: Violating DRAM Timing Constraints for High-Throughput True Random Number Generation using Commodity DRAM Devices.
CoRR, 2018

Techniques for Efficiently Handling Power Surges in Fuel Cell Powered Data Centers: Modeling, Analysis, Results.
CoRR, 2018

Recent Advances in DRAM and Flash Memory Architectures.
CoRR, 2018

Recent Advances in Overcoming Bottlenecks in Memory Systems and Managing Memory Resources in GPU Systems.
CoRR, 2018

Predictable Performance and Fairness Through Accurate Slowdown Estimation in Shared Main Memory Systems.
CoRR, 2018

Exploiting Row-Level Temporal Locality in DRAM to Reduce the Memory Access Latency.
CoRR, 2018

RowClone: Accelerating Data Movement and Initialization Using DRAM.
CoRR, 2018

Characterizing, Exploiting, and Mitigating Vulnerabilities in MLC NAND Flash Memory Programming.
CoRR, 2018

Read Disturb Errors in MLC NAND Flash Memory.
CoRR, 2018

SoftMC: Practical DRAM Characterization Using an FPGA-Based Infrastructure.
CoRR, 2018

LISA: Increasing Internal Connectivity in DRAM for Fast Data Movement and Low Latency.
CoRR, 2018

Voltron: Understanding and Exploiting the Voltage-Latency-Reliability Trade-Offs in Modern DRAM Chips to Improve Energy Efficiency.
CoRR, 2018

Flexible-Latency DRAM: Understanding and Exploiting Latency Variation in Modern DRAM Chips.
CoRR, 2018

Tiered-Latency DRAM: Enabling Low-Latency Main Memory at Low Cost.
CoRR, 2018

Adaptive-Latency DRAM: Reducing DRAM Latency by Exploiting Timing Margins.
CoRR, 2018

Experimental Characterization, Optimization, and Recovery of Data Retention Errors in MLC NAND Flash Memory.
CoRR, 2018

Decoupling GPU Programming Models from Resource Management for Enhanced Programming Ease, Portability, and Performance.
CoRR, 2018

Exploiting the DRAM Microarchitecture to Increase Memory-Level Parallelism.
CoRR, 2018

Reducing DRAM Refresh Overheads with Refresh-Access Parallelism.
CoRR, 2018

Mosaic: An Application-Transparent Hardware-Software Cooperative Memory Manager for GPUs.
CoRR, 2018

High-Performance and Energy-Effcient Memory Scheduler Design for Heterogeneous Systems.
CoRR, 2018

A Memory Controller with Row Buffer Locality Awareness for Hybrid Memory Systems.
CoRR, 2018

Holistic Management of the GPGPU Memory Hierarchy to Manage Warp-level Latency Tolerance.
CoRR, 2018

Zorua: Enhancing Programming Ease, Portability, and Performance in GPUs by Decoupling Programming Models from Resource Management.
CoRR, 2018

Enabling the Adoption of Processing-in-Memory: Challenges, Mechanisms, Future Research Directions.
CoRR, 2018

Focus: Querying Large Video Datasets with Low Latency and Low Cost.
CoRR, 2018

GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies.
BMC Genom., 2018

Focus: Querying Large Video Datasets with Low Latency and Low Cost.
Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

Reducing DRAM Latency via Charge-Level-Aware Look-Ahead Partial Restoration.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Processing data where it makes sense in modern computing systems: Enabling in-memory computation.
Proceedings of the 7th Mediterranean Conference on Embedded Computing, 2018

A Case for Richer Cross-Layer Abstractions: Bridging the Semantic Gap with Expressive Memory.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

The Locality Descriptor: A Holistic Cross-Layer Abstraction to Express Data Locality In GPUs.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

FLIN: Enabling Fairness and Enhancing Performance in Modern NVMe Solid State Drives.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

HICOMB Keynote 2.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

A Large Scale Study of Data Center Network Reliability.
Proceedings of the Internet Measurement Conference 2018, 2018

Zwift: A Programming Framework for High Performance Text Analytics on Compressed Data.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Solar-DRAM: Reducing DRAM Access Latency by Exploiting the Variation in Local Bitlines.
Proceedings of the 36th IEEE International Conference on Computer Design, 2018

HeatWatch: Improving 3D NAND Flash Memory Device Reliability by Exploiting Self-Recovery and Temperature Awareness.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

The DRAM Latency PUF: Quickly Evaluating Physical Unclonable Functions by Exploiting the Latency-Reliability Tradeoff in Modern Commodity DRAM Devices.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices.
Proceedings of the 16th USENIX Conference on File and Storage Technologies, 2018

VRL-DRAM: improving DRAM performance via variable refresh latency.
Proceedings of the 55th Annual Design Automation Conference, 2018

LTRF: Enabling High-Capacity Register Files for GPUs via Hardware/Software Cooperative Register Prefetching.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

SPECTR: Formal Supervisory Control and Coordination for Many-core Systems Resource Management.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017
Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms.
Proc. ACM Meas. Anal. Comput. Syst., 2017

Understanding Reduced-Voltage Operation in Modern DRAM Devices: Experimental Characterization, Analysis, and Mechanisms.
Proc. ACM Meas. Anal. Comput. Syst., 2017

Error Characterization, Mitigation, and Recovery in Flash-Memory-Based Solid-State Drives.
Proc. IEEE, 2017

Improving the reliability of chip-off forensic analysis of NAND flash memory devices.
Digit. Investig., 2017

Improving DRAM Performance by Parallelizing Refreshes with Accesses.
CoRR, 2017

Errors in Flash-Memory-Based Solid-State Drives: Analysis, Mitigation, and Recovery.
CoRR, 2017

Improving Multi-Application Concurrency Support Within the GPU Memory System.
CoRR, 2017

Using ECC DRAM to Adaptively Increase Memory Capacity.
CoRR, 2017

Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency.
CoRR, 2017

Understanding Reduced-Voltage Operation in Modern DRAM Chips: Characterization, Analysis, and Mechanisms.
CoRR, 2017

LazyPIM: Efficient Support for Cache Coherence in Processing-in-Memory Architectures.
CoRR, 2017

A Case for Memory Content-Based Detection and Mitigation of Data-Dependent Failures in DRAM.
IEEE Comput. Archit. Lett., 2017

LazyPIM: An Efficient Cache Coherence Mechanism for Processing-in-Memory.
IEEE Comput. Archit. Lett., 2017

GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping.
Bioinform., 2017

Chapter Four - Simple Operations in Memory to Reduce Data Movement.
Adv. Comput., 2017

Concurrent Data Structures for Near-Memory Computing.
Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, 2017

Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds.
Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation, 2017

Banshee: bandwidth-efficient DRAM caching via software/hardware cooperation.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Detecting and mitigating data-dependent DRAM failures by exploiting current memory content.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Mosaic: a GPU memory manager with application-transparent support for multiple page sizes.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

The Reach Profiler (REAPER): Enabling the Mitigation of DRAM Retention Failures via Profiling at Aggressive Conditions.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Carpool: a bufferless on-chip network supporting adaptive multicast and hotspot alleviation.
Proceedings of the International Conference on Supercomputing, 2017

SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

FPGA-Accelerated Dense Linear Machine Learning: A Precision-Convergence Trade-Off.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

The RowHammer problem and other issues we may face as memory becomes denser.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

Utility-Based Hybrid Memory Management.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
BLISS: Balancing Performance, Fairness and Complexity in Memory Access Scheduling.
IEEE Trans. Parallel Distributed Syst., 2016

RFVP: Rollback-Free Value Prediction with Safe-to-Approximate Loads.
ACM Trans. Archit. Code Optim., 2016

DASH: Deadline-Aware High-Performance Memory Scheduler for Heterogeneous Systems with Hardware Accelerators.
ACM Trans. Archit. Code Optim., 2016

Simultaneous Multi-Layer Access: Improving 3D-Stacked Memory Bandwidth at Low Cost.
ACM Trans. Archit. Code Optim., 2016

Bounding and reducing memory interference in COTS-based multi-core systems.
Real Time Syst., 2016

A case for hierarchical rings with deflection routing: An energy-efficient on-chip communication substrate.
Parallel Comput., 2016

The 2014 MICRO Test of Time Award Winners: From 1978 to 1992.
IEEE Micro, 2016

Common Bonds: MIPS, HPS, Two-Level Branch Prediction, and Compressed Code RISC Processor.
IEEE Micro, 2016

Enabling Accurate and Practical Online Flash Channel Modeling for Modern MLC NAND Flash Memory.
IEEE J. Sel. Areas Commun., 2016

Mitigating the Memory Bottleneck With Approximate Load Value Prediction.
IEEE Des. Test, 2016

A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps.
CoRR, 2016

The Processing Using Memory Paradigm: In-DRAM Bulk Copy, Initialization, Bitwise AND and OR.
CoRR, 2016

Buddy-RAM: Improving the Performance and Efficiency of Bulk Bitwise Operations Using DRAM.
CoRR, 2016

Heterogeneous-Reliability Memory: Exploiting Application-Level Memory Error Tolerance.
CoRR, 2016

Tiered-Latency DRAM (TL-DRAM).
CoRR, 2016

Reducing DRAM Latency by Exploiting Design-Induced Latency Variation in Modern DRAM Chips.
CoRR, 2016

Adaptive-Latency DRAM (AL-DRAM).
CoRR, 2016

RowHammer: Reliability Analysis and Security Implications.
CoRR, 2016

Enabling Efficient Dynamic Resizing of Large DRAM Caches via A Hardware Consistent Hashing Mechanism.
CoRR, 2016

Reducing Performance Impact of DRAM Refresh by Parallelizing Refreshes with Accesses.
CoRR, 2016

Achieving both High Energy Efficiency and High Performance in On-Chip Communication using Hierarchical Rings with Deflection Routing.
CoRR, 2016

GateKeeper: Enabling Fast Pre-Alignment in DNA Short Read Mapping with a New Streaming Accelerator Architecture.
CoRR, 2016

Ramulator: A Fast and Extensible DRAM Simulator.
IEEE Comput. Archit. Lett., 2016

Optimal seed solver: optimizing seed selection in read mapping.
Bioinform., 2016

Exploiting Core Criticality for Enhanced GPU Performance.
Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, 2016

Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization.
Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, 2016

Keynote: rethinking memory system design.
Proceedings of the 2016 International Symposium on Rapid System Prototyping, 2016

Yak: A High-Performance Big-Data-Friendly Garbage Collector.
Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, 2016

NVMOVE: Helping Programmers Move to Byte-Based Persistence.
Proceedings of the 4th Workshop on Interactions of NVM/Flash with Operating Systems and Workloads, 2016

Zorua: A holistic approach to resource virtualization in GPUs.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Continuous runahead: Transparent hardware acceleration for memory intensive workloads.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Accelerating Dependent Cache Misses with an Enhanced Memory Controller.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

A model for Application Slowdown Estimation in on-chip networks and its use for improving system fairness and performance.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

A case for toggle-aware compression for GPU systems.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

SizeCap: Efficiently handling power surges in fuel cell powered data centers.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

ChargeCache: Reducing DRAM latency by exploiting row access locality.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Low-Cost Inter-Linked Subarrays (LISA): Enabling fast inter-subarray data movement in DRAM.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

PARBOR: An Efficient System-Level Technique to Detect Data-Dependent Failures in DRAM.
Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2016

Invited - Who is the major threat to tomorrow's security?: you, the hardware designer.
Proceedings of the 53rd Annual Design Automation Conference, 2016

Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

μC-States: Fine-grained GPU Datapath Power Management.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
High-Performance and Lightweight Transaction Support in Flash-Based SSDs.
IEEE Trans. Computers, 2015

Introducing the MICRO Test of Time Awards: Concept, Process, 2014 Winners, and the Future.
IEEE Micro, 2015

SQUASH: Simple QoS-Aware High-Performance Memory Scheduler for Heterogeneous Systems with Hardware Accelerators.
CoRR, 2015

The Blacklisting Memory Scheduler: Balancing Performance, Fairness and Complexity.
CoRR, 2015

Managing Hybrid Main Memories with a Page-Utility Driven Performance Model.
CoRR, 2015

Simultaneous Multi Layer Access: A High Bandwidth and Low Cost 3D-Stacked Memory Interface.
CoRR, 2015

Fast Bulk Bitwise AND and OR in DRAM.
IEEE Comput. Archit. Lett., 2015

Toggle-Aware Compression for GPUs.
IEEE Comput. Archit. Lett., 2015

Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping.
Bioinform., 2015

A-DRM: Architecture-aware Distributed Resource Management of Virtualized Clusters.
Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2015

A Large-Scale Study of Flash Memory Failures in the Field.
Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2015

Rethinking memory system design for data-intensive computing.
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

A Low-Overhead, Fully-Distributed, Guaranteed-Delivery Routing Algorithm for Faulty Network-on-Chips.
Proceedings of the 9th International Symposium on Networks-on-Chip, 2015

WARM: Improving NAND flash memory lifetime with write-hotness aware retention management.
Proceedings of the IEEE 31st Symposium on Mass Storage Systems and Technologies, 2015

Amnesic cache management for non-volatile memory.
Proceedings of the IEEE 31st Symposium on Mass Storage Systems and Technologies, 2015

The application slowdown model: quantifying and controlling the impact of inter-application interference at shared caches and main memory.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Gather-scatter DRAM: in-DRAM address translation to improve the spatial locality of non-unit strided accesses.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

ThyNVM: enabling software-transparent crash consistency in persistent memory systems.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Rethinking Memory System Design (along with Interconnects).
Proceedings of the 8th International Workshop on Network on Chip Architectures, 2015

Comparative evaluation of FPGA and ASIC implementations of bufferless and buffered routing algorithms for on-chip networks.
Proceedings of the Sixteenth International Symposium on Quality Electronic Design, 2015

A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Page overlays: an enhanced virtual memory framework to enable fine-grained memory management.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

A scalable processing-in-memory accelerator for parallel graph processing.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Exploiting compressed block size as an indicator of future reuse.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Adaptive-latency DRAM: Optimizing DRAM timing for the common-case.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Data retention in MLC NAND flash memory: Characterization, optimization, and recovery.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

AVATAR: A Variable-Retention-Time (VRT) Aware Refresh for DRAM Systems.
Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2015

Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field.
Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2015

Read Disturb Errors in MLC NAND Flash Memory: Characterization, Mitigation, and Recovery.
Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2015

Decoupled Direct Memory Access: Isolating CPU and IO Traffic by Leveraging a Dual-Data-Port DRAM.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Efficient Data Mapping and Buffering Techniques for Multilevel Cell Phase-Change Memories.
ACM Trans. Archit. Code Optim., 2014

Mitigating Prefetcher-Caused Pollution Using Informed Caching Policies for Prefetched Blocks.
ACM Trans. Archit. Code Optim., 2014

Research Problems and Opportunities in Memory Systems.
Supercomput. Front. Innov., 2014

The efficacy of error mitigation techniques for DRAM retention failures: a comparative experimental study.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2014

Neighbor-cell assisted error correction for MLC NAND flash memories.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2014

Design and Evaluation of Hierarchical Rings with Deflection Routing.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Bounding memory interference delay in COTS-based multi-core systems.
Proceedings of the 20th IEEE Real-Time and Embedded Technology and Applications Symposium, 2014

FIRM: Fair and High-Performance Memory Control for Persistent Memory Systems.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Managing GPU Concurrency in Heterogeneous Architectures.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

The Dirty-Block Index.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

The Blacklisting Memory Scheduler: Achieving high performance and fairness at low cost.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

Loose-Ordering Consistency for persistent memory.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

The heterogeneous block architecture.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

Improving cache performance using read-write partitioning.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Improving DRAM performance by parallelizing refreshes with accesses.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Characterizing Application Memory Error Vulnerability to Optimize Datacenter Cost via Heterogeneous-Reliability Memory.
Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2014

Rollback-free value prediction with approximate loads.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

Warp-aware trace scheduling for GPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

Memory Systems.
Proceedings of the Computing Handbook, 2014

2013
Accelerating read mapping with FastHASH.
BMC Genom., 2013

RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Linearly compressed pages: a low-complexity, low-latency main memory compression framework.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

EMERALD: Characterization of emerging applications and algorithms for low-power devices.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Evaluating STT-RAM as an energy-efficient main memory alternative.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

An experimental study of data retention behavior in modern DRAM devices: implications for retention time profiling mechanisms.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Orchestrated scheduling and prefetching for GPGPUs.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Utility-based acceleration of multithreaded applications on asymmetric CMPs.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

LightTx: A lightweight transactional design in flash-based SSDs to support flexible transactions.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

Program interference in MLC NAND flash memory: Characterization, modeling, and mitigation.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

MISE: Providing performance predictability and improving fairness in shared main memory systems.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Tiered-latency DRAM: A low latency and low cost DRAM architecture.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Application-to-core mapping policies to reduce memory system interference in multi-core systems.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Threshold voltage distribution in MLC NAND flash memory: characterization, analysis, and modeling.
Proceedings of the Design, Automation and Test in Europe, 2013

A heterogeneous multiple network-on-chip design: an application-aware approach.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

2012
Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multicore Memory Systems.
ACM Trans. Comput. Syst., 2012

A QoS-Enabled On-Die Interconnect Fabric for Kilo-Node Chips.
IEEE Micro, 2012

Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management.
IEEE Comput. Archit. Lett., 2012

On-chip networks from a networking perspective: congestion and scalability in many-core interconnects.
Proceedings of the ACM SIGCOMM 2012 Conference, 2012

HAT: Heterogeneous Adaptive Throttling for On-Chip Networks.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect.
Proceedings of the 2012 Sixth IEEE/ACM International Symposium on Networks-on-Chip (NoCS), 2012

RAIDR: Retention-aware intelligent DRAM refresh.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

A case for exploiting subarray-level parallelism (SALP) in DRAM.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Row buffer locality aware caching policies for hybrid memories.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

A case for small row buffers in non-volatile main memories.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

Flash correct-and-refresh: Retention-aware error management for increased flash memory lifetime.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

Error patterns in MLC NAND flash memory: Measurement, characterization, and analysis.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Bottleneck identification and scheduling in multithreaded applications.
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

The evicted-address filter: a unified mechanism to address both cache pollution and thrashing.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Base-delta-immediate compression: practical data compression for on-chip caches.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Linearly compressed pages: a main memory compression framework with low complexity and low latency.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Application-aware prefetch prioritization in on-chip networks.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Application-to-core mapping policies to reduce memory interference in multi-core systems.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Prefetch-Aware Memory Controllers.
IEEE Trans. Computers, 2011

Data Marshaling for Multicore Systems.
IEEE Micro, 2011

Top Picks [Guest editors' introduction].
IEEE Micro, 2011

Thread Cluster Memory Scheduling.
IEEE Micro, 2011

Aérgia: A Network-on-Chip Exploiting Packet Latency Slack.
IEEE Micro, 2011

FIST: A fast, lightweight, FPGA-friendly packet latency estimator for NoC modeling in full-system simulations.
Proceedings of the NOCS 2011, 2011

Improving GPU performance via large warps and two-level warp scheduling.
Proceedings of the 44rd Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Reducing memory interference in multicore systems via application-aware memory channel partitioning.
Proceedings of the 44rd Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Parallel application memory scheduling.
Proceedings of the 44rd Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Memory systems in the many-core era: challenges, opportunities, and solution directions.
Proceedings of the 10th International Symposium on Memory Management, 2011

Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Prefetch-aware shared resource management for multi-core systems.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Poster: revisiting virtual channel memory for performance and fairness on multi-core architecture.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Memory power management via dynamic voltage/frequency scaling.
Proceedings of the 8th International Conference on Autonomic Computing, 2011

CHIPPER: A low-complexity bufferless deflection router.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

2010
Accelerating Critical Section Execution with Asymmetric Multicore Architectures.
IEEE Micro, 2010

Phase-Change Technology and the Future of Main Memory.
IEEE Micro, 2010

Phase change memory architecture and the quest for scalability.
Commun. ACM, 2010

Concurrent autonomous self-test for uncore components in system-on-chips.
Proceedings of the 28th IEEE VLSI Test Symposium, 2010

QuaLe: A Quantum-Leap Inspired Model for Non-stationary Analysis of NoC Traffic in Chip Multi-processors.
Proceedings of the NOCS 2010, 2010

Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Data marshaling for multi-core architectures.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Topology-Aware Quality-of-Service Support in Highly Integrated Chip Multiprocessors.
Proceedings of the Computer Architecture, 2010

Aérgia: exploiting packet latency slack in on-chip networks.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

Next generation on-chip networks: what kind of congestion control do we need?
Proceedings of the 9th ACM Workshop on Hot Topics in Networks. HotNets 2010, Monterey, CA, USA - October 20, 2010

Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

Efficient runahead threads.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Virtual Program Counter (VPC) Prediction: Very Low Cost Indirect Branch Prediction Using Conditional Branch Prediction Hardware.
IEEE Trans. Computers, 2009

A Flexible Software-Based Framework for Online Detection of Hardware Defects.
IEEE Trans. Computers, 2009

Parallelism-Aware Batch Scheduling: Enabling High-Performance and Fair Shared Memory Controllers.
IEEE Micro, 2009

Improving memory bank-level parallelism in the presence of prefetching.
Proceedings of the 42st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Preemptive virtual clock: a flexible, efficient, and cost-effective QOS scheme for networks-on-chip.
Proceedings of the 42st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Coordinated control of multiple prefetchers in multi-core systems.
Proceedings of the 42st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Application-aware prioritization mechanisms for on-chip networks.
Proceedings of the 42st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

A case for bufferless routing in on-chip networks.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Architecting phase change memory as a scalable dram alternative.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Flexible reference-counting-based hardware acceleration for garbage collection.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Operating system scheduling for efficient online self-test in robust systems.
Proceedings of the 2009 International Conference on Computer-Aided Design, 2009

Express Cube Topologies for on-Chip Interconnects.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

Accelerating critical section execution with asymmetric multi-core architectures.
Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009

2008
Guest Editors' Introduction: Interaction of Many-Core Computer Architecture and Operating Systems.
IEEE Micro, 2008

Dynamic Predication of Indirect Jumps.
IEEE Comput. Archit. Lett., 2008

Distributed order scheduling and its application to multi-core dram controllers.
Proceedings of the Twenty-Seventh Annual ACM Symposium on Principles of Distributed Computing, 2008

Prefetch-Aware DRAM Controllers.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Online design bug detection: RTL analysis, flexible mechanisms, and evaluation.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Self-Optimizing Memory Controllers: A Reinforcement Learning Approach.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Performance-aware speculation control using wrong path usefulness prediction.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

Improving the performance of object-oriented languages with dynamic predication of indirect jumps.
Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, 2008

2007
Diverge-Merge Processor: Generalized and Energy-Efficient Dynamic Predication.
IEEE Micro, 2007

Memory Performance Attacks: Denial of Memory Service in Multi-Core Systems.
Proceedings of the 16th USENIX Security Symposium, Boston, MA, USA, August 6-10, 2007, 2007

Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Software-Based Online Detection of Hardware Defects Mechanisms, Architectural Support, and Evaluation.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers.
Proceedings of the 13st International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

Profile-assisted Compiler Support for Dynamic Predication in Diverge-Merge Processors.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

2006
Address-Value Delta (AVD) Prediction: A Hardware Technique for Efficiently Parallelizing Dependent Cache Misses.
IEEE Trans. Computers, 2006

Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance.
IEEE Micro, 2006

Wish Branches: Enabling Adaptive and Aggressive Predicated Execution.
IEEE Micro, 2006

Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

A Case for MLP-Aware Cache Replacement.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set.
Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006

2005
An Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors.
IEEE Trans. Computers, 2005

Using the First-Level Caches as Filters to Reduce the Pollution Caused by Speculative Memory References.
Int. J. Parallel Program., 2005

On Reusing the Results of Pre-Executed Instructions in a Runahead Execution Processor.
IEEE Comput. Archit. Lett., 2005

Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns.
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution.
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

Techniques for Efficient Processing in Runahead Execution Engines.
Proceedings of the 32st International Symposium on Computer Architecture (ISCA 2005), 2005

Microarchitecture-Based Introspection: A Technique for Transient-Fault Tolerance in Microprocessors.
Proceedings of the 2005 International Conference on Dependable Systems and Networks (DSN 2005), 28 June, 2005

2004
Understanding the effects of wrong-path memory references on processor performance.
Proceedings of the 3rd Workshop on Memory Performance Issues, 2004

Cache Filtering Techniques to Reduce the Negative Impact of Useless Speculative Memory References on Processor Performance.
Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

2003
Runahead Execution: An Effective Alternative to Large Instruction Windows.
IEEE Micro, 2003

Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003


  Loading...