Pejman Lotfi-Kamran

Ehsan Yousefzadeh-Asl-Miandoab

IEEE Trans. Emerg. Top. Comput., 2023

Snake: A Variable-length Chain-based Prefetching for GPUs.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

2022

RASHT: A Partially Reconfigurable Architecture for Efficient Implementation of CNNs.

IEEE Trans. Very Large Scale Integr. Syst., 2022

OSM: Off-Chip Shared Memory for GPUs.

Sina Darabi

IEEE Trans. Parallel Distributed Syst., 2022

Chapter Six - Evaluation of data prefetchers.

Adv. Comput., 2022

Chapter Five - State-of-the-art data prefetchers.

Adv. Comput., 2022

Chapter Four - Beyond spatial or temporal prefetching.

Adv. Comput., 2022

Chapter Three - Temporal prefetching.

Adv. Comput., 2022

Chapter Two - Spatial prefetching.

Adv. Comput., 2022

Chapter One - Introduction to data prefetching.

Adv. Comput., 2022

Preface.

Adv. Comput., 2022

2021

MANA: Microarchitecting an Instruction Prefetcher.

CoRR, 2021

Data-Aware Compression of Neural Networks.

IEEE Comput. Archit. Lett., 2021

2020

A Survey on Recent Hardware Data Prefetching Approaches with An Emphasis on Servers.

CoRR, 2020

Harnessing Pairwise-Correlating Data Prefetching With Runahead Metadata.

Fatemeh Golshan

IEEE Comput. Archit. Lett., 2020

Divide and Conquer Frontend Bottleneck.

Ali Ansari

Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

2019

Reducing Writebacks Through In-Cache Displacement.

Seyed Armin Vakil-Ghahani

Aydin Faraji

Farid Samandi

ACM Trans. Design Autom. Electr. Syst., 2019

Evaluation of Hardware Data Prefetchers on Server Processors.

Seyedali Tabaeiaghdaei

ACM Comput. Surv., 2019

Code Layout Optimization for Near-Ideal Instruction Cache.

Ali Ansari

IEEE Comput. Archit. Lett., 2019

Bingo Spatial Data Prefetcher.

Mehran Shakerinava

Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

2018

Fast Data Delivery for Many-Core Processors.

Abbas Mazloumi

Farid Samandi

Mahmood Naderan-Tahan

IEEE Trans. Computers, 2018

ORIGAMI: A Heterogeneous Split Architecture for In-Memory Acceleration of Learning.

CoRR, 2018

Die-Stacked DRAM: Memory, Cache, or MemCache?

HamidReza Zare

Seyed Armin Vakil-Ghahani

CoRR, 2018

Making Belady-Inspired Replacement Policies More Effective Using Expected Hit Count.

Sara Mahdizadeh-Shahri

CoRR, 2018

Scale-Out Processors & Energy Efficiency.

Pouya Esmaili-Dokht

Behnam Khodabandeloo

Mohammad-Reza Lotfi-Namin

CoRR, 2018

Cache Replacement Policy Based on Expected Hit Count.

Armin Vakil-Ghahani

Sara Mahdizadeh-Shahri

IEEE Comput. Archit. Lett., 2018

Chapter One - Dark Silicon and the History of Computing.

Adv. Comput., 2018

Domino Temporal Data Prefetcher.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

In-DRAM near-data approximate acceleration for GPUs.

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

AxBench: A Multiplatform Benchmark Suite for Approximate Computing.

IEEE Des. Test, 2017

An Efficient Temporal Data Prefetcher for L1 Caches.

IEEE Comput. Archit. Lett., 2017

Near-Ideal Networks-on-Chip for Servers.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

2016

An Efficient Hybrid-Switched Network-on-Chip for Chip Multiprocessors.

Seyyed Hossein Seyyedaghaei Rezaei

IEEE Trans. Computers, 2016

Dynamic Resource Sharing for High-Performance 3-D Networks-on-Chip.

Abbas Mazloumi

IEEE Comput. Archit. Lett., 2016

2015

Per-packet global congestion estimation for fast packet delivery in networks-on-chip.

J. Supercomput., 2015

Neural acceleration for GPU throughput processors.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

2013

Scale-Out Processors.

PhD thesis, 2013

2012

Optimizing Data-Center TCO with Scale-Out Processors.

Chrysostomos Nicopoulos

Yiannakis Sazeides

IEEE Micro, 2012

NOC-Out: Microarchitecting a Scale-Out Processor.

Boris Grot

Babak Falsafi

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Scale-out processors.

Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Thermal characterization of cloud workloads on a power-efficient server-on-chip.

Chrysostomos Nicopoulos

Damien Hardy

Babak Falsafi

Yiannakis Sazeides

Proceedings of the 30th International IEEE Conference on Computer Design, 2012

2011

Single-Event Transient Analysis in High Speed Circuits.

Proceedings of the International Symposium on Electronic System Design, 2011

Cuckoo directory: A scalable directory for many-core systems.

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

2010

EDXY - A low cost congestion-aware routing algorithm for network-on-chips.

Amir-Mohammad Rahmani

Masoud Daneshtalab

Ali Afzali-Kusha

J. Syst. Archit., 2010

TurboTag: lookup filtering to reduce coherence directory power.

Proceedings of the 2010 International Symposium on Low Power Electronics and Design, 2010

2009

Negative Exponential Distribution Traffic Pattern for Power/Performance Analysis of Network on Chips.

Amir-Mohammad Rahmani

Proceedings of the VLSI Design 2009: Improving Productivity through Higher Abstraction, 2009

2008

Stall Power Reduction in Pipelined Architecture Processors.

Amir-Mohammad Rahmani

Ali-Asghar Salehpour

Ali Afzali-Kusha

Proceedings of the 21st International Conference on VLSI Design (VLSI Design 2008), 2008

Enhanced TED: A New Data Structure for RTL Verification.

Proceedings of the 21st International Conference on VLSI Design (VLSI Design 2008), 2008

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs.

Proceedings of the Design, Automation and Test in Europe, 2008

2007

Low test application time resource binding for behavioral synthesis.

ACM Trans. Design Autom. Electr. Syst., 2007

Low overhead DFT using CDFG by modifying controller.

Fabrizio Lombardi

IET Comput. Digit. Tech., 2007

A UML Based System Level Failure Rate Assessment Technique for SoC Designs.

Mohammad Hossein Neishaburi

Proceedings of the 25th IEEE VLSI Test Symposium (VTS 2007), 2007

2006

Single-Event Upset Analysis and Protection in High Speed Circuits.

Proceedings of the 11th European Test Symposium, 2006

2005

A Flow Graph Technique for DFT Controller Modification.

Proceedings of the 2005 IEEE International SOC Conference, 2005

Binary Taylor diagrams: an efficient implementation of Taylor expansion diagrams.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005

TED+: a data structure for microprocessor verification.

Hamid Shojaei

Mehran Massoumi