Pejman Lotfi-Kamran

ORCID: 0000-0003-3293-8274

According to our database, Pejman Lotfi-Kamran authored at least 57 papers between 2005 and 2023.

Bibliography

2023
MANA: Microarchitecting a Temporal Instruction Prefetcher.
IEEE Trans. Computers, March, 2023

Rei: A Reconfigurable Interconnection Unit for Array-Based CNN Accelerators.
IEEE Trans. Emerg. Top. Comput., 2023

Snake: A Variable-length Chain-based Prefetching for GPUs.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

2022
RASHT: A Partially Reconfigurable Architecture for Efficient Implementation of CNNs.
IEEE Trans. Very Large Scale Integr. Syst., 2022

OSM: Off-Chip Shared Memory for GPUs.
IEEE Trans. Parallel Distributed Syst., 2022

Chapter Six - Evaluation of data prefetchers.
Adv. Comput., 2022

Chapter Five - State-of-the-art data prefetchers.
Adv. Comput., 2022

Chapter Four - Beyond spatial or temporal prefetching.
Adv. Comput., 2022

Chapter Three - Temporal prefetching.
Adv. Comput., 2022

Chapter Two - Spatial prefetching.
Adv. Comput., 2022

Chapter One - Introduction to data prefetching.
Adv. Comput., 2022

Preface.
Adv. Comput., 2022

2021
MANA: Microarchitecting an Instruction Prefetcher.
CoRR, 2021

Data-Aware Compression of Neural Networks.
IEEE Comput. Archit. Lett., 2021

2020
A Survey on Recent Hardware Data Prefetching Approaches with An Emphasis on Servers.
CoRR, 2020

Harnessing Pairwise-Correlating Data Prefetching With Runahead Metadata.
IEEE Comput. Archit. Lett., 2020

Divide and Conquer Frontend Bottleneck.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

2019
Reducing Writebacks Through In-Cache Displacement.
ACM Trans. Design Autom. Electr. Syst., 2019

Evaluation of Hardware Data Prefetchers on Server Processors.
ACM Comput. Surv., 2019

Code Layout Optimization for Near-Ideal Instruction Cache.
IEEE Comput. Archit. Lett., 2019

Bingo Spatial Data Prefetcher.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

2018
Fast Data Delivery for Many-Core Processors.
IEEE Trans. Computers, 2018

ORIGAMI: A Heterogeneous Split Architecture for In-Memory Acceleration of Learning.
CoRR, 2018

Die-Stacked DRAM: Memory, Cache, or MemCache?
CoRR, 2018

Making Belady-Inspired Replacement Policies More Effective Using Expected Hit Count.
CoRR, 2018

Scale-Out Processors & Energy Efficiency.
CoRR, 2018

Cache Replacement Policy Based on Expected Hit Count.
IEEE Comput. Archit. Lett., 2018

Chapter One - Dark Silicon and the History of Computing.
Adv. Comput., 2018

Domino Temporal Data Prefetcher.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

In-DRAM near-data approximate acceleration for GPUs.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
AxBench: A Multiplatform Benchmark Suite for Approximate Computing.
IEEE Des. Test, 2017

An Efficient Temporal Data Prefetcher for L1 Caches.
IEEE Comput. Archit. Lett., 2017

Near-Ideal Networks-on-Chip for Servers.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

2016
An Efficient Hybrid-Switched Network-on-Chip for Chip Multiprocessors.
IEEE Trans. Computers, 2016

Dynamic Resource Sharing for High-Performance 3-D Networks-on-Chip.
IEEE Comput. Archit. Lett., 2016

2015
Per-packet global congestion estimation for fast packet delivery in networks-on-chip.
J. Supercomput., 2015

Neural acceleration for GPU throughput processors.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

2013
Scale-Out Processors.
PhD thesis, 2013

2012
Optimizing Data-Center TCO with Scale-Out Processors.
IEEE Micro, 2012

NOC-Out: Microarchitecting a Scale-Out Processor.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Scale-out processors.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Thermal characterization of cloud workloads on a power-efficient server-on-chip.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

2011
Single-Event Transient Analysis in High Speed Circuits.
Proceedings of the International Symposium on Electronic System Design, 2011

Cuckoo directory: A scalable directory for many-core systems.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

2010
EDXY - A low cost congestion-aware routing algorithm for network-on-chips.
J. Syst. Archit., 2010

TurboTag: lookup filtering to reduce coherence directory power.
Proceedings of the 2010 International Symposium on Low Power Electronics and Design, 2010

2009
Negative Exponential Distribution Traffic Pattern for Power/Performance Analysis of Network on Chips.
Proceedings of the VLSI Design 2009: Improving Productivity through Higher Abstraction, 2009

2008
Stall Power Reduction in Pipelined Architecture Processors.
Proceedings of the 21st International Conference on VLSI Design (VLSI Design 2008), 2008

Enhanced TED: A New Data Structure for RTL Verification.
Proceedings of the 21st International Conference on VLSI Design (VLSI Design 2008), 2008

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs.
Proceedings of the Design, Automation and Test in Europe, 2008

2007
Low test application time resource binding for behavioral synthesis.
ACM Trans. Design Autom. Electr. Syst., 2007

Low overhead DFT using CDFG by modifying controller.
IET Comput. Digit. Tech., 2007

A UML Based System Level Failure Rate Assessment Technique for SoC Designs.
Proceedings of the 25th IEEE VLSI Test Symposium (VTS 2007), 2007

2006
Single-Event Upset Analysis and Protection in High Speed Circuits.
Proceedings of the 11th European Test Symposium, 2006

2005
A Flow Graph Technique for DFT Controller Modification.
Proceedings of the 2005 IEEE International SOC Conference, 2005

Binary Taylor diagrams: an efficient implementation of Taylor expansion diagrams.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005

TED+: a data structure for microprocessor verification.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

