Rachata Ausavarungnirun

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

Energy-Efficient Deflection-based On-chip Networks: Topology, Routing, Flow Control.

CoRR, 2021

SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems.

Maciej Besta

Raghavendra Kanakagiri

Grzegorz Kwasniewski

Konstantinos Kanellopoulos

Jakub Beránek

Kacper Janda

Zur Vonarburg-Shmaria

Salvatore Di Girolamo

Marek Konieczny

Torsten Hoefler

CoRR, 2021

SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems.

Maciej Besta

Raghavendra Kanakagiri

Grzegorz Kwasniewski

Konstantinos Kanellopoulos

Jakub Beránek

Kacper Janda

Zur Vonarburg-Shmaria

Salvatore Di Girolamo

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

FPRA: A Fine-grained Parallel RRAM Architecture.

Xiao Liu

Minxuan Zhou

Sean Eilert

Ameen Akel

Tajana Rosing

Vijaykrishnan Narayanan

Jishen Zhao

Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2021

Improving Inter-kernel Data Reuse With CTA-Page Coordination in GPGPU.

Xuanyi Li

Chen Li

Yang Guo

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

2020

A Modern Primer on Processing in Memory.

Juan Gómez-Luna

CoRR, 2020

Enabling High-Capacity, Latency-Tolerant, and Highly-Concurrent GPU Register Files via Software/Hardware Cooperation.

Amirhossein Mirhosseini

Seyyed Hossein Seyyedaghaei Rezaei

CoRR, 2020

NoM: Network-on-Memory for Inter-Bank Data Transfer in Highly-Banked Memories.

Mehdi Modarressi

Masoud Daneshtalab

IEEE Comput. Archit. Lett., 2020

Acclaim: Adaptive Memory Reclaim to Improve User Experience in Android Systems.

Yu Liang

Jinheng Li

Proceedings of the 2020 USENIX Annual Technical Conference, 2020

GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis.

Konstantinos Kanellopoulos

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework.

Nastaran Hajinazar

Pratyush Patel

Minesh Patel

Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Differentiating Cache Files for Fine-grain Management to Improve Mobile Performance and Lifetime.

Yu Liang

Jinheng Li

Xianzhang Chen

Riwei Pan

Tei-Wei Kuo

Chun Jason Xue

Proceedings of the 12th USENIX Workshop on Hot Topics in Storage and File Systems, 2020

PRISM: Architectural Support for Variable-granularity Memory Metadata.

Timothy Merrifield

Jayneel Gandhi

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Highly Concurrent Latency-tolerant Register Files for GPUs.

Amirhossein Mirhosseini

ACM Trans. Comput. Syst., 2019

ITAP: Idle-Time-Aware Power Management for GPU Execution Units.

Seyed Borna Ehsani

Hajar Falahati

ACM Trans. Archit. Code Optim., 2019

Processing data where it makes sense: Enabling in-memory computation.

Juan Gómez-Luna

Microprocess. Microsystems, 2019

Binary Star: Coordinated Reliability in Heterogeneous Memory Systems for High Performance and Scalability.

Xiao Liu

David Roberts

Jishen Zhao

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

CoNDA: efficient cache coherence support for near-data accelerators.

Proceedings of the 46th International Symposium on Computer Architecture, 2019

Enabling Practical Processing in and near Memory for Data-Intensive Computing.

Juan Gómez-Luna

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

A Framework for Memory Oversubscription Management in Graphics Processing Units.

Chen Li

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018

Mosaic: Enabling Application-Transparent Support for Multiple Page Sizes in Throughput Processors.

ACM SIGOPS Oper. Syst. Rev., 2018

Techniques for Efficiently Handling Power Surges in Fuel Cell Powered Data Centers: Modeling, Analysis, Results.

CoRR, 2018

Recent Advances in DRAM and Flash Memory Architectures.

CoRR, 2018

Recent Advances in Overcoming Bottlenecks in Memory Systems and Managing Memory Resources in GPU Systems.

CoRR, 2018

RowClone: Accelerating Data Movement and Initialization Using DRAM.

CoRR, 2018

Mosaic: An Application-Transparent Hardware-Software Cooperative Memory Manager for GPUs.

CoRR, 2018

High-Performance and Energy-Efficient Memory Scheduler Design for Heterogeneous Systems.

CoRR, 2018

A Memory Controller with Row Buffer Locality Awareness for Hybrid Memory Systems.

HanBin Yoon

Justin Meza

Rachael A. Harding

CoRR, 2018

Holistic Management of the GPGPU Memory Hierarchy to Manage Warp-level Latency Tolerance.

CoRR, 2018

Techniques for Shared Resource Management in Systems with Throughput Processors.

CoRR, 2018

Enabling the Adoption of Processing-in-Memory: Challenges, Mechanisms, Future Research Directions.

Kevin Hsieh

Amirali Boroumand

CoRR, 2018

LTRF: Enabling High-Capacity Register Files for GPUs via Hardware/Software Cooperative Register Prefetching.

Amirhossein Mirhosseini

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks.

Amirali Boroumand

Youngsok Kim

Parthasarathy Ranganathan

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability.

Maciej Besta

Syed Minhaj Hassan

Sudhakar Yalamanchili

Torsten Hoefler

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency.

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017

Techniques for Shared Resource Management in Systems with Throughput Processors.

PhD thesis, 2017

Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms.

Gennady Pekhimenko

Vivek Seshadri

Proc. ACM Meas. Anal. Comput. Syst., 2017

Improving Multi-Application Concurrency Support Within the GPU Memory System.

CoRR, 2017

Mosaic: a GPU memory manager with application-transparent support for multiple page sizes.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

2016

A case for hierarchical rings with deflection routing: An energy-efficient on-chip communication substrate.

Parallel Comput., 2016

A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps.

CoRR, 2016

Reducing DRAM Latency by Exploiting Design-Induced Latency Variation in Modern DRAM Chips.

Donghyuk Lee

Samira Manabi Khan

Lavanya Subramanian

CoRR, 2016

Achieving both High Energy Efficiency and High Performance in On-Chip Communication using Hierarchical Rings with Deflection Routing.

CoRR, 2016

SizeCap: Efficiently handling power surges in fuel cell powered data centers.

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

μC-States: Fine-grained GPU Datapath Power Management.

Onur Kayiran

Adwait Jog

Ashutosh Pattnaik

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

A Low-Overhead, Fully-Distributed, Guaranteed-Delivery Routing Algorithm for Faulty Network-on-Chips.

Mohammad Fattah

Antti Airola

Proceedings of the 9th International Symposium on Networks-on-Chip, 2015

A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Decoupled Direct Memory Access: Isolating CPU and IO Traffic by Leveraging a Dual-Data-Port DRAM.

Donghyuk Lee

Lavanya Subramanian

Jongmoo Choi

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance.

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

Design and Evaluation of Hierarchical Rings with Deflection Routing.

Nachiappan Chidambaram Nachiappan

Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Managing GPU Concurrency in Heterogeneous Architectures.

Onur Kayiran

Adwait Jog

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

2013

RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization.

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Application-to-core mapping policies to reduce memory system interference in multi-core systems.

Reetuparna Das

Akhilesh Kumar

Mani Azimi

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

2012

HAT: Heterogeneous Adaptive Throttling for On-Chip Networks.

Kevin Kai-Wei Chang

Chris Fallin

Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect.

Proceedings of the 2012 Sixth IEEE/ACM International Symposium on Networks-on-Chip (NoCS), 2012

Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems.

Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Row buffer locality aware caching policies for hybrid memories.

HanBin Yoon

Justin Meza

Rachael Harding

Proceedings of the 30th International IEEE Conference on Computer Design, 2012

Application-to-core mapping policies to reduce memory interference in multi-core systems.

Reetuparna Das