Vijayalakshmi Srinivasan

Wei Wang

Moriyoshi Ohara

Proceedings of the IEEE Symposium in Low-Power and High-Speed Chips, 2019

2018

Bridging the Accuracy Gap for 2-bit Quantized Neural Networks (QNN).

Pierce I-Jen Chuang

Zhuo Wang

CoRR, 2018

PACT: Parameterized Clipping Activation for Quantized Neural Networks.

Zhuo Wang

Pierce I-Jen Chuang

CoRR, 2018

A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference.

Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018

Taming the beast: Programming Peta-FLOP class Deep Learning Systems.

Leland Chang

Proceedings of the International Symposium on Low Power Electronics and Design, 2018

Across the Stack Opportunities for Deep Learning Acceleration.

Proceedings of the International Symposium on Low Power Electronics and Design, 2018

Exploiting approximate computing for deep learning acceleration.

Chia-Yu Chen

Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

Compensated-DNN: energy efficient low-precision deep neural networks by compensating quantization errors.

Shubham Jain

Pierce Chuang

Leland Chang

Proceedings of the 55th Annual Design Automation Conference, 2018

2017

Special Issue on Network and Parallel Computing.

Yunquan Zhang

Int. J. Parallel Program., 2017

Needle: Leveraging Program Analysis to Analyze and Extract Accelerators from Whole Programs.

Nick Sumner

Steve Margerm

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Accelerator Design for Deep Learning Training: Extended Abstract: Invited.

Ankur Agrawal

Chia-Yu Chen

Jinwook Oh

Sunil Shukla

Wei Zhang

Proceedings of the 54th Annual Design Automation Conference, 2017

POSTER: Design Space Exploration for Performance Optimization of Deep Neural Networks on Shared Memory Accelerators.

Leland Chang

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016

Co-designing accelerators and SoC interfaces using gem5-Aladdin.

Yakun Sophia Shao

Sam Likun Xi

Gu-Yeon Wei

David M. Brooks

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Peruse and Profit: Estimating the Accelerability of Loops.

Amirali Sharifian

Nick Sumner

Proceedings of the 2016 International Conference on Supercomputing, 2016

Approximate computing: Challenges and opportunities.

Ankur Agrawal

Zehra Sura

Proceedings of the IEEE International Conference on Rebooting Computing, 2016

2015

Self-contained, accurate precomputation prefetching.

Islam Atta

Xin Tong

Ioana Baldini

Proceedings of the 48th International Symposium on Microarchitecture, 2015

DASX: Hardware Accelerator for Software Data Structures.

Naveen Vedula

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

2014

Comparing Implementations of Near-Data Computing with In-Memory MapReduce Workloads.

Seth H. Pugsley

Jeffrey Jestes

Alper Buyuktosunoglu

Al Davis

Feifei Li

IEEE Micro, 2014

NDC: Analyzing the impact of 3D-stacked memory+logic devices on MapReduce workloads.

Seth H. Pugsley

Jeffrey Jestes

Huihui Zhang

Alper Buyuktosunoglu

Al Davis

Feifei Li

Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

SQRL: hardware accelerator for collecting software data structures.

Dan Lin

Jordon Phillips

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

RECAP: A region-based cure for the common cold (cache).

Jason Zebchuk

Harold W. Cain

Xin Tong

Lakshminarayanan Renganarayana

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

2012

Programming with relaxed synchronization.

Ravi Nair

Daniel A. Prener

Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability, 2012

Efficient scrub mechanisms for error-prone emerging memories.

Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

ReCaP: a region-based cure for the common cold cache.

Jason Zebchuk

Harold W. Cain

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Big Chips.

Andrew B. Kahng

IEEE Micro, 2011

SPATL: Honey, I Shrunk the Coherence Directory.

Hongzhou Zhao

Sandhya Dwarkadas

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010

SAFER: Stuck-At-Fault Error Recovery for Memories.

Nak Hee Seong

Dong Hyuk Woo

Hsien-Hsin S. Lee

Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

2009

A tagless coherence directory.

Jason Zebchuk

Moinuddin K. Qureshi

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Enhancing lifetime and security of PCM-based main memory with start-gap wear leveling.

Moinuddin K. Qureshi

John P. Karidis

Michele Franceschini

Luis A. Lastras

Bülent Abali

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Scalable high performance main memory system using phase-change memory technology.

Moinuddin K. Qureshi

Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

2008

Analyzing the Cost of a Cache Miss Using Pipeline Spectroscopy.

Thomas R. Puzak

Allan Hartstein

Philip G. Emma

Arthur Nadas

J. Instr. Level Parallelism, 2008

On the Nature of Cache Miss Behavior: Is It √2?

Allan Hartstein

Thomas R. Puzak

Philip G. Emma

J. Instr. Level Parallelism, 2008

2007

Pipeline spectroscopy.

Proceedings of the Workshop on Experimental Computer Science, 2007

An analysis of the effects of miss clustering on the cost of a cache miss.

Proceedings of the 4th Conference on Computing Frontiers, 2007

2006

Cache miss behavior: is it sqrt(2)?

Proceedings of the Third Conference on Computing Frontiers, 2006

2005

Exploring the limits of prefetching.

IBM J. Res. Dev., 2005

When prefetching improves/degrades performance.

Proceedings of the Second Conference on Computing Frontiers, 2005

2004

Integrated Analysis of Power and Performance for Pipelined Microprocessors.

IEEE Trans. Computers, 2004

A Prefetch Taxonomy.

Edward S. Davidson

Gary S. Tyson

IEEE Trans. Computers, 2004

Microarchitectural techniques for power gating of execution units.

Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

2003

New methodology for early-stage, microarchitecture-level power-performance analysis of microprocessors.

Michael G. Rosenfield

IBM J. Res. Dev., 2003

Hot-and-Cold: Using Criticality in the Design of Energy-Efficient Caches.

Sandhya Dwarkadas

Alper Buyuktosunoglu

Proceedings of the Power-Aware Computer Systems, Third International Workshop, 2003

2002

Early-Stage Definition of LPX: A Low Power Issue-Execute Processor.

Proceedings of the Power-Aware Computer Systems, Second International Workshop, 2002

Optimizing pipelines for power and performance.

Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

2001

Hardware solutions to reduce effective memory access time.

PhD thesis, 2001

Branch History Guided Instruction Prefetching.

Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

1999

Active Management of Data Caches by Exploiting Reuse Information.

Edward S. Tam

Gary S. Tyson

Edward S. Davidson

IEEE Trans. Computers, 1999

1998

Evaluating the performance of active cache management schemes.

Edward S. Tam

Gary S. Tyson

Edward S. Davidson

Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors, 1998

1997

Towards a Communication Characterization Methodology for Parallel Applications.

Sucheta Chodnekar