Asit K. Mishra

Mahmut T. Kandemir

CoRR, 2021

Accelerating Sparse Deep Neural Networks.

Jorge Albericio Latorre

CoRR, 2021

2019

Opportunistic computing in GPU architectures.

Anand Sivasubramaniam

Proceedings of the 46th International Symposium on Computer Architecture, 2019

2018

Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs.

CoRR, 2018

WRPN & Apprentice: Methods for Training and Inference using Low-Precision Numerics.

CoRR, 2018

WRPN: Wide Reduced-Precision Networks.

Proceedings of the 6th International Conference on Learning Representations, 2018

Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy.

Proceedings of the 6th International Conference on Learning Representations, 2018

In-Package Domain-Specific ASICs for Intel® Stratix® 10 FPGAs: A Case Study of Accelerating Deep Learning Using TensorTile ASIC.

Proceedings of the 28th International Conference on Field Programmable Logic and Applications, 2018

In-Package Domain-Specific ASICs for Intel® Stratix® 10 FPGAs: A Case Study of Accelerating Deep Learning Using TensorTile ASIC (Abstract Only).

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

A Customizable Matrix Multiplication Framework for the Intel HARPv2 Xeon+FPGA Platform: A Deep Learning Case Study.

Philip Heng Wai Leong

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

Exploration of Low Numeric Precision Deep Learning Inference Using Intel® FPGAs: (Abstract Only).

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

Exploration of Low Numeric Precision Deep Learning Inference Using Intel® FPGAs.

Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

2017

Low Precision RNNs: Quantizing RNNs Without Losing Accuracy.

Supriya Kapur

CoRR, 2017

WRPN: Training and Inference using Wide Reduced-Precision Networks.

CoRR, 2017

High performance binary neural networks on the Xeon+FPGA™ platform.

Philip Heng Wai Leong

Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

Fine-grained accelerators for sparse machine learning workloads.

Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

2016

ScalCore: Designing a core for voltage scalability.

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Accelerating Binarized Neural Networks: Comparison of FPGA, CPU, GPU, and ASIC.

Proceedings of the 2016 International Conference on Field-Programmable Technology, 2016

Accelerating recurrent neural networks in analytics servers: Comparison of FPGA, CPU, GPU, and ASIC.

Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

Hardware accelerator for analytics of sparse data.

Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities.

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

A sparse matrix vector multiply accelerator for support vector machine.

Eriko Nurvitadhi

Proceedings of the 2015 International Conference on Compilers, 2015

2014

Tangle: Route-oriented dynamic voltage minimization for variation-afflicted, energy-efficient on-chip networks.

Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

2013

Orchestrated scheduling and prefetching for GPGPUs.

Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Runnemede: An architecture for Ubiquitous High-Performance Computing.

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

A heterogeneous multiple network-on-chip design: an application-aware approach.

Onur Mutlu

Nachiappan Chidambaram Nachiappan

Proceedings of the 50th Annual Design Automation Conference 2013, 2013

OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance.

Adwait Jog

Onur Kayiran

Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

2012

Cache revive: architecting volatile STT-RAM caches for enhanced performance in CMPs.

Ravishankar R. Iyer

Nachiappan Chidambaram Nachiappan

Proceedings of the 49th Annual Design Automation Conference 2012, 2012

PEPON: performance-aware hierarchical power budgeting for NoC based multicores.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Application-aware prefetch prioritization in on-chip networks.

Mahmut T. Kandemir

Anand Sivasubramaniam

Onur Mutlu

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

RAFT: A router architecture with frequency tuning for on-chip networks.

J. Parallel Distributed Comput., 2011

Exploiting Heterogeneity for Energy Efficiency in Chip Multiprocessors.

IEEE J. Emerg. Sel. Topics Circuits Syst., 2011

METE: meeting end-to-end QoS in multicores through system-wide resource management.

Proceedings of the SIGMETRICS 2011, 2011

A case for heterogeneous on-chip interconnects for CMPs.

Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Architecting on-chip interconnects for stacked 3D STT-RAM caches in CMPs.

Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

ACCESS: Smart scheduling for asymmetric cache CMPs.

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

An energy-efficient heterogeneous CMP based on hybrid TFET-CMOS cores.

Vinay Saripalli

Suman Datta

Proceedings of the 48th Design Automation Conference, 2011

2010

Towards characterizing cloud backend workloads: insights from Google compute clusters.

Joseph L. Hellerstein

Walfredo Cirne

SIGMETRICS Perform. Evaluation Rev., 2010

Coordinated power management of voltage islands in CMPs.

Proceedings of the SIGMETRICS 2010, 2010

CPM in CMPs: Coordinated Power Management in Chip-Multiprocessors.

Proceedings of the Conference on High Performance Computing Networking, 2010

2009

A case for integrated processor-cache partitioning in chip multiprocessors.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

A case for dynamic frequency tuning in on-chip networks.

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Design and evaluation of a hierarchical on-chip interconnect for next-generation CMPs.

Reetuparna Das

Soumya Eachempati

Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

2008

MIRA: A Multi-layered On-Chip Interconnect Router Architecture.

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Detection of Arcing in Low Voltage Distribution Systems.

Aurobinda Routray

Ashok Kumar Pradhan

Proceedings of the IEEE Region 10 Colloquium and Third International Conference on Industrial and Information Systems, 2008

Performance and power optimization through data compression in Network-on-Chip architectures.

Reetuparna Das

Chrysostomos Nicopoulos

Dongkook Park

Ravishankar R. Iyer

Mazin S. Yousif