Karthikeyan Sankaralingam

Vikas Singh

Proceedings of the Forty-first International Conference on Machine Learning, 2024

WASP: Exploiting GPU Pipeline Parallelism with Hardware-Accelerated Automatic Warp Specialization.

Neal Clayton Crago

Sana Damani

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

A Journey of a 1,000 Kernels Begins with a Single Step: A Retrospective of Deep Learning on GPUs.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

BRYT: Data Rich Analytics Based Computer Architecture for A New Paradigm of Chip Design to Supplant Moore's Law.

CoRR, 2023

LookupFFN: Making Transformers Compute-lite for CPU inference.

Zhanpeng Zeng

Michael Davies

Pranav Pulijala

Vikas Singh

Proceedings of the International Conference on Machine Learning, 2023

2022

The Mozart reuse exposed dataflow processor for AI and beyond: industrial product.

Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

2021

Understanding the Limits of Conventional Hardware Architectures for Deep-Learning.

Michael Davies

Adam Labiosa

CoRR, 2021

Mozart: Designing for Software Maturity and the Next Paradigm for Chip Architectures.

Proceedings of the IEEE Hot Chips 33 Symposium, 2021

2019

Applying Transactional Memory for Concurrency-Bug Failure Recovery in Production Runs.

Yuxi Chen

Shu Wang

Shan Lu

IEEE Trans. Parallel Distributed Syst., 2019

A Static Analysis-based Cross-Architecture Performance Prediction Using Machine Learning.

CoRR, 2019

Heterogeneous Von Neumann/dataflow microprocessors.

Thiruvengadam Vijayaraghavan

Commun. ACM, 2019

2018

MPU-BWM: Accelerating Sequence Alignment.

Amit Rajesh

IEEE Comput. Archit. Lett., 2018

Applying Hardware Transactional Memory for Concurrency-Bug Failure Recovery in Production Runs.

Yuxi Chen

Shu Wang

Shan Lu

Proceedings of the 2018 USENIX Annual Technical Conference, 2018

Hybrid optimization/heuristic instruction scheduling for programmable accelerator codesign.

Newsha Ardalani

Jian Weng

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

Domain Specialization Is Generally Unnecessary for Accelerators.

Greg Wright

IEEE Micro, 2017

Democratizing Design for Future Computing Platforms.

Luis Ceze

Mark D. Hill

Thomas F. Wenisch

CoRR, 2017

Kickstarting Semiconductor Innovation with Open Source Hardware.

Gagan Gupta

Computer, 2017

Stream-Dataflow Acceleration.

Newsha Ardalani

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

2016

A Heterogeneous Von Neumann/Explicit Dataflow Processor.

IEEE Micro, 2016

Accelerating the Accelerator Memory Interface with Access-Execute and Dataflow.

Sung Jin Kim

IEEE Micro, 2016

Near-Memory Data Services.

IEEE Micro, 2016

Open-source Hardware: Opportunities and Challenges.

Gagan Gupta

CoRR, 2016

Pushing the limits of accelerator efficiency while retaining programmability.

Greg Wright

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Modularizing the microprocessor core to outperform traditional out-of-order.

Proceedings of the 2016 IEEE Hot Chips 28 Symposium (HCS), 2016

Analyzing Behavior Specialized Acceleration.

Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015

ISA Wars: Understanding the Relevance of ISA being RISC or CISC to Performance, Power, and Energy on Modern Architectures.

Thiruvengadam Vijayaraghavan

ACM Trans. Comput. Syst., 2015

Enabling GPGPU Low-Level Hardware Explorations with MIAOW: An Open-Source RTL Implementation of a GPGPU.

ACM Trans. Archit. Code Optim., 2015

Comprehensive Circuit Failure Prediction for Logic and SRAM Using Virtual Aging.

Amir Yazdanbakhsh

IEEE Micro, 2015

Architectural Simulators Considered Harmful.

IEEE Micro, 2015

Fixing, preventing, and recovering from concurrency bugs.

Sci. China Inf. Sci., 2015

A Graph-Based Program Representation for Analyzing Hardware Specialization Approaches.

IEEE Comput. Archit. Lett., 2015

Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance.

Newsha Ardalani

Clint Lestourgeon

Xiaojin Zhu

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Performance evaluation of a DySER FPGA prototype system spanning the compiler, microarchitecture, and hardware implementation.

Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

Exploring the potential of heterogeneous von Neumann/dataflow execution models.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Efficient execution of memory access phases using dataflow specialization.

Sung Jin Kim

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

MIAOW: An open source GPGPU.

Proceedings of the 2015 IEEE Hot Chips 27 Symposium (HCS), 2015

MIAOW - An open source RTL implementation of a GPGPU.

Proceedings of the 2015 IEEE Symposium in Low-Power and High-Speed Chips, 2015

2014

A Scheduling Framework for Spatial Architectures Across Multiple Constraint-Solving Theories.

Michael Sartin-Tarm

Behnam Robatmili

ACM Trans. Program. Lang. Syst., 2014

Hands-on introduction to computer science at the freshman level.

Proceedings of the 45th ACM Technical Symposium on Computer Science Education, 2014

Understanding the impact of gate-level physical reliability effects on whole program execution.

Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Memory processing units.

Vijayraghavan Thiruvengadam

Proceedings of the 2014 IEEE Hot Chips 26 Symposium (HCS), 2014

2013

Optimization and Mathematical Modeling in Computer Architecture

Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01773-5, 2013

Constraint centric scheduling guide.

Michael Sartin-Tarm

SIGARCH Comput. Archit. News, 2013

Multicore Model from Abstract Single Core Inputs.

IEEE Comput. Archit. Lett., 2013

Power challenges may end the multicore era.

Commun. ACM, 2013

A general constraint-centric scheduling framework for spatial architectures.

Michael Sartin-Tarm

Behnam Robatmili

Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013

Virtually-aged sampling DMR: unifying circuit failure prediction and circuit failure detection.

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Power struggles: Revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures.

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Idempotent code generation: Implementation, analysis, and evaluation.

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

Dynamic hardware specialization: using Moore's bounty without burning the chip down.

Proceedings of the International Conference on Compilers, 2013

ConAir: featherweight concurrency bug recovery via single-threaded idempotent execution.

Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

SWSL: SoftWare Synthesis for network Lookup.

Sung Jin Kim

Proceedings of the Symposium on Architecture for Networking and Communications Systems, 2013

Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012

Power Limitations and Dark Silicon Challenge the Future of Multicore.

ACM Trans. Comput. Syst., 2012

DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing.

Changkyu Kim

IEEE Micro, 2012

Dark Silicon and the End of Multicore Scaling.

IEEE Micro, 2012

Static analysis and compiler design for idempotent processing.

Somesh Jha

Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2012

iGPU: Exception support and speculative execution on GPUs.

Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Mechanisms and Evaluation of Cross-Layer Fault-Tolerance for Supercomputing.

Barry Rountree

Martin Schulz

Bronis R. de Supinski

Proceedings of the 41st International Conference on Parallel Processing, 2012

Design, integration and implementation of the DySER hardware accelerator into OpenSPARC.

Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

Prototyping the DySER specialization architecture with OpenSPARC.

Jesse Benson

Ryan Cofell

Chris Frericks

Proceedings of the 2012 IEEE Hot Chips 24 Symposium (HCS), 2012

LEAP: latency- energy- and area-optimized lookup pipeline.

Eric N. Harris

Samuel L. Wasmundt

Proceedings of the Symposium on Architecture for Networking and Communications Systems, 2012

2011

Exploring the Interaction Between Device Lifetime Reliability and Security Vulnerabilities.

IEEE Comput. Archit. Lett., 2011

Idempotent processor architecture.

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Sampling + DMR: practical and low-overhead permanent fault detection.

Matthew D. Sinclair

Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Dynamically Specialized Datapaths for energy efficient computing.

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Experiences in Co-designing a Packet Classification Algorithm and a Flexible Hardware Platform.

Nilay Vaish

Thawan Kooburat

Proceedings of the 2011 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), 2011

2010

A fast and highly accurate path delay emulation framework for logic-emulation of timing speculation.

Ranganathan Sankaralingam

Proceedings of the 2010 IEEE International Test Conference, 2010

Relax: an architectural framework for software recovery of hardware faults.

Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

A unified model for timing speculation: Evaluating the impact of technology scaling, CMOS design style, and fault recovery mechanism.

Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, 2010

Design and implementation of the PLUG architecture for programmable and efficient network lookups.

Somesh Jha

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009

MapReduce for the Cell Broadband Engine Architecture.

IBM J. Res. Dev., 2009

PLUG: flexible lookup modules for rapid deployment of new protocols in high-speed routers.

Proceedings of the ACM SIGCOMM 2009 Conference on Applications, 2009

Evaluating GPUs for network packet signature matching.

Randy Smith

Neelam Goyal

Justin Ormont

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

2008

Toward a multicore architecture for real-time ray-tracing.

Peter Djeu

Mary K. Vernon

William R. Mark

Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

2007

On-Chip Interconnection Networks of the TRIPS Chip.

Paul Gratz

Changkyu Kim

Heather Hanson

Premkishore Shivakumar

IEEE Micro, 2007

Implementation and Evaluation of a Dynamically Routed Processor Operand Network.

Paul Gratz

Heather Hanson

Premkishore Shivakumar

Robert G. McDonald

Proceedings of the First International Symposium on Networks-on-Chips, 2007

Implementing Signatures for Transactional Memory.

Daniel Sánchez

Luke Yen

Mark D. Hill

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

2006

Dataflow Predication.

Aaron Smith

Ramadass Nagarajan

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Distributed Microarchitectural Protocols in the TRIPS Prototype Processor.

Premkishore Shivakumar

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

2004

TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP.

ACM Trans. Archit. Code Optim., 2004

General parallel computations on desktop grid and P2P systems.

James C. Browne

Madhulika Yalamanchi

Kevin Kane

Proceedings of the 7th Workshop on languages, 2004

2003

Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture.

IEEE Micro, 2003

Pagerank Computation and Keyword Search on Distributed Systems and P2P Networks.

Madhulika Yalamanchi

Simha Sethumadhavan

James C. Browne

J. Grid Comput., 2003

Universal Mechanisms for Data-Parallel Architectures.

William R. Mark

Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Routed Inter-ALU Networks for ILP Scalability and Performance.

Vincent Ajay Singh

Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

Distributed Pagerank for P2P Systems.

Simha Sethumadhavan

James C. Browne

Proceedings of the 12th International Symposium on High-Performance Distributed Computing (HPDC-12 2003), 2003

2001

A design space evaluation of grid processor architectures.

Ramadass Nagarajan