Karthikeyan Sankaralingam

Orcid: 0000-0002-8315-2389

  • University of Wisconsin, USA

According to our database, Karthikeyan Sankaralingam authored at least 89 papers between 2001 and 2024.


A Whimsical Odyssey Through the Maze of Scholarly Reviews.
Commun. ACM, November, 2024

IM-Unpack: Training and Inference with Arbitrarily Low Precision Integers.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

WASP: Exploiting GPU Pipeline Parallelism with Hardware-Accelerated Automatic Warp Specialization.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

A Journey of a 1,000 Kernels Begins with a Single Step: A Retrospective of Deep Learning on GPUs.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

BRYT: Data Rich Analytics Based Computer Architecture for A New Paradigm of Chip Design to Supplant Moore's Law.
CoRR, 2023

LookupFFN: Making Transformers Compute-lite for CPU inference.
Proceedings of the International Conference on Machine Learning, 2023

The Mozart reuse exposed dataflow processor for AI and beyond: industrial product.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Understanding the Limits of Conventional Hardware Architectures for Deep-Learning.
CoRR, 2021

Mozart: Designing for Software Maturity and the Next Paradigm for Chip Architectures.
Proceedings of the IEEE Hot Chips 33 Symposium, 2021

Applying Transactional Memory for Concurrency-Bug Failure Recovery in Production Runs.
IEEE Trans. Parallel Distributed Syst., 2019

A Static Analysis-based Cross-Architecture Performance Prediction Using Machine Learning.
CoRR, 2019

Heterogeneous Von Neumann/dataflow microprocessors.
Commun. ACM, 2019

MPU-BWM: Accelerating Sequence Alignment.
IEEE Comput. Archit. Lett., 2018

Applying Hardware Transactional Memory for Concurrency-Bug Failure Recovery in Production Runs.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

Hybrid optimization/heuristic instruction scheduling for programmable accelerator codesign.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

Domain Specialization Is Generally Unnecessary for Accelerators.
IEEE Micro, 2017

Democratizing Design for Future Computing Platforms.
CoRR, 2017

Kickstarting Semiconductor Innovation with Open Source Hardware.
Computer, 2017

Stream-Dataflow Acceleration.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

A Heterogeneous Von Neumann/Explicit Dataflow Processor.
IEEE Micro, 2016

Accelerating the Accelerator Memory Interface with Access-Execute and Dataflow.
IEEE Micro, 2016

Near-Memory Data Services.
IEEE Micro, 2016

Open-source Hardware: Opportunities and Challenges.
CoRR, 2016

Pushing the limits of accelerator efficiency while retaining programmability.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Modularizing the microprocessor core to outperform traditional out-of-order.
Proceedings of the 2016 IEEE Hot Chips 28 Symposium (HCS), 2016

Analyzing Behavior Specialized Acceleration.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

ISA Wars: Understanding the Relevance of ISA being RISC or CISC to Performance, Power, and Energy on Modern Architectures.
ACM Trans. Comput. Syst., 2015

Enabling GPGPU Low-Level Hardware Explorations with MIAOW: An Open-Source RTL Implementation of a GPGPU.
ACM Trans. Archit. Code Optim., 2015

Comprehensive Circuit Failure Prediction for Logic and SRAM Using Virtual Aging.
IEEE Micro, 2015

Architectural Simulators Considered Harmful.
IEEE Micro, 2015

Fixing, preventing, and recovering from concurrency bugs.
Sci. China Inf. Sci., 2015

A Graph-Based Program Representation for Analyzing Hardware Specialization Approaches.
IEEE Comput. Archit. Lett., 2015

Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Performance evaluation of a DySER FPGA prototype system spanning the compiler, microarchitecture, and hardware implementation.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

Exploring the potential of heterogeneous von Neumann/dataflow execution models.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Efficient execution of memory access phases using dataflow specialization.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

MIAOW: An open source GPGPU.
Proceedings of the 2015 IEEE Hot Chips 27 Symposium (HCS), 2015

MIAOW - An open source RTL implementation of a GPGPU.
Proceedings of the 2015 IEEE Symposium in Low-Power and High-Speed Chips, 2015

A Scheduling Framework for Spatial Architectures Across Multiple Constraint-Solving Theories.
ACM Trans. Program. Lang. Syst., 2014

Hands-on introduction to computer science at the freshman level.
Proceedings of the 45th ACM Technical Symposium on Computer Science Education, 2014

Understanding the impact of gate-level physical reliability effects on whole program execution.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Memory processing units.
Proceedings of the 2014 IEEE Hot Chips 26 Symposium (HCS), 2014

Optimization and Mathematical Modeling in Computer Architecture
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01773-5, 2013

Constraint centric scheduling guide.
SIGARCH Comput. Archit. News, 2013

Multicore Model from Abstract Single Core Inputs.
IEEE Comput. Archit. Lett., 2013

Power challenges may end the multicore era.
Commun. ACM, 2013

A general constraint-centric scheduling framework for spatial architectures.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013

Virtually-aged sampling DMR: unifying circuit failure prediction and circuit failure detection.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Power struggles: Revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Idempotent code generation: Implementation, analysis, and evaluation.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

Dynamic hardware specialization-using Moore's bounty without burning the chip down.
Proceedings of the International Conference on Compilers, 2013

ConAir: featherweight concurrency bug recovery via single-threaded idempotent execution.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

SWSL: SoftWare Synthesis for network Lookup.
Proceedings of the Symposium on Architecture for Networking and Communications Systems, 2013

Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

Power Limitations and Dark Silicon Challenge the Future of Multicore.
ACM Trans. Comput. Syst., 2012

DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing.
IEEE Micro, 2012

Dark Silicon and the End of Multicore Scaling.
IEEE Micro, 2012

Static analysis and compiler design for idempotent processing.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2012

iGPU: Exception support and speculative execution on GPUs.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Mechanisms and Evaluation of Cross-Layer Fault-Tolerance for Supercomputing.
Proceedings of the 41st International Conference on Parallel Processing, 2012

Design, integration and implementation of the DySER hardware accelerator into OpenSPARC.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

Prototyping the DySER specialization architecture with OpenSPARC.
Proceedings of the 2012 IEEE Hot Chips 24 Symposium (HCS), 2012

LEAP: latency- energy- and area-optimized lookup pipeline.
Proceedings of the Symposium on Architecture for Networking and Communications Systems, 2012

Exploring the Interaction Between Device Lifetime Reliability and Security Vulnerabilities.
IEEE Comput. Archit. Lett., 2011

Idempotent processor architecture.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Sampling + DMR: practical and low-overhead permanent fault detection.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Dynamically Specialized Datapaths for energy efficient computing.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Experiences in Co-designing a Packet Classification Algorithm and a Flexible Hardware Platform.
Proceedings of the 2011 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), 2011

A fast and highly accurate path delay emulation framework for logic-emulation of timing speculation.
Proceedings of the 2010 IEEE International Test Conference, 2010

Relax: an architectural framework for software recovery of hardware faults.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

A unified model for timing speculation: Evaluating the impact of technology scaling, CMOS design style, and fault recovery mechanism.
Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, 2010

Design and implementation of the PLUG architecture for programmable and efficient network lookups.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

MapReduce for the Cell Broadband Engine Architecture.
IBM J. Res. Dev., 2009

PLUG: flexible lookup modules for rapid deployment of new protocols in high-speed routers.
Proceedings of the ACM SIGCOMM 2009 Conference on Applications, 2009

Evaluating GPUs for network packet signature matching.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

Toward a multicore architecture for real-time ray-tracing.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

On-Chip Interconnection Networks of the TRIPS Chip.
IEEE Micro, 2007

Implementation and Evaluation of a Dynamically Routed Processor Operand Network.
Proceedings of the First International Symposium on Networks-on-Chips, 2007

Implementing Signatures for Transactional Memory.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Dataflow Predication.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Distributed Microarchitectural Protocols in the TRIPS Prototype Processor.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP.
ACM Trans. Archit. Code Optim., 2004

General parallel computations on desktop grid and P2P systems.
Proceedings of the 7th Workshop on languages, 2004

Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture.
IEEE Micro, 2003

Pagerank Computation and Keyword Search on Distributed Systems and P2P Networks.
J. Grid Comput., 2003

Universal Mechanisms for Data-Parallel Architectures.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Routed Inter-ALU Networks for ILP Scalability and Performance.
Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

Distributed Pagerank for P2P Systems.
Proceedings of the 12th International Symposium on High-Performance Distributed Computing (HPDC-12 2003), 2003

A design space evaluation of grid processor architectures.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001
