Kevin Skadron
Orcid: 0000-0002-8091-9302
Affiliations:
- University of Virginia, Charlottesville, USA
According to our database, Kevin Skadron authored at least 212 papers between 1997 and 2024.
Awards
ACM Fellow 2015, "For contributions in power- and thermal-aware modeling, design and benchmarking of microprocessors, including GPU."
IEEE Fellow 2013, "For contributions to thermal modeling in microprocessors".
Online presence:
- on zbmath.org
- on twitter.com
- on orcid.org
- on dl.acm.org
Bibliography
2024
ACM Trans. Reconfigurable Technol. Syst., September, 2024
GraphTango: A Hybrid Representation Format for Efficient Streaming Graph Updates and Analysis.
Int. J. Parallel Program., June, 2024
ACM Trans. Archit. Code Optim., March, 2024
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
2023
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023
2022
Supporting Moderate Data Dependency, Position Dependency, and Divergence in PIM-Based Accelerators.
IEEE Micro, 2022
IEEE Micro, 2022
Integr., 2022
CoRR, 2022
IEEE Comput. Archit. Lett., 2022
IEEE Comput. Archit. Lett., 2022
Speculative Code Compaction: Eliminating Dead Code via Speculative Microcode Transformations.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022
Gearbox: a case for supporting accumulation dispatching and hybrid partitioning in PIM-based accelerators.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022
2021
ROCKY: A Robust Hybrid On-Chip Memory Kit for the Processors With STT-MRAM Cache Technology.
IEEE Trans. Computers, 2021
NOSTalgy: Near-Optimum Run-Time STT-MRAM Quality-Energy Knob Management for Approximate Computing Applications.
IEEE Trans. Computers, 2021
CoRR, 2021
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
Sieve: Scalable In-situ DRAM-based Accelerator Designs for Massively Parallel k-mer Matching.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2021
Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021
2020
Towards on-node Machine Learning for Ultra-low-power Sensors Using Asynchronous Σ Δ Streams.
ACM J. Emerg. Technol. Comput. Syst., 2020
IEEE Comput. Archit. Lett., 2020
Impala: Algorithm/Architecture Co-Design for In-Memory Multi-Stride Pattern Matching.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020
Fulcrum: A Simplified Control and Access Mechanism Toward Flexible and Practical In-Situ Accelerators.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020
Proceedings of the 2020 Formal Methods in Computer Aided Design, 2020
Grapefruit: An Open-Source, Full-Stack, and Customizable Automata Processing on FPGAs.
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020
FlexAmata: A Universal and Efficient Adaption of Applications to Spatial Automata Processing Accelerators.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020
2019
IEEE Trans. Very Large Scale Integr. Syst., 2019
Automata Processing in Reconfigurable Architectures: In-the-Cloud Deployment, Cross-Platform Evaluation, and Fast Symbol-Only Reconfiguration.
ACM Trans. Reconfigurable Technol. Syst., 2019
J. Parallel Distributed Comput., 2019
A Scalable and Efficient In-Memory Interconnect Architecture for Automata Processing.
IEEE Comput. Archit. Lett., 2019
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019
Proceedings of the International Symposium on Memory Systems, 2019
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019
2018
Tolerating Soft Errors in Processor Cores Using CLEAR (Cross-Layer Exploration for Architecting Resilience).
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018
Int. J. Parallel Program., 2018
MNCaRT: An Open-Source, Multi-Architecture Automata-Processing Research and Execution Ecosystem.
IEEE Comput. Archit. Lett., 2018
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
A Scalable Solution for Rule-Based Part-of-Speech Tagging on Novel Hardware Accelerators.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018
Characterizing and Mitigating Output Reporting Bottlenecks in Spatial Automata Processing Architectures.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
Searching for Potential gRNA Off-Target Sites for CRISPR/Cas9 Using Automata Processing Across Different Platforms.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
2017
Dual-Data Rate Transpose-Memory Architecture Improves the Performance, Power and Area of Signal-Processing Systems.
J. Signal Process. Syst., 2017
Accelerating Weeder: A DNA Motif Search Tool Using the Micron Automata Processor and FPGA.
IEICE Trans. Inf. Syst., 2017
Proceedings of the International Conference on Supercomputing, 2017
Pre-RTL Voltage and Power Optimization for Low-Cost, Thermally Challenged Multicore Chips.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017
Classifying images in a histopathological dataset using the cumulative distribution transform on an automata architecture.
Proceedings of the 2017 IEEE Global Conference on Signal and Information Processing, 2017
Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017
Automata-to-Routing: An Open-Source Toolchain for Design-Space Exploration of Spatial Automata Processing Architectures.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017
Proceedings of the 28th IEEE International Conference on Application-specific Systems, 2017
PPE-ARX: Area- and power-efficient VLIW programmable processing element for IoT crypto-systems.
Proceedings of the 2017 NASA/ESA Conference on Adaptive Hardware and Systems, 2017
2016
IEEE Trans. Very Large Scale Integr. Syst., 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
ANMLzoo: a benchmark suite for exploring bottlenecks in automata processing engines and architectures.
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016
Lumos+: Rapid, pre-RTL design space exploration on accelerator-rich heterogeneous architectures with reconfigurable logic.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016
Proceedings of the 34th IEEE International Conference on Computer Design, 2016
Clear: cross-layer exploration for architecting resilience combining hardware and software techniques to tolerate soft errors in processor cores.
Proceedings of the 53rd Annual Design Automation Conference, 2016
Proceedings of the Eleventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2016
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016
Proceedings of the 50th Asilomar Conference on Signals, Systems and Computers, 2016
2015
Proceedings of the 9th IEEE International Conference on Semantic Computing, 2015
Transient voltage noise in charge-recycled power delivery networks for many-layer 3D-IC.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2015
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2015
Proceedings of the Fourth Workshop on Hardware and Architectural Support for Security and Privacy, 2015
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
Proceedings of the 25th edition on Great Lakes Symposium on VLSI, GLVLSI 2015, Pittsburgh, PA, USA, May 20, 2015
A cross-layer design exploration of charge-recycled power-delivery in many-layer 3d-IC.
Proceedings of the 52nd Annual Design Automation Conference, 2015
Regular expression acceleration on the micron automata processor: Brill tagging as a case study.
Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015
2014
Int. J. High Perform. Comput. Appl., 2014
Proceedings of the Technical Papers of 2014 International Symposium on VLSI Design, 2014
SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014
Dymaxion++: A Directive-Based API to Optimize Data Layout and Memory Mapping for Heterogeneous Systems.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Characterization of transient error tolerance for a class of mobile embedded applications.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014
Proceedings of the 51st Annual Design Automation Conference 2014, 2014
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014
Proceedings of the 48th Asilomar Conference on Signals, Systems and Computers, 2014
2013
IEEE Micro, 2013
J. Parallel Distributed Comput., 2013
IEEE Comput. Archit. Lett., 2013
Bioinform., 2013
Proceedings of the IEEE International Symposium on Workload Characterization, 2013
Load balancing in a changing world: dealing with heterogeneity and performance variability.
Proceedings of the Computing Frontiers Conference, 2013
2012
A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors.
ACM Trans. Comput. Syst., 2012
IEEE Micro, 2012
Proceedings of the 20th IEEE/IFIP International Conference on VLSI and System-on-Chip, 2012
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
Proceedings of the 4th USENIX Workshop on Hot Topics in Parallelism, 2012
Fundamentals of Multicore Software Development, 2012
2011
Sustain. Comput. Informatics Syst., 2011
IEEE Micro, 2011
A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations.
Int. J. Parallel Program., 2011
Proceedings of the Conference on High Performance Computing Networking, 2011
Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011
Reducing the cost of redundant execution in safety-critical systems using relaxed dedication.
Proceedings of the Design, Automation and Test in Europe, 2011
Proceedings of the 14th International Conference on Compilers, 2011
2010
Federation: Boosting per-thread performance of throughput-oriented manycore architectures.
ACM Trans. Archit. Code Optim., 2010
The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches.
Proceedings of the Conference on High Performance Computing Networking, 2010
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010
Proceedings of the Computer Architecture, 2010
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads.
Proceedings of the 2010 IEEE International Symposium on Workload Characterization, 2010
Proceedings of the 28th International Conference on Computer Design, 2010
Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010
2009
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009
Differentiating the roles of IR measurement and simulation for power and temperature-aware design.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009
Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Proceedings of the 2009 IEEE International Symposium on Workload Characterization, 2009
Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs.
Proceedings of the 23rd international conference on Supercomputing, 2009
Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling.
Proceedings of the 27th International Conference on Computer Design, 2009
2008
Accurate, Pre-RTL Temperature-Aware Design Using a Parameterized, Geometric Thermal Model.
IEEE Trans. Computers, 2008
IEEE Trans. Computers, 2008
A performance study of general-purpose applications on graphics processors using CUDA.
J. Parallel Distributed Comput., 2008
Proceedings of the IEEE Symposium on Application Specific Processors, 2008
Proceedings of the 45th Design Automation Conference, 2008
Proceedings of the 45th Design Automation Conference, 2008
Proceedings of the 45th Design Automation Conference, 2008
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008
2007
IEEE Trans. Very Large Scale Integr. Syst., 2007
IEEE Trans. Computers, 2007
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
A hardware redundancy and recovery mechanism for reliable scientific computation on graphics processors.
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware 2007, 2007
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007
2006
IEEE Trans. Very Large Scale Integr. Syst., 2006
Proceedings of the Parallel and Distributed Processing and Applications, 2006
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006
The Visual Vulnerability Spectrum: Characterizing Architectural Vulnerability for Graphics Hardware.
Proceedings of the 21st ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, 2006
Proceedings of the Conference on Design, Automation and Test in Europe, 2006
Proceedings of the Reconfigurable Computing: Architectures and Applications, 2006
Proceedings of the Advances in Computer Systems Architecture, 11th Asia-Pacific Conference, 2006
2005
ACM Trans. Archit. Code Optim., 2005
ACM Trans. Archit. Code Optim., 2005
J. Instr. Level Parallelism, 2005
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2005
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Power and thermal effects of SRAM vs. Latch-Mux design styles and clock gating choices.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005
The need for a full-chip and package thermal model for thermally optimized IC designs.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005
Using Performance Counters for Runtime Temperature Sensing in High-Performance Processors.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005
Proceedings of the 42nd Design Automation Conference, 2005
2004
IEEE Trans. Computers, 2004
ACM Trans. Archit. Code Optim., 2004
ACM Trans. Archit. Code Optim., 2004
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2004
Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004
Proceedings of the 22nd IEEE International Conference on Computer Design: VLSI in Computers & Processors (ICCD 2004), 2004
Proceedings of the 2004 International Conference on Computer-Aided Design, 2004
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware 2004, 2004
Proceedings of the 2004 Design, 2004
Proceedings of the 2004 Design, 2004
Proceedings of the 41st Design Automation Conference, 2004
2003
Microelectron. J., 2003
Alloyed Branch History: Combining Global and Local Branch History for Robust Performance.
Int. J. Parallel Program., 2003
Proceedings of the 24th IEEE Real-Time Systems Symposium (RTSS 2003), 2003
Memory reference reuse latency: Accelerated warmup for sampled microarchitecture simulation.
Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2003
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003
Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003
2002
IEEE Comput. Archit. Lett., 2002
Proceedings of the 2002 workshop on Computer architecture education, 2002
Proceedings of the 33rd SIGCSE Technical Symposium on Computer Science Education, 2002
Proceedings of the 2002 International Symposium on Low Power Electronics and Design, 2002
Proceedings of the 2002 International Symposium on Low Power Electronics and Design, 2002
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002
Control-Theoretic Techniques and Thermal-RC Modeling for Accurate and Localized Dynamic Thermal Management.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002
Proceedings of the International Conference on Compilers, 2002
2001
Proceedings of the 2001 IEEE International Symposium on Performance Analysis of Systems and Software, 2001
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001
2000
J. Instr. Level Parallelism, 2000
A microprocessor survey course: exploring advanced computer architecture in practice.
Proceedings of the 2000 workshop on Computer architecture education, 2000
A Taxonomy of Branch Mispredictions, and Alloyed Prediction as a Robust Solution to Wrong-History Mispredictions.
Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000
1999
Branch Prediction, Instruction-Window Size, and Cache Size: Performance Trade-Offs and Simulation Techniques.
IEEE Trans. Computers, 1999
1998
Improving Prediction for Procedure Returns with Return-address-stack Repair Mechanisms.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998
Proceedings of the 12th international conference on Supercomputing, 1998
1997
Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997