Kevin Skadron

Orcid: 0000-0002-8091-9302

Affiliations:
  • University of Virginia, Charlottesville, USA


According to our database, Kevin Skadron authored at least 212 papers between 1997 and 2024.

Awards

ACM Fellow

ACM Fellow 2015, "For contributions in power- and thermal-aware modeling, design and benchmarking of microprocessors, including GPU."

IEEE Fellow

IEEE Fellow 2013, "For contributions to thermal modeling in microprocessors".

Bibliography

2024
Dynamic-ACTS - A Dynamic Graph Analytics Accelerator For HBM-Enabled FPGAs.
ACM Trans. Reconfigurable Technol. Syst., September, 2024

GraphTango: A Hybrid Representation Format for Efficient Streaming Graph Updates and Analysis.
Int. J. Parallel Program., June, 2024

Abakus: Accelerating k-mer Counting with Storage Technology.
ACM Trans. Archit. Code Optim., March, 2024

ECG: Expressing Locality and Prefetching for Optimal Caching in Graph Structures.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

2023
HashMem: PIM-based Hashmap Accelerator.
CoRR, 2023

FreezeTime: Towards System Emulation through Architectural Virtualization.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

FreezeTime: Towards System Emulation through Architectural Virtualization.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

ACTS: A Near-Memory FPGA Graph Processing Framework.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

Hardware Trojans in eNVM Neuromorphic Devices.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

2022
Supporting Moderate Data Dependency, Position Dependency, and Divergence in PIM-Based Accelerators.
IEEE Micro, 2022

Synthesizing Legacy String Code for FPGAs Using Bounded Automata Learning.
IEEE Micro, 2022

Agile-AES: Implementation of configurable AES primitive with agile design approach.
Integr., 2022

Deterministic vs. Non Deterministic Finite Automata in Automata Processing.
CoRR, 2022

DRAM-CAM: General-Purpose Bit-Serial Exact Pattern Matching.
IEEE Comput. Archit. Lett., 2022

Pulley: An Algorithm/Hardware Co-Optimization for In-Memory Sorting.
IEEE Comput. Archit. Lett., 2022

Speculative Code Compaction: Eliminating Dead Code via Speculative Microcode Transformations.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Gearbox: a case for supporting accumulation dispatching and hybrid partitioning in PIM-based accelerators.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

PiMulator: a Fast and Flexible Processing-in-Memory Emulation Platform.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

2021
ROCKY: A Robust Hybrid On-Chip Memory Kit for the Processors With STT-MRAM Cache Technology.
IEEE Trans. Computers, 2021

NOSTalgy: Near-Optimum Run-Time STT-MRAM Quality-Energy Knob Management for Approximate Computing Applications.
IEEE Trans. Computers, 2021

A Roadmap for Enabling a Future-Proof In-Network Computing Data Plane Ecosystem.
CoRR, 2021

Sunder: Enabling Low-Overhead and Scalable Near-Data Pattern Matching Acceleration.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Sieve: Scalable In-situ DRAM-based Accelerator Designs for Massively Parallel k-mer Matching.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

BigMap: Future-proofing Fuzzers with Efficient Large Maps.
Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2021

Ultra Efficient Acceleration for De Novo Genome Assembly via Near-Memory Computing.
Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2020
Towards on-node Machine Learning for Ultra-low-power Sensors Using Asynchronous ΣΔ Streams.
ACM J. Emerg. Technol. Comput. Syst., 2020

Enabling In-SRAM Pattern Processing With Low-Overhead Reporting Architecture.
IEEE Comput. Archit. Lett., 2020

Impala: Algorithm/Architecture Co-Design for In-Memory Multi-Stride Pattern Matching.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Fulcrum: A Simplified Control and Access Mechanism Toward Flexible and Practical In-Situ Accelerators.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Runtime Verification on FPGAs with LTLf Specifications.
Proceedings of the 2020 Formal Methods in Computer Aided Design, 2020

Grapefruit: An Open-Source, Full-Stack, and Customizable Automata Processing on FPGAs.
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

FlexAmata: A Universal and Efficient Adaption of Applications to Spatial Automata Processing Accelerators.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
MTTF Enhancement Power-C4 Bump Placement Optimization.
IEEE Trans. Very Large Scale Integr. Syst., 2019

Automata Processing in Reconfigurable Architectures: In-the-Cloud Deployment, Cross-Platform Evaluation, and Fast Symbol-Only Reconfiguration.
ACM Trans. Reconfigurable Technol. Syst., 2019

Portable Programming with RAPID.
IEEE Trans. Parallel Distributed Syst., 2019

Reco-Pi: A reconfigurable Cryptoprocessor for π-Cipher.
J. Parallel Distributed Comput., 2019

A Scalable and Efficient In-Memory Interconnect Architecture for Automata Processing.
IEEE Comput. Archit. Lett., 2019

eAP: A Scalable and Efficient In-Memory Accelerator for Automata Processing.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Hopscotch: a micro-benchmark suite for memory performance evaluation.
Proceedings of the International Symposium on Memory Systems, 2019

GraphTinker: A High Performance Data Structure for Dynamic Graph Processing.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Cross-Layer Resilience: Challenges, Insights, and the Road Ahead.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Debugging Support for Pattern-Matching Languages and Accelerators.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
Tolerating Soft Errors in Processor Cores Using CLEAR (Cross-Layer Exploration for Architecting Resilience).
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Hierarchical Pattern Mining with the Automata Processor.
Int. J. Parallel Program., 2018

MNCaRT: An Open-Source, Multi-Architecture Automata-Processing Research and Execution Ecosystem.
IEEE Comput. Archit. Lett., 2018

ASPEN: A Scalable In-SRAM Architecture for Pushdown Automata.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

A Scalable Solution for Rule-Based Part-of-Speech Tagging on Novel Hardware Accelerators.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

AutomataZoo: A Modern Automata Processing Benchmark Suite.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Characterizing and Mitigating Output Reporting Bottlenecks in Spatial Automata Processing Architectures.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Searching for Potential gRNA Off-Target Sites for CRISPR/Cas9 Using Automata Processing Across Different Platforms.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017
Dual-Data Rate Transpose-Memory Architecture Improves the Performance, Power and Area of Signal-Processing Systems.
J. Signal Process. Syst., 2017

Accelerating Weeder: A DNA Motif Search Tool Using the Micron Automata Processor and FPGA.
IEICE Trans. Inf. Syst., 2017

Frequent subtree mining on the automata processor: challenges and opportunities.
Proceedings of the International Conference on Supercomputing, 2017

Pre-RTL Voltage and Power Optimization for Low-Cost, Thermally Challenged Multicore Chips.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Cross-Layer Resilience in Low-Voltage Digital Systems: Key Insights.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017


Classifying images in a histopathological dataset using the cumulative distribution transform on an automata architecture.
Proceedings of the 2017 IEEE Global Conference on Signal and Information Processing, 2017

REAPR: Reconfigurable engine for automata processing.
Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

Automata-to-Routing: An Open-Source Toolchain for Design-Space Exploration of Spatial Automata Processing Architectures.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

Acceleration of Frequent Itemset Mining on FPGA using SDAccel and Vivado HLS.
Proceedings of the 28th IEEE International Conference on Application-specific Systems, 2017

PPE-ARX: Area- and power-efficient VLIW programmable processing element for IoT crypto-systems.
Proceedings of the 2017 NASA/ESA Conference on Adaptive Hardware and Systems, 2017

2016
Tolerating the Consequences of Multiple EM-Induced C4 Bump Failures.
IEEE Trans. Very Large Scale Integr. Syst., 2016

Near-Memory Data Services.
IEEE Micro, 2016

A 16-Bit Reconfigurable Encryption Processor for π-Cipher.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

ANMLzoo: a benchmark suite for exploring bottlenecks in automata processing engines and architectures.
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

Lumos+: Rapid, pre-RTL design space exploration on accelerator-rich heterogeneous architectures with reconfigurable logic.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

Generating efficient and high-quality pseudo-random behavior on Automata Processors.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

Clear: cross-layer exploration for architecting resilience combining hardware and software techniques to tolerate soft errors in processor cores.
Proceedings of the 53rd Annual Design Automation Conference, 2016

An overview of Micron's automata processor.
Proceedings of the Eleventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2016

Sequential pattern mining with the Micron automata processor.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Entity resolution acceleration using the automata processor.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

RAPID Programming of Pattern-Recognition Processors.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

Feature extraction and image retrieval on an automata structure.
Proceedings of the 50th Asilomar Conference on Signals, Systems and Computers, 2016

2015
Brill tagging on the Micron Automata Processor.
Proceedings of the 9th IEEE International Conference on Semantic Computing, 2015

Transient voltage noise in charge-recycled power delivery networks for many-layer 3D-IC.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2015

Power-efficient embedded processing with resilience and real-time constraints.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2015

Hardware overhead analysis of programmability in ARX crypto processing.
Proceedings of the Fourth Workshop on Hardware and Architectural Support for Security and Privacy, 2015

Association Rule Mining with the Micron Automata Processor.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Yield-aware Performance-Cost Characterization for Multi-Core SIMT.
Proceedings of the 25th edition on Great Lakes Symposium on VLSI, GLVLSI 2015, Pittsburgh, PA, USA, May 20, 2015

A cross-layer design exploration of charge-recycled power-delivery in many-layer 3d-IC.
Proceedings of the 52nd Annual Design Automation Conference, 2015

Regular expression acceleration on the Micron automata processor: Brill tagging as a case study.
Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015

2014
BenchFriend: Correlating the performance of GPU benchmarks.
Int. J. High Perform. Comput. Appl., 2014

The resilience wall: Cross-layer solution strategies.
Proceedings of the Technical Papers of 2014 International Symposium on VLSI Design, 2014

SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014

Architecture implications of pads as a scarce resource.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Real-world design and evaluation of compiler-managed GPU redundant multithreading.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Dymaxion++: A Directive-Based API to Optimize Data Layout and Memory Mapping for Heterogeneous Systems.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Characterization of transient error tolerance for a class of mobile embedded applications.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

A meta-algorithm for classification by feature nomination.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Flexibility and Circuit Overheads in Reconfigurable SIMD/MIMD Systems.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

Walking Pads: Managing C4 Placement for Transient Voltage Noise Minimization.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

Walking pads: Fast power-supply pad-placement optimization.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

Image classification by multi-kernel dictionary learning.
Proceedings of the 48th Asilomar Conference on Signals, Systems and Computers, 2014

2013
Implications of the Power Wall: Dim Cores and Reconfigurable Logic.
IEEE Micro, 2013

Evaluating Overheads of Multibit Soft-Error Protection in the Processor Core.
IEEE Micro, 2013

Trellis: Portability across architectures with a high-level framework.
J. Parallel Distributed Comput., 2013

Architectural implications of spatial thermal filtering.
Integr., 2013

Introducing the New Editor-in-Chief of the IEEE Computer Architecture Letters.
IEEE Comput. Archit. Lett., 2013

Binary Interval Search: a scalable algorithm for counting interval intersections.
Bioinform., 2013

Pannotia: Understanding irregular GPGPU graph applications.
Proceedings of the IEEE International Symposium on Workload Characterization, 2013

Load balancing in a changing world: dealing with heterogeneity and performance variability.
Proceedings of the Computing Frontiers Conference, 2013

2012
A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors.
ACM Trans. Comput. Syst., 2012

Increasing Utilization in Modern Warehouse-Scale Computers Using Bubble-Up.
IEEE Micro, 2012

Recent thermal management techniques for microprocessors.
ACM Comput. Surv., 2012

ArchFP: Rapid prototyping of pre-RTL floorplans.
Proceedings of the 20th IEEE/IFIP International Conference on VLSI and System-on-Chip, 2012

Robust SIMD: Dynamically Adapted SIMD Width and Multi-Threading Depth.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Fine-Grained Resource Sharing for Concurrent GPGPU Kernels.
Proceedings of the 4th USENIX Workshop on Hot Topics in Parallelism, 2012

Scalable Manycore Computing with CUDA.
Fundamentals of Multicore Software Development, 2012

2011
Thermal benefit of multi-core floorplanning: A limits study.
Sustain. Comput. Informatics Syst., 2011

Scaling with Design Constraints: Predicting the Future of Big Chips.
IEEE Micro, 2011

A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations.
Int. J. Parallel Program., 2011

Editorial: Letter from the Editor-in-Chief.
IEEE Comput. Archit. Lett., 2011

Dymaxion: optimizing memory access patterns for heterogeneous systems.
Proceedings of the Conference on High Performance Computing Networking, 2011

Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

A reconfigurable simulator for large-scale heterogeneous multicore architectures.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Energy-efficient mechanisms for managing thread context in throughput processors.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Using cycle stacks to understand scaling bottlenecks in multi-threaded workloads.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Reducing the cost of redundant execution in safety-critical systems using relaxed dedication.
Proceedings of the Design, Automation and Test in Europe, 2011

Cost-effective safety and fault localization using distributed temporal redundancy.
Proceedings of the 14th International Conference on Compilers, 2011

2010
Predictive Temperature-Aware DVFS.
IEEE Trans. Computers, 2010

Federation: Boosting per-thread performance of throughput-oriented manycore architectures.
ACM Trans. Archit. Code Optim., 2010

The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches.
Proceedings of the Conference on High Performance Computing Networking, 2010

Dynamic warp subdivision for integrated branch and memory divergence tolerance.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Parallelization of Particle Filter Algorithms.
Proceedings of the Computer Architecture, 2010

Exploiting inter-thread temporal locality for chip multithreading.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads.
Proceedings of the 2010 IEEE International Symposium on Workload Characterization, 2010

Temperature-to-power mapping.
Proceedings of the 28th International Conference on Computer Design, 2010

Accelerating SQL database operations on a GPU with CUDA.
Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010

2009
Letter from the Editor.
IEEE Comput. Archit. Lett., 2009

Increasing memory miss tolerance for SIMD cores.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Differentiating the roles of IR measurement and simulation for power and temperature-aware design.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Rodinia: A benchmark suite for heterogeneous computing.
Proceedings of the 2009 IEEE International Symposium on Workload Characterization, 2009

Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs.
Proceedings of the 23rd international conference on Supercomputing, 2009

Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling.
Proceedings of the 27th International Conference on Computer Design, 2009

2008
Accurate, Pre-RTL Temperature-Aware Design Using a Parameterized, Geometric Thermal Model.
IEEE Trans. Computers, 2008

On-Demand Solution to Minimize I-Cache Leakage Energy with Maintaining Performance.
IEEE Trans. Computers, 2008

Scalable Parallel Programming with CUDA.
ACM Queue, 2008

A performance study of general-purpose applications on graphics processors using CUDA.
J. Parallel Distributed Comput., 2008

Accelerating Compute-Intensive Applications with GPUs and FPGAs.
Proceedings of the IEEE Symposium on Application Specific Processors, 2008

Federation: repurposing scalar cores for out-of-order instruction issue.
Proceedings of the 45th Design Automation Conference, 2008

Many-core design from a thermal perspective.
Proceedings of the 45th Design Automation Conference, 2008

Predictive design space exploration using genetically programmed response surfaces.
Proceedings of the 45th Design Automation Conference, 2008

Multi-mode energy management for multi-tier server clusters.
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007
Interconnect Lifetime Prediction for Reliability-Aware Systems.
IEEE Trans. Very Large Scale Integr. Syst., 2007

Dynamic Voltage Scaling in Multitier Web Servers with End-to-End Delay Control.
IEEE Trans. Computers, 2007

Low-Power Design and Temperature Management.
IEEE Micro, 2007

Enhancing Energy Efficiency in Multi-tier Web Server Clusters via Prioritization.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

A hardware redundancy and recovery mechanism for reliable scientific computation on graphics processors.
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware 2007, 2007

Impact of process variations on multicore performance symmetry.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

2006
HotSpot: A Compact Thermal Modeling Methodology for Early-Stage VLSI Design.
IEEE Trans. Very Large Scale Integr. Syst., 2006

Evaluating trace cache energy efficiency.
ACM Trans. Archit. Code Optim., 2006

Foreword.
IEEE Comput. Archit. Lett., 2006

A Novel Software Solution for Localized Thermal Problems.
Proceedings of the Parallel and Distributed Processing and Applications, 2006

CMP design space exploration subject to physical constraints.
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

The Visual Vulnerability Spectrum: Characterizing Architectural Vulnerability for Graphics Hardware.
Proceedings of the 21st ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, 2006

Procrastinating voltage scheduling with discrete frequency sets.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Applications of Small-Scale Reconfigurability to Graphics Processors.
Proceedings of the Reconfigurable Computing: Architectures and Applications, 2006

Using Branch Prediction Information for Near-Optimal I-Cache Leakage.
Proceedings of the Advances in Computer Systems Architecture, 11th Asia-Pacific Conference, 2006

2005
Merging path and gshare indexing in perceptron branch prediction.
ACM Trans. Archit. Code Optim., 2005

Accelerated warmup for sampled microarchitecture simulation.
ACM Trans. Archit. Code Optim., 2005

Improved Thermal Management with Reliability Banking.
IEEE Micro, 2005

A Case for Thermal-Aware Floorplanning at the Microarchitectural Level.
J. Instr. Level Parallelism, 2005

Fine-grained graphics architectural simulation with Qsilver.
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2005

Studying Thermal Management for Graphics-Processor Architectures.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Power and thermal effects of SRAM vs. Latch-Mux design styles and clock gating choices.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

The need for a full-chip and package thermal model for thermally optimized IC designs.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

Using Performance Counters for Runtime Temperature Sensing in High-Performance Processors.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Monitoring Temperature in FPGA based SoCs.
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Analytical Model for Sensor Placement on Microprocessors.
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Performance, Energy, and Thermal Considerations for SMT and CMP Architectures.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

Topic 7 - Parallel Computer Architecture and ILP.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Optimal procrastinating voltage scheduling for hard real-time systems.
Proceedings of the 42nd Design Automation Conference, 2005

2004
Power-Aware Branch Prediction: Characterization and Design.
IEEE Trans. Computers, 2004

Temperature-aware microarchitecture: Modeling and implementation.
ACM Trans. Archit. Code Optim., 2004

Profile-based adaptation for cache decay.
ACM Trans. Archit. Code Optim., 2004

Implementing branch-predictor decay using quasi-static memory cells.
ACM Trans. Archit. Code Optim., 2004

Temperature-aware GPU design.
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2004

Understanding the energy efficiency of simultaneous multithreading.
Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

A General Post-Processing Approach to Leakage Current Reduction in SRAM-Based FPGAs.
Proceedings of the 22nd IEEE International Conference on Computer Design: VLSI in Computers & Processors (ICCD 2004), 2004

Interconnect lifetime prediction under dynamic stress for reliability-aware design.
Proceedings of the 2004 International Conference on Computer-Aided Design, 2004

A flexible simulation framework for graphics architectures.
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware 2004, 2004

Hybrid Architectural Dynamic Thermal Management.
Proceedings of the 2004 Design, 2004

State-Preserving vs. Non-State-Preserving Leakage Control in Caches.
Proceedings of the 2004 Design, 2004

Compact thermal modeling for temperature-aware design.
Proceedings of the 41st Design Automation Conference, 2004

2003
HotSpot: a dynamic compact thermal model at the processor-architecture level.
Microelectron. J., 2003

Temperature-Aware Computer Systems: Opportunities and Challenges.
IEEE Micro, 2003

Alloyed Branch History: Combining Global and Local Branch History for Robust Performance.
Int. J. Parallel Program., 2003

Guest Editors' Introduction: Power-Aware Computing.
Computer, 2003

Challenges in Computer Architecture Evaluation.
Computer, 2003

Power-aware QoS Management in Web Servers.
Proceedings of the 24th IEEE Real-Time Systems Symposium (RTSS 2003), 2003

Memory reference reuse latency: Accelerated warmup for sampled microarchitecture simulation.
Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2003

Temperature-Aware Microarchitecture.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Reducing Multimedia Decode Power using Feedback Control.
Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

2002
Implementing Decay Techniques using 4T Quasi-Static Memory Cells.
IEEE Comput. Archit. Lett., 2002

Teaching processor architecture with a VLSI perspective.
Proceedings of the 2002 workshop on Computer architecture education, 2002

A microprocessor survey course for learning advanced computer architecture.
Proceedings of the 33rd SIGCSE Technical Symposium on Computer Science Education, 2002

Odd/even bus invert with two-phase transfer for buses with coupling.
Proceedings of the 2002 International Symposium on Low Power Electronics and Design, 2002

Managing leakage for transient data: decay and quasi-static 4T memory cells.
Proceedings of the 2002 International Symposium on Low Power Electronics and Design, 2002

Applying Decay Strategies to Branch Predictors for Leakage Energy Savings.
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

Control-Theoretic Techniques and Thermal-RC Modeling for Accurate and Localized Dynamic Thermal Management.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Power Issues Related to Branch Prediction.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Control-theoretic dynamic frequency and voltage scaling for multimedia workloads.
Proceedings of the International Conference on Compilers, 2002

2001
The effects of context switching on branch predictor performance.
Proceedings of the 2001 IEEE International Symposium on Performance Analysis of Systems and Software, 2001

Minimal Subset Evaluation: Rapid Warm-Up for Simulated Hardware State.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

2000
Speculative Updates of Local and Global Branch History: A Quantitative Analysis.
J. Instr. Level Parallelism, 2000

A microprocessor survey course: exploring advanced computer architecture in practice.
Proceedings of the 2000 workshop on Computer architecture education, 2000

A Taxonomy of Branch Mispredictions, and Alloyed Prediction as a Robust Solution to Wrong-History Mispredictions.
Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000

1999
Branch Prediction, Instruction-Window Size, and Cache Size: Performance Trade-Offs and Simulation Techniques.
IEEE Trans. Computers, 1999

1998
Improving Prediction for Procedure Returns with Return-address-stack Repair Mechanisms.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Multipath Execution: Opportunities and Limits.
Proceedings of the 12th international conference on Supercomputing, 1998

1997
Design Issues and Tradeoffs for Write Buffers.
Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997

