Kevin Skadron

ACM Trans. Reconfigurable Technol. Syst., September, 2024

GraphTango: A Hybrid Representation Format for Efficient Streaming Graph Updates and Analysis.

Alif Ahmed

Farzana Ahmed Siddique

MohammadHosein Gholamrezaei

Int. J. Parallel Program., June, 2024

Abakus: Accelerating k-mer Counting with Storage Technology.

ACM Trans. Archit. Code Optim., March, 2024

Swift: A Multi-FPGA Framework for Scaling Up Accelerated Graph Analytics.

CoRR, 2024

ECG: Expressing Locality and Prefetching for Optimal Caching in Graph Structures.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Architectural Modeling and Benchmarking for Digital DRAM PIM.

Farzana Ahmed Siddique

Deyuan Guo

Zhenxing Fan

Proceedings of the IEEE International Symposium on Workload Characterization, 2024

2023

HashMem: PIM-based Hashmap Accelerator.

CoRR, 2023

FreezeTime: Towards System Emulation through Architectural Virtualization.

Sergiu Mosanu

Joshua Fixelle

Mohammad Nazmus Sakib

Mircea Stan

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

FreezeTime: Towards System Emulation through Architectural Virtualization.

Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

ACTS: A Near-Memory FPGA Graph Processing Framework.

Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

Hardware Trojans in eNVM Neuromorphic Devices.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

2022

Supporting Moderate Data Dependency, Position Dependency, and Divergence in PIM-Based Accelerators.

Marzieh Lenjani

IEEE Micro, 2022

Synthesizing Legacy String Code for FPGAs Using Bounded Automata Learning.

Kevin Angstadt

Tommy Tracy II

Jean-Baptiste Jeannin

Westley Weimer

IEEE Micro, 2022

Agile-AES: Implementation of configurable AES primitive with agile design approach.

Integr., 2022

Deterministic vs. Non Deterministic Finite Automata in Automata Processing.

Farzana Ahmed Siddique

Tommy James Tracy II

Nathan Brunelle

CoRR, 2022

DRAM-CAM: General-Purpose Bit-Serial Exact Pattern Matching.

IEEE Comput. Archit. Lett., 2022

Pulley: An Algorithm/Hardware Co-Optimization for In-Memory Sorting.

Marzieh Lenjani

Alif Ahmed

Amir Mahdi Hosseini Monazzah

IEEE Comput. Archit. Lett., 2022

Speculative Code Compaction: Eliminating Dead Code via Speculative Microcode Transformations.

Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Gearbox: a case for supporting accumulation dispatching and hybrid partitioning in PIM-based accelerators.

Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

PiMulator: a Fast and Flexible Processing-in-Memory Emulation Platform.

Sergiu Mosanu

Mohammad Nazmus Sakib

Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

2021

ROCKY: A Robust Hybrid On-Chip Memory Kit for the Processors With STT-MRAM Cache Technology.

Mahdi Talebi

Arash Salahvarzi

Amir Mahdi Hosseini Monazzah

Mahdi Fazeli

IEEE Trans. Computers, 2021

NOSTalgy: Near-Optimum Run-Time STT-MRAM Quality-Energy Knob Management for Approximate Computing Applications.

Arash Salahvarzi

Mahdi Fazeli

Patricia Gonzalez-Guerrero

IEEE Trans. Computers, 2021

A Roadmap for Enabling a Future-Proof In-Network Computing Data Plane Ecosystem.

CoRR, 2021

Sunder: Enabling Low-Overhead and Scalable Near-Data Pattern Matching Acceleration.

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Sieve: Scalable In-situ DRAM-based Accelerator Designs for Massively Parallel k-mer Matching.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

BigMap: Future-proofing Fuzzers with Efficient Large Maps.

Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2021

Ultra Efficient Acceleration for De Novo Genome Assembly via Near-Memory Computing.

Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2020

Towards on-node Machine Learning for Ultra-low-power Sensors Using Asynchronous ΣΔ Streams.

ACM J. Emerg. Technol. Comput. Syst., 2020

Enabling In-SRAM Pattern Processing With Low-Overhead Reporting Architecture.

Elaheh Sadredini

Reza Rahimi

Patricia Gonzalez-Guerrero

IEEE Comput. Archit. Lett., 2020

Impala: Algorithm/Architecture Co-Design for In-Memory Multi-Stride Pattern Matching.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Fulcrum: A Simplified Control and Access Mechanism Toward Flexible and Practical In-Situ Accelerators.

Marzieh Lenjani

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Runtime Verification on FPGAs with LTLf Specifications.

Proceedings of the 2020 Formal Methods in Computer Aided Design, 2020

Grapefruit: An Open-Source, Full-Stack, and Customizable Automata Processing on FPGAs.

Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

FlexAmata: A Universal and Efficient Adaption of Applications to Spatial Automata Processing Accelerators.

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019

MTTF Enhancement Power-C4 Bump Placement Optimization.

Fakhrul Zaman Rokhani

IEEE Trans. Very Large Scale Integr. Syst., 2019

Automata Processing in Reconfigurable Architectures: In-the-Cloud Deployment, Cross-Platform Evaluation, and Fast Symbol-Only Reconfiguration.

ACM Trans. Reconfigurable Technol. Syst., 2019

Portable Programming with RAPID.

IEEE Trans. Parallel Distributed Syst., 2019

Reco-Pi: A reconfigurable Cryptoprocessor for π-Cipher.

J. Parallel Distributed Comput., 2019

A Scalable and Efficient In-Memory Interconnect Architecture for Automata Processing.

IEEE Comput. Archit. Lett., 2019

eAP: A Scalable and Efficient In-Memory Accelerator for Automata Processing.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Hopscotch: a micro-benchmark suite for memory performance evaluation.

Alif Ahmed

Proceedings of the International Symposium on Memory Systems, 2019

GraphTinker: A High Performance Data Structure for Dynamic Graph Processing.

Wole Jaiyeoba

Daniel Mueller-Gritschneder

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Cross-Layer Resilience: Challenges, Insights, and the Road Ahead.

Eric Cheng

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Debugging Support for Pattern-Matching Languages and Accelerators.

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018

Tolerating Soft Errors in Processor Cores Using CLEAR (Cross-Layer Exploration for Architecting Resilience).

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Hierarchical Pattern Mining with the Automata Processor.

Ke Wang

Elaheh Sadredini

Int. J. Parallel Program., 2018

MNCaRT: An Open-Source, Multi-Architecture Automata-Processing Research and Execution Ecosystem.

IEEE Comput. Archit. Lett., 2018

ASPEN: A Scalable In-SRAM Architecture for Pushdown Automata.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

A Scalable Solution for Rule-Based Part-of-Speech Tagging on Novel Hardware Accelerators.

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

AutomataZoo: A Modern Automata Processing Benchmark Suite.

Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Characterizing and Mitigating Output Reporting Bottlenecks in Spatial Automata Processing Architectures.

Jack Wadden

Kevin Angstadt

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Searching for Potential gRNA Off-Target Sites for CRISPR/Cas9 Using Automata Processing Across Different Platforms.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017

Dual-Data Rate Transpose-Memory Architecture Improves the Performance, Power and Area of Signal-Processing Systems.

J. Signal Process. Syst., 2017

Accelerating Weeder: A DNA Motif Search Tool Using the Micron Automata Processor and FPGA.

IEICE Trans. Inf. Syst., 2017

Frequent subtree mining on the automata processor: challenges and opportunities.

Proceedings of the International Conference on Supercomputing, 2017

Pre-RTL Voltage and Power Optimization for Low-Cost, Thermally Challenged Multicore Chips.

Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Cross-Layer Resilience in Low-Voltage Digital Systems: Key Insights.

Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Very Low Voltage (VLV) Design.

Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Classifying images in a histopathological dataset using the cumulative distribution transform on an automata architecture.

Proceedings of the 2017 IEEE Global Conference on Signal and Information Processing, 2017

REAPR: Reconfigurable engine for automata processing.

Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

Automata-to-Routing: An Open-Source Toolchain for Design-Space Exploration of Spatial Automata Processing Architectures.

Jack Wadden

Samira Manabi Khan

Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

Acceleration of Frequent Itemset Mining on FPGA using SDAccel and Vivado HLS.

Vinh Dang

Karthikeyan Sankaralingam

Proceedings of the 28th IEEE International Conference on Application-specific Systems, 2017

PPE-ARX: Area- and power-efficient VLIW programmable processing element for IoT crypto-systems.

Proceedings of the 2017 NASA/ESA Conference on Adaptive Hardware and Systems, 2017

2016

Tolerating the Consequences of Multiple EM-Induced C4 Bump Failures.

IEEE Trans. Very Large Scale Integr. Syst., 2016

Near-Memory Data Services.

Cristian Estan

IEEE Micro, 2016

A 16-Bit Reconfigurable Encryption Processor for π-Cipher.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

ANMLzoo: a benchmark suite for exploring bottlenecks in automata processing engines and architectures.

Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

Lumos+: Rapid, pre-RTL design space exploration on accelerator-rich heterogeneous architectures with reconfigurable logic.

Liang Wang

Proceedings of the 34th IEEE International Conference on Computer Design, 2016

Generating efficient and high-quality pseudo-random behavior on Automata Processors.

Proceedings of the 34th IEEE International Conference on Computer Design, 2016

CLEAR: cross-layer exploration for architecting resilience combining hardware and software techniques to tolerate soft errors in processor cores.

Proceedings of the 53rd Annual Design Automation Conference, 2016

An overview of Micron's automata processor.

Proceedings of the Eleventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2016

Sequential pattern mining with the Micron automata processor.

Ke Wang

Elaheh Sadredini

Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Entity resolution acceleration using the automata processor.

Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

RAPID Programming of Pattern-Recognition Processors.

Kevin Angstadt

Westley Weimer

Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

Feature extraction and image retrieval on an automata structure.

Proceedings of the 50th Asilomar Conference on Signals, Systems and Computers, 2016

2015

Brill tagging on the Micron Automata Processor.

Proceedings of the 9th IEEE International Conference on Semantic Computing, 2015

Transient voltage noise in charge-recycled power delivery networks for many-layer 3D-IC.

Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2015

Power-efficient embedded processing with resilience and real-time constraints.

Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2015

Hardware overhead analysis of programmability in ARX crypto processing.

Mohamed El-Hadedy

Proceedings of the Fourth Workshop on Hardware and Architectural Support for Security and Privacy, 2015

Association Rule Mining with the Micron Automata Processor.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Yield-aware Performance-Cost Characterization for Multi-Core SIMT.

Seyyed Hasan Mozafari

Brett H. Meyer

Proceedings of the 25th edition on Great Lakes Symposium on VLSI, GLVLSI 2015, Pittsburgh, PA, USA, May 20, 2015

A cross-layer design exploration of charge-recycled power-delivery in many-layer 3D-IC.

Proceedings of the 52nd Annual Design Automation Conference, 2015

Regular expression acceleration on the Micron automata processor: Brill tagging as a case study.

Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015

2014

BenchFriend: Correlating the performance of GPU benchmarks.

Shuai Che

Int. J. High Perform. Comput. Appl., 2014

The resilience wall: Cross-layer solution strategies.

Proceedings of the Technical Papers of 2014 International Symposium on VLSI Design, 2014

SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance.

Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014

Architecture implications of pads as a scarce resource.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Real-world design and evaluation of compiler-managed GPU redundant multithreading.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Dymaxion++: A Directive-Based API to Optimize Data Layout and Memory Mapping for Heterogeneous Systems.

Shuai Che

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Characterization of transient error tolerance for a class of mobile embedded applications.

Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

A meta-algorithm for classification by feature nomination.

Rituparna Sarkar

Scott T. Acton

Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Flexibility and Circuit Overheads in Reconfigurable SIMD/MIMD Systems.

Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

Walking Pads: Managing C4 Placement for Transient Voltage Noise Minimization.

Proceedings of the 51st Annual Design Automation Conference 2014, 2014

Walking pads: Fast power-supply pad-placement optimization.

Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

Image classification by multi-kernel dictionary learning.

Proceedings of the 48th Asilomar Conference on Signals, Systems and Computers, 2014

2013

Implications of the Power Wall: Dim Cores and Reconfigurable Logic.

Liang Wang

IEEE Micro, 2013

Evaluating Overheads of Multibit Soft-Error Protection in the Processor Core.

Lukasz G. Szafaryn

Brett H. Meyer

IEEE Micro, 2013

Trellis: Portability across architectures with a high-level framework.

Lukasz G. Szafaryn

Todd Gamblin

Bronis R. de Supinski

J. Parallel Distributed Comput., 2013

Architectural implications of spatial thermal filtering.

Integr., 2013

Introducing the New Editor-in-Chief of the IEEE Computer Architecture Letters.

IEEE Comput. Archit. Lett., 2013

Binary Interval Search: a scalable algorithm for counting interval intersections.

Bioinform., 2013

Pannotia: Understanding irregular GPGPU graph applications.

Proceedings of the IEEE International Symposium on Workload Characterization, 2013

Load balancing in a changing world: dealing with heterogeneity and performance variability.

Proceedings of the Computing Frontiers Conference, 2013

2012

A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors.

ACM Trans. Comput. Syst., 2012

Increasing Utilization in Modern Warehouse-Scale Computers Using Bubble-Up.

IEEE Micro, 2012

Recent thermal management techniques for microprocessors.

Joonho Kong

ACM Comput. Surv., 2012

ArchFP: Rapid prototyping of pre-RTL floorplans.

Proceedings of the 20th IEEE/IFIP International Conference on VLSI and System-on-Chip, 2012

Robust SIMD: Dynamically Adapted SIMD Width and Multi-Threading Depth.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Fine-Grained Resource Sharing for Concurrent GPGPU Kernels.

Proceedings of the 4th USENIX Workshop on Hot Topics in Parallelism, 2012

Scalable Manycore Computing with CUDA.

Michael Garland

Vinod Grover

Fundamentals of Multicore Software Development, 2012

2011

Thermal benefit of multi-core floorplanning: A limits study.

Brett H. Meyer

Sustain. Comput. Informatics Syst., 2011

Scaling with Design Constraints: Predicting the Future of Big Chips.

IEEE Micro, 2011

A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations.

Int. J. Parallel Program., 2011

Editorial: Letter from the Editor-in-Chief.

IEEE Comput. Archit. Lett., 2011

Dymaxion: optimizing memory access patterns for heterogeneous systems.

Shuai Che

Proceedings of the Conference on High Performance Computing Networking, 2011

Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations.

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

A reconfigurable simulator for large-scale heterogeneous multicore architectures.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Energy-efficient mechanisms for managing thread context in throughput processors.

Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Using cycle stacks to understand scaling bottlenecks in multi-threaded workloads.

Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Reducing the cost of redundant execution in safety-critical systems using relaxed dedication.

Proceedings of the Design, Automation and Test in Europe, 2011

Cost-effective safety and fault localization using distributed temporal redundancy.

Proceedings of the 14th International Conference on Compilers, 2011

2010

Predictive Temperature-Aware DVFS.

Jong Sung Lee

IEEE Trans. Computers, 2010

Federation: Boosting per-thread performance of throughput-oriented manycore architectures.

Michael Boyer

ACM Trans. Archit. Code Optim., 2010

The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches.

Proceedings of the Conference on High Performance Computing Networking, 2010

Dynamic warp subdivision for integrated branch and memory divergence tolerance.

Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Parallelization of Particle Filter Algorithms.

Proceedings of the Computer Architecture, 2010

Exploiting inter-thread temporal locality for chip multithreading.

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads.

Proceedings of the 2010 IEEE International Symposium on Workload Characterization, 2010

Temperature-to-power mapping.

Proceedings of the 28th International Conference on Computer Design, 2010

Accelerating SQL database operations on a GPU with CUDA.

Peter Bakkum

Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010

2009

Letter from the Editor.

IEEE Comput. Archit. Lett., 2009

Increasing memory miss tolerance for SIMD cores.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Differentiating the roles of IR measurement and simulation for power and temperature-aware design.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors.

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Rodinia: A benchmark suite for heterogeneous computing.

Proceedings of the 2009 IEEE International Symposium on Workload Characterization, 2009

Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs.

Proceedings of the 23rd international conference on Supercomputing, 2009

Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling.

Proceedings of the 27th International Conference on Computer Design, 2009

2008

Accurate, Pre-RTL Temperature-Aware Design Using a Parameterized, Geometric Thermal Model.

Robert J. Ribando

IEEE Trans. Computers, 2008

On-Demand Solution to Minimize I-Cache Leakage Energy with Maintaining Performance.

IEEE Trans. Computers, 2008

Scalable Parallel Programming with CUDA.

ACM Queue, 2008

A performance study of general-purpose applications on graphics processors using CUDA.

J. Parallel Distributed Comput., 2008

Accelerating Compute-Intensive Applications with GPUs and FPGAs.

Proceedings of the IEEE Symposium on Application Specific Processors, 2008

Federation: repurposing scalar cores for out-of-order instruction issue.

Michael Boyer

Proceedings of the 45th Design Automation Conference, 2008

Many-core design from a thermal perspective.

Robert J. Ribando

Proceedings of the 45th Design Automation Conference, 2008

Predictive design space exploration using genetically programmed response surfaces.

Henry Cook

Proceedings of the 45th Design Automation Conference, 2008

Multi-mode energy management for multi-tier server clusters.

Tibor Horvath

Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007

Interconnect Lifetime Prediction for Reliability-Aware Systems.

IEEE Trans. Very Large Scale Integr. Syst., 2007

Dynamic Voltage Scaling in Multitier Web Servers with End-to-End Delay Control.

IEEE Trans. Computers, 2007

Low-Power Design and Temperature Management.

IEEE Micro, 2007

Enhancing Energy Efficiency in Multi-tier Web Server Clusters via Prioritization.

Tibor Horvath

Tarek F. Abdelzaher

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

A hardware redundancy and recovery mechanism for reliable scientific computation on graphics processors.

Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware 2007, 2007

Impact of process variations on multicore performance symmetry.

Eric Humenay

Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

2006

HotSpot: A Compact Thermal Modeling Methodology for Early-Stage VLSI Design.

Shougata Ghosh

IEEE Trans. Very Large Scale Integr. Syst., 2006

Evaluating trace cache energy efficiency.

Michele Co

Dee A. B. Weikle

ACM Trans. Archit. Code Optim., 2006

Foreword.

Jean-Luc Gaudiot

Yale N. Patt

IEEE Comput. Archit. Lett., 2006

A Novel Software Solution for Localized Thermal Problems.

Proceedings of the Parallel and Distributed Processing and Applications, 2006

CMP design space exploration subject to physical constraints.

Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

The Visual Vulnerability Spectrum: Characterizing Architectural Vulnerability for Graphics Hardware.

Proceedings of the 21st ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, 2006

Procrastinating voltage scheduling with discrete frequency sets.

Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Applications of Small-Scale Reconfigurability to Graphics Processors.

Proceedings of the Reconfigurable Computing: Architectures and Applications, 2006

Using Branch Prediction Information for Near-Optimal I-Cache Leakage.

Proceedings of the Advances in Computer Systems Architecture, 11th Asia-Pacific Conference, 2006

2005

Merging path and gshare indexing in perceptron branch prediction.

ACM Trans. Archit. Code Optim., 2005

Accelerated warmup for sampled microarchitecture simulation.

John W. Haskins Jr.

ACM Trans. Archit. Code Optim., 2005

Improved Thermal Management with Reliability Banking.

IEEE Micro, 2005

A Case for Thermal-Aware Floorplanning at the Microarchitectural Level.

J. Instr. Level Parallelism, 2005

Fine-grained graphics architectural simulation with Qsilver.

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2005

Studying Thermal Management for Graphics-Processor Architectures.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Power and thermal effects of SRAM vs. Latch-Mux design styles and clock gating choices.

Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

The need for a full-chip and package thermal model for thermally optimized IC designs.

Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

Using Performance Counters for Runtime Temperature Sensing in High-Performance Processors.

Kyeong-Jae Lee

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Monitoring Temperature in FPGA based SoCs.

Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Analytical Model for Sensor Placement on Microprocessors.

Kyeong-Jae Lee

Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Performance, Energy, and Thermal Considerations for SMT and CMP Architectures.

Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

Topic 7 - Parallel Computer Architecture and ILP.

Theo Ungerer

Josep Lluís Larriba-Pey

Pedro Trancoso

Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Optimal procrastinating voltage scheduling for hard real-time systems.

Proceedings of the 42nd Design Automation Conference, 2005

2004

Power-Aware Branch Prediction: Characterization and Design.

IEEE Trans. Computers, 2004

Temperature-aware microarchitecture: Modeling and implementation.

ACM Trans. Archit. Code Optim., 2004

Profile-based adaptation for cache decay.

ACM Trans. Archit. Code Optim., 2004

Implementing branch-predictor decay using quasi-static memory cells.

ACM Trans. Archit. Code Optim., 2004

Temperature-aware GPU design.

Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2004

Understanding the energy efficiency of simultaneous multithreading.

Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

A General Post-Processing Approach to Leakage Current Reduction in SRAM-Based FPGAs.

John C. Lach

Jason Brandon

Proceedings of the 22nd IEEE International Conference on Computer Design: VLSI in Computers & Processors (ICCD 2004), 2004

Interconnect lifetime prediction under dynamic stress for reliability-aware design.

Proceedings of the 2004 International Conference on Computer-Aided Design, 2004

A flexible simulation framework for graphics architectures.

Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware 2004, 2004

Hybrid Architectural Dynamic Thermal Management.

Proceedings of the 2004 Design, 2004

State-Preserving vs. Non-State-Preserving Leakage Control in Caches.

Yingmin Li

Dharmesh Parikh

Yan Zhang

Proceedings of the 2004 Design, 2004

Compact thermal modeling for temperature-aware design.

[BibT_eX]

[DOI]

Shougata Ghosh

Proceedings of the 41st Design Automation Conference, 2004

2003

HotSpot: a dynamic compact thermal model at the processor-architecture level.

[BibT_eX]

[DOI]

Microelectron. J., 2003

Temperature-Aware Computer Systems: Opportunities and Challenges.

[BibT_eX]

[DOI]

IEEE Micro, 2003

Alloyed Branch History: Combining Global and Local Branch History for Robust Performance.

[BibT_eX]

[DOI]

Int. J. Parallel Program., 2003

Guest Editors' Introduction: Power-Aware Computing.

[BibT_eX]

[DOI]

Computer, 2003

Challenges in Computer Architecture Evaluation.

[BibT_eX]

[DOI]

Computer, 2003

Power-aware QoS Management in Web Servers.

[BibT_eX]

[DOI]

Proceedings of the 24th IEEE Real-Time Systems Symposium (RTSS 2003), 2003

Memory reference reuse latency: Accelerated warmup for sampled microarchitecture simulation.

[BibT_eX]

[DOI]

John W. Haskins Jr.

Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2003

Temperature-Aware Microarchitecture.

[BibT_eX]

[DOI]

Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Reducing Multimedia Decode Power using Feedback Control.

[BibT_eX]

[DOI]

Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

2002

Implementing Decay Techniques using 4T Quasi-Static Memory Cells.

[BibT_eX]

[DOI]

IEEE Comput. Archit. Lett., 2002

Teaching processor architecture with a VLSI perspective.

[BibT_eX]

[DOI]

Proceedings of the 2002 workshop on Computer architecture education, 2002

A microprocessor survey course for learning advanced computer architecture.

[BibT_eX]

[DOI]

Proceedings of the 33rd SIGCSE Technical Symposium on Computer Science Education, 2002

Odd/even bus invert with two-phase transfer for buses with coupling.

[BibT_eX]

[DOI]

Proceedings of the 2002 International Symposium on Low Power Electronics and Design, 2002

Managing leakage for transient data: decay and quasi-static 4T memory cells.

[BibT_eX]

[DOI]

Proceedings of the 2002 International Symposium on Low Power Electronics and Design, 2002

Applying Decay Strategies to Branch Predictors for Leakage Energy Savings.

[BibT_eX]

[DOI]

Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

Control-Theoretic Techniques and Thermal-RC Modeling for Accurate and Localized Dynamic Thermal Management.

[BibT_eX]

[DOI]

Tarek F. Abdelzaher

Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Power Issues Related to Branch Prediction.

[BibT_eX]

[DOI]

Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Control-theoretic dynamic frequency and voltage scaling for multimedia workloads.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Compilers, 2002

2001

The effects of context switching on branch predictor performance.

[BibT_eX]

[DOI]

Michele Co

Proceedings of the 2001 IEEE International Symposium on Performance Analysis of Systems and Software, 2001

Minimal Subset Evaluation: Rapid Warm-Up for Simulated Hardware State.

[BibT_eX]

[DOI]

John W. Haskins Jr.

Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

2000

Speculative Updates of Local and Global Branch History: A Quantitative Analysis.

[BibT_eX]

[DOI]

Margaret Martonosi

Douglas W. Clark

J. Instr. Level Parallelism, 2000

A microprocessor survey course: exploring advanced computer architecture in practice.

[BibT_eX]

[DOI]

Proceedings of the 2000 workshop on Computer architecture education, 2000

A Taxonomy of Branch Mispredictions, and Alloyed Prediction as a Robust Solution to Wrong-History Mispredictions.

[BibT_eX]

[DOI]

Margaret Martonosi

Douglas W. Clark

Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000

1999

Branch Prediction, Instruction-Window Size, and Cache Size: Performance Trade-Offs and Simulation Techniques.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1999

1998

Improving Prediction for Procedure Returns with Return-address-stack Repair Mechanisms.

[BibT_eX]

[DOI]

Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Multipath Execution: Opportunities and Limits.

[BibT_eX]

[DOI]

Proceedings of the 12th international conference on Supercomputing, 1998

1997

Design Issues and Tradeoffs for Write Buffers.

[BibT_eX]

[DOI]