Trevor N. Mudge

Orcid: 0000-0001-7845-2187

  • University of Michigan, Ann Arbor, MI, USA

According to our database, Trevor N. Mudge authored at least 309 papers between 1977 and 2025.

ACM Fellow

ACM Fellow 2016, "For contributions to power aware computer architecture".

IEEE Fellow

IEEE Fellow 1995, "For contributions to the design and analysis of high performance processors".






DAP: A 507-GMACs/J 256-Core Domain Adaptive Processor for Wireless Communication and Linear Algebra Kernels in 12-nm FINFET.
IEEE J. Solid State Circuits, February, 2025

Palermo: Improving the Performance of Oblivious Memory using Protocol-Hardware Co-Design.
CoRR, 2024

Demystifying Graph Sparsification Algorithms in Graph Properties Preservation.
Proc. VLDB Endow., November, 2023

Rethinking DRAM's Page Mode With STT-MRAM.
IEEE Trans. Computers, May, 2023

Introduction to the Special Issue on Domain-Specific System-on-Chip Architectures and Run-Time Management Techniques.
ACM Trans. Embed. Comput. Syst., March, 2023

Domain-Specific Architectures: Research Problems and Promising Approaches.
ACM Trans. Embed. Comput. Syst., March, 2023

Accelerating Graph Analytics on a Reconfigurable Architecture with a Data-Indirect Prefetcher.
CoRR, 2023

RecPIM: A PIM-Enabled DRAM-RRAM Hybrid Memory System For Recommendation Models.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2023

PEDAL: A Power Efficient GCN Accelerator with Multiple DAtafLows.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

GRACE: A Scalable Graph-Based Approach to Accelerating Recommendation Model Inference.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

Versa: A 36-Core Systolic Multiprocessor With Dynamically Reconfigurable Interconnect and Memory.
IEEE J. Solid State Circuits, 2022

A 507 GMACs/J 256-Core Domain Adaptive Systolic-Array-Processor for Wireless Communication and Linear-Algebra Kernels in 12nm FINFET.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits 2022), 2022

Mint: An Accelerator For Mining Temporal Motifs.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Improving Energy Efficiency of Convolutional Neural Networks on Multi-core Architectures through Run-time Reconfiguration.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2022

NDMiner: accelerating graph pattern mining using near data processing.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

MeNDA: a near-memory multi-way merge solution for sparse transposition and dataflows.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Squaring the circle: Executing Sparse Matrix Computations on FlexTPU - A TPU-Like Processor.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

Locality-Aware Optimizations for Improving Remote Memory Latency in Multi-GPU Systems.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

CoDR: Computation and Data Reuse Aware CNN Accelerator.
CoRR, 2021

Versa: A Dataflow-Centric Multiprocessor with 36 Systolic ARM Cortex-M4F Cores and a Reconfigurable Crossbar-Memory Hierarchy in 28nm.
Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, June 13-19, 2021

A Deep Dive Into Understanding The Random Walk-Based Temporal Graph Learning.
Proceedings of the IEEE International Symposium on Workload Characterization, 2021

Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

CoSPARSE: A Software and Hardware Reconfigurable SpMV Framework for Graph Analytics.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

A 7.3 M Output Non-Zeros/J, 11.7 M Output Non-Zeros/GB Reconfigurable Sparse Matrix-Matrix Multiplication Accelerator.
IEEE J. Solid State Circuits, 2020

CoPTA: Contiguous Pattern Speculating TLB Architecture.
Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2020

Accelerating Deep Neural Network Computation on a Low Power Reconfigurable Architecture.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2020

HETSIM: Simulating Large-Scale Heterogeneous Systems using a Trace-driven, Synchronization and Dependency-Aware Framework.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

Sparse-TPU: adapting systolic arrays for sparse matrices.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Accelerating Linear Algebra Kernels on a Massively Parallel Reconfigurable Architecture.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Transmuter: Bridging the Efficiency Gap using Memory and Dataflow Reconfiguration.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

Configurable-ECC: Architecting a Flexible ECC Scheme to Support Different Sized Accesses in High Bandwidth Memory Systems.
IEEE Trans. Computers, 2019

A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019

SMART: STT-MRAM architecture for smart activation and sensing.
Proceedings of the International Symposium on Memory Systems, 2019

Fine-Grained Management of Thread Blocks for Irregular Applications.
Proceedings of the 37th IEEE International Conference on Computer Design, 2019

Holistic generational offsets: Fostering a primitive online abstraction for human vs. machine cognition.
CoRR, 2018

A load balancing technique for memory channels.
Proceedings of the International Symposium on Memory Systems, 2018

OuterSPACE: An Outer Product Based Sparse Matrix Multiplication Accelerator.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Impact of FinFET on Near-Threshold Voltage Scalability.
IEEE Des. Test, 2017

Regless: just-in-time operand staging for GPUs.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

14.7 A 288µW programmable deep-learning processor with 270KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence.
Proceedings of the 2017 IEEE International Solid-State Circuits Conference, 2017

A Programmable Galois Field Processor for the Internet of Things.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Accelerating Smith-Waterman Alignment Workload with Scalable Vector Computing.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

Designing Future Warehouse-Scale Computers for Sirius, an End-to-End Voice and Vision Personal Assistant.
ACM Trans. Comput. Syst., 2016

Using Low Cost Erasure and Error Correction Schemes to Improve Reliability of Commodity DRAM Systems.
IEEE Trans. Computers, 2016

RATT-ECC: Rate Adaptive Two-Tiered Error Correction Codes for Reliable 3D Die-Stacked Memory.
ACM Trans. Archit. Code Optim., 2016

Impact of Future Technologies on Architecture.
IEEE Micro, 2016

Sirius Implications for Future Warehouse-Scale Computers.
IEEE Micro, 2016

Energy-Autonomous Wireless Communication for Millimeter-Scale Internet-of-Things Sensor Nodes.
IEEE J. Sel. Areas Commun., 2016

Checkpointing Exascale Memory Systems with Existing Memory Technologies.
Proceedings of the Second International Symposium on Memory Systems, 2016

Enhancing DRAM Self-Refresh for Idle Power Reduction.
Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016

A low power software-defined-radio baseband processor for the Internet of Things.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Near-threshold computing in FinFET technologies: opportunities for improved voltage scalability.
Proceedings of the 53rd Annual Design Automation Conference, 2016

Using Graphics Processing Units in an LTE Base Station.
J. Signal Process. Syst., 2015

Thoughts on Winning the 2014 Eckert-Mauchly Award.
IEEE Micro, 2015

The specialization trend in computer hardware: technical perspective.
Commun. ACM, 2015

WarpPool: sharing requests with inter-warp coalescing for throughput processors.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

E-ECC: Low Power Erasure and Error Correction Schemes for Increasing Reliability of Commodity DRAM Systems.
Proceedings of the 2015 International Symposium on Memory Systems, 2015

A study of mobile device utilization.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

The Architecture of Smart Phones.
Proceedings of the 22nd IEEE International Conference on High Performance Computing, 2015

Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

Improving the Reliability of MLC NAND Flash Memories Through Adaptive Data Refresh and Error Control Coding.
J. Signal Process. Syst., 2014

Evaluating private vs. shared last-level caches for energy efficiency in asymmetric multi-cores.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

Hi-Rise: A High-Radix Switch for 3D Integration with Single-Cycle Arbitration.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Sources of error in full-system simulation.
Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

A study of Thread Level Parallelism on mobile devices.
Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

A memory rename table to reduce energy and improve performance.
Proceedings of the International Symposium on Low Power Electronics and Design, 2014

Author retrospective improving data cache performance by pre-executing instructions under a cache miss.
Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

A hybrid approach to offloading mobile image classification.
Proceedings of the IEEE International Conference on Acoustics, 2014

VIX: Virtual Input Crossbar for Efficient Switch Allocation.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

Quality-of-Service for a High-Radix Switch.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

Integrated 3D-stacked server designs for increasing physical density of key-value stores.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

Limits of Parallelism and Boosting in Dim Silicon.
IEEE Micro, 2013

Centip3De: A 64-Core, 3D Stacked Near-Threshold System.
IEEE Micro, 2013

Centip3De: A Cluster-Based NTC Architecture With 64 ARM Cortex-M3 Cores in 3D Stacked 130 nm CMOS.
IEEE J. Solid State Circuits, 2013

Centip3De: a many-core prototype exploring 3D integration and near-threshold computing.
Commun. ACM, 2013

Architecting an LTE base station with graphics processing units.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2013

Exploring DRAM organizations for energy-efficient and resilient exascale memories.
Proceedings of the International Conference for High Performance Computing, 2013

Parallelization techniques for implementing trellis algorithms on graphics processors.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

WiBench: An open source kernel suite for benchmarking wireless systems.
Proceedings of the IEEE International Symposium on Workload Characterization, 2013

Scaling towards kilo-core processors with asymmetric high-radix topologies.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

A Customized Processor for Energy Efficient Scientific Computing.
IEEE Trans. Computers, 2012

Swizzle-Switch Networks for Many-Core Systems.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2012

A 4.5Tb/s 3.4Tb/s/W 64×64 switch fabric with self-updating least-recently-granted priority and quality-of-service arbitration in 45nm CMOS.
Proceedings of the 2012 IEEE International Solid-State Circuits Conference, 2012

Centip3De: A 3930DMIPS/W configurable near-threshold 3D stacked system with 64 ARM Cortex-M3 cores.
Proceedings of the 2012 IEEE International Solid-State Circuits Conference, 2012

Swizzle Switch: A self-arbitrating high-radix crossbar for NoC systems.
Proceedings of the 2012 IEEE Hot Chips 24 Symposium (HCS), 2012

Process variation in near-threshold wide SIMD architectures.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

High radix self-arbitrating switch fabric with multiple arbitration schemes and quality of service.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

Assessing the performance limits of parallelized near-threshold computing.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

A limits study of benefits from nanostore-based future data-centric system architectures.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

Lazy cache invalidation for self-modifying codes.
Proceedings of the 15th International Conference on Compilers, 2012

XPoint cache: scaling existing bus-based coherence protocols for 2D and 3D many-core systems.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Analyzing the Next Generation Software Defined Radio for Future Architectures.
J. Signal Process. Syst., 2011

Flexible product code-based ECC schemes for MLC NAND Flash memories.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2011

Full-system analysis and characterization of interactive smartphone applications.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Bloom Filter Guided Transaction Scheduling.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Low power interconnects for SIMD computers.
Proceedings of the Design, Automation and Test in Europe, 2011

Sponge: portable stream programming on graphics engines.
Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011

PEPSC: A Power-Efficient Processor for Scientific Computing.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

A Low-Power DSP for Wireless Communications.
IEEE Trans. Very Large Scale Integr. Syst., 2010

Yield-Driven Near-Threshold SRAM Design.
IEEE Trans. Very Large Scale Integr. Syst., 2010

Near-Threshold Computing: Reclaiming Moore's Law Through Energy Efficient Integrated Circuits.
Proc. IEEE, 2010

AnySP: Anytime Anywhere Anyway Signal Processing.
IEEE Micro, 2010

Challenges and Opportunities for Extremely Energy-Efficient Processors.
IEEE Micro, 2010

Guest Editor's Introduction: Top Picks from the Computer Architecture Conferences of 2009.
IEEE Micro, 2010

Mobile Supercomputers for the Next-Generation Cell Phone.
Computer, 2010

Technologies for reducing power.
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

Diet SODA: a power-efficient processor for digital cameras.
Proceedings of the 2010 International Symposium on Low Power Electronics and Design, 2010

Evolution of thread-level parallelism in desktop applications.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Mighty-morphing power-SIMD.
Proceedings of the 2010 International Conference on Compilers, 2010

MacroSS: macro-SIMDization of streaming applications.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

MEDICS: ultra-portable processing for medical image reconstruction.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

A survey of multicore processors.
IEEE Signal Process. Mag., 2009

Server Designs for Warehouse-Computing Environments.
IEEE Micro, 2009

Integrating NAND flash devices onto servers.
Commun. ACM, 2009

Customizing wide-SIMD architectures for H.264.
Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, 2009

Reconfigurable Multicore Server Processors for Low Power Operation.
Proceedings of the Embedded Computer Systems: Architectures, 2009

Proactive transaction scheduling for contention management.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

SuiteSpecks and SuiteSpots: A methodology for the automatic conversion of benchmarking programs into intrinsically checkpointed assembly code.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

End-to-end performance forecasting: finding bottlenecks before they happen.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Disaggregated memory for expansion and sharing in blade servers.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Using non-volatile memory to save energy in servers.
Proceedings of the Design, Automation and Test in Europe, 2009

Stream Compilation for Real-Time Embedded Multicore Systems.
Proceedings of the CGO 2009, 2009

Flextream: Adaptive Compilation of Streaming Applications for Heterogeneous Architectures.
Proceedings of the PACT 2009, 2009

Multi-Mechanism Reliability Modeling and Management in Dynamic Systems.
IEEE Trans. Very Large Scale Integr. Syst., 2008

On-chip cache device scaling limits and effective fault repair techniques in future nanoscale technology.
Microprocess. Microsystems, 2008

True Random Number Generator With a Metastability-Based Quality Control.
IEEE J. Solid State Circuits, 2008

PicoServer: Using 3D stacking technology to build energy efficient servers.
ACM J. Emerg. Technol. Comput. Syst., 2008

Energy-Efficient Simultaneous Thread Fetch from Different Cache Levels in a Soft Real-Time SMT Processor.
Proceedings of the Embedded Computer Systems: Architectures, 2008

PicoServer - building a compact energy efficient multiprocessor.
Proceedings of the 2008 International Conference on Embedded Computer Systems: Architectures, 2008

A parameterized dataflow language extension for embedded streaming systems.
Proceedings of the 2008 International Conference on Embedded Computer Systems: Architectures, 2008

From SODA to scotch: The evolution of a wireless baseband processor.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Reconfigurable energy efficient near threshold cache architectures.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Full-System Critical Path Analysis.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2008

Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Improving NAND Flash Based Disk Caches.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Analyzing the scalability of SIMD for the next generation software defined radio.
Proceedings of the IEEE International Conference on Acoustics, 2008

SODA: A High-Performance DSP Architecture for Software-Defined Radio.
IEEE Micro, 2007

Design and Analysis of LDPC Decoders for Software Defined Radio.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2007

The Next Generation Challenge for Software Defined Radio.
Proceedings of the Embedded Computer Systems: Architectures, 2007

Energy efficient near-threshold chip multi-processing.
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

Yield-driven near-threshold SRAM design.
Proceedings of the 2007 International Conference on Computer-Aided Design, 2007

Analysis of hardware prefetching across virtual page boundaries.
Proceedings of the 4th Conference on Computing Frontiers, 2007

Multicore architectures.
Proceedings of the 2007 International Conference on Compilers, 2007

Hierarchical coarse-grained stream compilation for software defined radio.
Proceedings of the 2007 International Conference on Compilers, 2007

An Energy Efficient Parallel Architecture Using Near Threshold Operation.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

Guest editorial: Concurrent hardware and software design for multiprocessor SoC.
ACM Trans. Embed. Comput. Syst., 2006

A self-tuning DVS processor using delay-error detection and correction.
IEEE J. Solid State Circuits, 2006

Design and Implementation of Turbo Decoders for Software Defined Radio.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2006

Reducing idle mode power in software defined radio terminals.
Proceedings of the 2006 International Symposium on Low Power Electronics and Design, 2006

SODA: A Low-power Architecture For Software Radio.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

Reliability modeling and management in dynamic microprocessor-based systems.
Proceedings of the 43rd Design Automation Conference, 2006

FlashCache: a NAND flash memory file cache for low power web servers.
Proceedings of the 2006 International Conference on Compilers, 2006

PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

Quantitative analysis and optimization techniques for on-chip cache leakage power.
IEEE Trans. Very Large Scale Integr. Syst., 2005

Introduction to the Special Section on Energy Efficient Computing.
IEEE Trans. Computers, 2005

ChipLock: support for secure microarchitectures.
SIGARCH Comput. Archit. News, 2005

How to Fake 1000 Registers.
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

Error Analysis for the Support of Robust Voltage Scaling.
Proceedings of the 6th International Symposium on Quality of Electronic Design (ISQED 2005), 2005

Intrinsic Checkpointing: A Methodology for Decreasing Simulation Time Through Binary Modification.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

PowerFITS: Reduce Dynamic and Static I-Cache Power Using Application Specific Instruction Set Synthesis.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Total power-optimal pipelining and parallel processing under process variations in nanometer technology.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

An Intrusion-Tolerant and Self-Recoverable Network Service System Using A Security Enhanced Chip Multiprocessor.
Proceedings of the Second International Conference on Autonomic Computing (ICAC 2005), 2005

Software Defined Radio - A High Performance Embedded Challenge.
Proceedings of the High Performance Embedded Architectures and Compilers, 2005

Total leakage optimization strategies for multi-level caches.
Proceedings of the 15th ACM Great Lakes Symposium on VLSI 2005, 2005

DVS for On-Chip Bus Designs Based on Timing Error Correction.
Proceedings of the 2005 Design, 2005

Power-Performance Trade-Offs in Nanometer-Scale Multi-Level Caches Considering Total Leakage.
Proceedings of the 2005 Design, 2005

Grand challenges in embedded systems.
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

Performance and power analysis of computer systems.
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

A dual-processor solution for the MAC layer of a software defined radio terminal.
Proceedings of the 2005 International Conference on Compilers, 2005

Opportunities and challenges for better than worst-case design.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Circuit and microarchitectural techniques for reducing cache leakage power.
IEEE Trans. Very Large Scale Integr. Syst., 2004

Razor: Circuit-Level Correction of Timing Errors for Low-Power Operation.
IEEE Micro, 2004

Mobile Supercomputers.
Computer, 2004

Making Typical Silicon Matter with Razor.
Computer, 2004

Reducing pipeline energy demands with local DVS and dynamic retiming.
Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

Microarchitectural power modeling techniques for deep sub-micron microprocessors.
Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

Low Power Robust Computing.
Proceedings of the High Performance Computing, 2004

Circuit-aware architectural simulation.
Proceedings of the 41st Design Automation Conference, 2004

FITS: framework-based instruction-set tuning synthesis for embedded application specific processors.
Proceedings of the 41st Design Automation Conference, 2004

Special issue on compilers, architecture, and synthesis for embedded systems.
ACM Trans. Embed. Comput. Syst., 2003

Leakage Current: Moore's Law Meets Static Power.
Computer, 2003

Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

The microarchitecture of a low power register file.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Reducing register ports using delayed write-back queues and operand pre-fetch.
Proceedings of the 17th Annual International Conference on Supercomputing, 2003

Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches.
Proceedings of the 2003 International Conference on Computer-Aided Design, 2003

A 2.3Gb/s fully integrated and synthesizable AES Rijndael core.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2003

Leakage Current Reduction in VLSI Systems.
J. Circuits Syst. Comput., 2002

Vertigo: Automatic Performance-Setting for Linux.
Proceedings of the 5th Symposium on Operating System Design and Implementation (OSDI 2002), 2002

Drowsy instruction caches: leakage power reduction using dynamic voltage scaling and cache sub-bank prediction.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Drowsy Caches: Simple Techniques for Reducing Leakage Power.
Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

Combined dynamic voltage scaling and adaptive body biasing for lower power microprocessors under dynamic workloads.
Proceedings of the 2002 IEEE/ACM International Conference on Computer-aided Design, 2002

Uniprocessor Virtual Memory without TLBs.
IEEE Trans. Computers, 2001

High-Performance DRAMs in Workstation Environments.
IEEE Trans. Computers, 2001

Power: A First-Class Architectural Design Constraint.
Computer, 2001

Automatic performance setting for dynamic voltage scaling.
Proceedings of the MOBICOM 2001, 2001

Integrating superscalar processor components to implement register caching.
Proceedings of the 15th international conference on Supercomputing, 2001

Collection and Analysis of Microprocessor Design Errors.
IEEE Des. Test Comput., 2000

The store-load address table and speculative register promotion.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

Web latency reduction via client-side prefetching.
Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software, 2000

The New DRAM Interfaces: SDRAM, RDRAM and Variants.
Proceedings of the High Performance Computing, Third International Symposium, 2000

Reducing Code Size with Run-Time Decompression.
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

Power: A First Class Design Constraint for Future Architecture and Automation.
Proceedings of the High Performance Computing, 2000

Thread Level Parallelism and Interactive Performance of Desktop Applications.
Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), 2000

Timing verification of sequential dynamic circuits.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1999

The limits of instruction level parallelism in SPEC95 applications.
SIGARCH Comput. Archit. News, 1999

A high level simulator integrated with the Mirv compiler.
SIGARCH Comput. Archit. News, 1999

Performance Limits of Trace Caches.
J. Instr. Level Parallelism, 1999

Evaluation of a High Performance Code Compression Method.
Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999

A Performance Comparison of Contemporary DRAM Architectures.
Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999

High-Level Test Generation for Design Verification of Pipelined Microprocessors.
Proceedings of the 36th Conference on Design Automation, 1999

Overview of complementary GaAs technology for high-speed VLSI circuits.
IEEE Trans. Very Large Scale Integr. Syst., 1998

High-level design verification of microprocessors via error modeling.
ACM Trans. Design Autom. Electr. Syst., 1998

Virtual memory in contemporary microprocessors.
IEEE Micro, 1998

Virtual Memory: Issues of Implementation.
Computer, 1998

Computer architecture instruction at the University of Michigan.
Proceedings of the 1998 workshop on Computer architecture education, 1998

The YAGS Branch Prediction Scheme.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

A Look at Several Memory Management Units, TLB-Refill Mechanisms, and Page Table Organizations.
Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), 1998

Trap-Driven Memory Simulation with Tapeworm II.
ACM Trans. Model. Comput. Simul., 1997

Multilevel Optimization of Pipelined Caches.
IEEE Trans. Computers, 1997

A Comment on "An Analytical Model for Designing Memory Hierarchies".
IEEE Trans. Computers, 1997

Trace-Driven Memory Simulation: A Survey.
ACM Comput. Surv., 1997

Improving Code Density Using Compression Techniques.
Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997

The bi-Mode Branch Predictor.
Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997

Improving Data Cache Performance by Pre-Executing Instructions Under a Cache Miss.
Proceedings of the 11th international conference on Supercomputing, 1997

Design Optimization for High-speed Per-address Two-level Branch Predictors.
Proceedings of the 1997 International Conference on Computer Design: VLSI in Computers & Processors, 1997

Instruction Prefetching Using Branch Prediction Information.
Proceedings of the 1997 International Conference on Computer Design: VLSI in Computers & Processors, 1997

Software-Managed Address Translation.
Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997

An Analytical Model for Designing Memory Hierarchies.
IEEE Trans. Computers, 1996

Report on the panel: "how can computer architecture researchers avoid becoming the society for irreproducible results?".
SIGARCH Comput. Archit. News, 1996

Strategic Directions in Computer Architecture.
ACM Comput. Surv., 1996

The trading function in action.
Proceedings of the 7th ACM SIGOPS European Workshop: Systems Support for Worldwide Applications, 1996

Wrong-path Instruction Prefetching.
Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996

Correlation and Aliasing in Dynamic Branch Predictors.
Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

Timing verification of sequential domino circuits.
Proceedings of the 1996 IEEE/ACM International Conference on Computer-Aided Design, 1996

Analysis of Branch Prediction Via Data Compression.
ASPLOS-VII Proceedings, 1996

Critical paths in circuits with level-sensitive latches.
IEEE Trans. Very Large Scale Integr. Syst., 1995

The role of adaptivity in two-level adaptive branch prediction.
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

Instruction Fetching: Coping with Code Bloat.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

A Parallel Genetic Algorithm for Multiobjective Microprocessor Design.
Proceedings of the 6th International Conference on Genetic Algorithms, 1995

Systematic objective-driven computer architecture optimization.
Proceedings of the 16th Conference on Advanced Research in VLSI (ARVLSI '95), 1995

Design Tradeoffs for Software-Managed TLBs.
ACM Trans. Comput. Syst., 1994

Kernel-Based Memory Simulation.
Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1994

A comparison of two pipeline organizations.
Proceedings of the 27th Annual International Symposium on Microarchitecture, San Jose, California, USA, November 30, 1994

IDtrace - A Tracing Tool for i486 Simulation.
MASCOTS '94, Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation On Computer and Telecommunication Systems, January 31, 1994

Optimal Allocation of On-Chip Memory for Multiple-API Operating Systems.
Proceedings of the 21st Annual International Symposium on Computer Architecture. Chicago, 1994

The Effect of Speculative Execution on Cache Performance.
Proceedings of the 8th International Symposium on Parallel Processing, 1994

Resource Allocation in a High Clock Rate Microprocessor.
ASPLOS-VI Proceedings, 1994

Trap-driven Simulation with Tapeworm II.
ASPLOS-VI Proceedings, 1994

Gallium-arsenide process evaluation based on a RISC microprocessor example.
IEEE J. Solid State Circuits, October, 1993

Synchronization of pipelines.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1993

A microarchitectural performance evaluation of a 3.2 Gbyte/s microprocessor bus.
Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993

Analysis and design of latch-controlled synchronous digital circuits.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1992

Performance Optimization of Pipelined Primary Caches.
Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992

Identification of critical paths in circuits with level-sensitive latches.
Proceedings of the 1992 IEEE/ACM International Conference on Computer-Aided Design, 1992

Multilevel optimization in the design of a high-performance GaAs microcomputer.
IEEE J. Solid State Circuits, May, 1991

The Design of a Microsupercomputer.
Computer, 1991

Implementing a Cache for a High-Performance GaAs Microprocessor.
Proceedings of the 18th Annual International Symposium on Computer Architecture. Toronto, 1991

Optimal Clocking of Circular Pipelines.
Proceedings of the 1991 IEEE International Conference on Computer Design: VLSI in Computer & Processors, 1991

Hierarchical Gate-Array Routing on a Hypercube Multiprocessor.
J. Parallel Distributed Comput., 1990

Cache coherence requirements for interprocess rendezvous.
Int. J. Parallel Program., 1990

Proceedings of the Working Group on Ada Performance Issues 1990, 1990

Parallel and distributed issues.
Proceedings of the Working Group on Ada Performance Issues 1990, 1990

The space problem.
Proceedings of the Working Group on Ada Performance Issues 1990, 1990

The time problem.
Proceedings of the Working Group on Ada Performance Issues 1990, 1990

Taxonomy of benchmarks.
Proceedings of the Working Group on Ada Performance Issues 1990, 1990

A rationale for the design and implementation of Ada benchmark programs.
Proceedings of the Working Group on Ada Performance Issues 1990, 1990

Recommendations and future trends.
Proceedings of the Working Group on Ada Performance Issues 1990, 1990

check Tc and min Tc: Timing Verification and Optimal Clocking of Synchronous Digital Circuits.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 1990

Translation and Execution of Distributed Ada Programs: Is It Still Ada?
IEEE Trans. Software Eng., 1989

Hypercube supercomputers.
Proc. IEEE, 1989

Efficient Recognition of Partially Visible Objects Using a Logarithmic Complexity Matching Technique.
Int. J. Robotics Res., 1989

Analysis of Bus Hierarchies for Multiprocessors.
Proceedings of the 15th Annual International Symposium on Computer Architecture, 1988

Ada on hypercube.
Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988

High performance hypercube communications.
Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988

Parallel branch and bound algorithms on hypercube multiprocessors.
Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, 1988

Instruction Level Timing Mechanisms for Accurate Real-Time Task Scheduling.
IEEE Trans. Computers, 1987

Timing Issues in the Distributed Execution of Ada Programs.
IEEE Trans. Computers, 1987

Automatic generation of salient features for the recognition of partially occluded parts.
Robotica, 1987

Vision Algorithms for Hypercube Machines.
J. Parallel Distributed Comput., 1987

Multiple Bus Architectures.
Computer, 1987

Units of distribution for distributed Ada.
Proceedings of the First International Workshop on Real-Time Ada Issues, 1987

Range image segmentation and surface parameter extraction for 3-D object recognition of industrial parts.
Proceedings of the 1987 IEEE International Conference on Robotics and Automation, Raleigh, North Carolina, USA, March 31, 1987

Two-dimensional partially visible object recognition using efficient multidimensional range queries.
Proceedings of the 1987 IEEE International Conference on Robotics and Automation, Raleigh, North Carolina, USA, March 31, 1987

Crosspoint Cache Architectures.
Proceedings of the International Conference on Parallel Processing, 1987

A Preliminary Investigation into Parallel Routing on a Hypercube Computer.
Proceedings of the 24th ACM/IEEE Design Automation Conference. Miami Beach, FL, USA, June 28, 1987

Solutions to the n Queens problem using tasking in Ada.
ACM SIGPLAN Notices, 1986

A Microprocessor-based Hypercube Supercomputer.
IEEE Micro, 1986

Analysis of Multiple-Bus Interconnection Networks.
J. Parallel Distributed Comput., 1986

Toward Real-Time Performance Benchmarks for Ada.
Commun. ACM, 1986

Instruction Level Mechanisms for Accurate Real-time Task Scheduling.
Proceedings of the 7th IEEE Real-Time Systems Symposium (RTSS '86), 1986

Architecture of a Hypercube Supercomputer.
Proceedings of the International Conference on Parallel Processing, 1986

A Semi-Markov Model for the Performance of Multiple-Bus Systems.
IEEE Trans. Computers, 1985

Recognizing Partially Occluded Parts.
IEEE Trans. Pattern Anal. Mach. Intell., 1985

Object-Based Computing and the Ada Programming Language.
Computer, 1985

Some problems in distributing real-time Ada programs across machines.
Proceedings of the 1985 Annual ACM SIGAda International Conference on Ada, 1985

Recognizing partially hidden objects.
Proceedings of the 1985 IEEE International Conference on Robotics and Automation, 1985

Using Ada as a programming language for robot-based manufacturing cells.
IEEE Trans. Syst. Man Cybern., 1984

A Class of Cellular Architectures to Support Physical Design Automation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1984

Memory Interference Models with Variable Connection Time.
IEEE Trans. Computers, 1984

Hierarchical decomposition and simulation of manufacturing cells.
Proceedings of the 16th conference on Winter simulation, 1984

Efficiency of Feature Dependent Algorithms for the Parallel Processing of Images.
Proceedings of the International Conference on Parallel Processing, 1983

Probabilistic analysis of a crossbar switch.
Proceedings of the 9th International Symposium on Computer Architecture (ISCA 1982), 1982

An Approximate Queueing Model for Packet Switched Multistage Interconnection Networks.
Proceedings of the 3rd International Conference on Distributed Computing Systems, 1982

Cellular image processing techniques for VLSI circuit layout validation and routing.
Proceedings of the 19th Design Automation Conference, 1982

Review of The structure of computers and computation Vol. I by David J. Kuck. John Wiley & Sons 1978.
SIGARCH Comput. Archit. News, 1980

A Computer Hardware Design Language for Multiprocessor Systems.
PhD thesis, 1977
