Kunle Olukotun

ORCID: 0000-0002-8779-0636

Affiliations:
  • Stanford University, USA


According to our database, Kunle Olukotun authored at least 182 papers between 1987 and 2024.

Awards

ACM Fellow

ACM Fellow 2006, "For contributions to multiprocessors on a chip and multi threaded processor design."

Bibliography

2024
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts.
CoRR, 2024

Implementing and Optimizing the Scaled Dot-Product Attention on Streaming Dataflow.
CoRR, 2024

Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

The Dataflow Abstract Machine Simulator Framework.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Revet: A Language and Compiler for Dataflow Threads.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

2023
Artifact for Mosaic: An Interoperable Compiler for Tensor Algebra.
Dataset, March, 2023

Mosaic: An Interoperable Compiler for Tensor Algebra.
Proc. ACM Program. Lang., 2023

Homunculus: Auto-Generating Efficient Data-Plane ML Pipelines for Datacenter Networks.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

The Sparse Abstract Machine.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

BaCO: A Fast and Portable Bayesian Compiler Optimization Framework.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

Sigma: Compiling Einstein Summations to Locality-Aware Dataflow.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Stardust: Compiling Sparse Tensor Algebra to a Reconfigurable Dataflow Architecture.
CoRR, 2022

Efficient Memory Partitioning in Software Defined Hardware.
CoRR, 2022

Global perspectives of diversity, equity, and inclusion.
Commun. ACM, 2022

Accelerating SLIDE: Exploiting Sparsity on Accelerator Architectures.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Taurus: a data plane architecture for per-packet ML.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
Compilation of sparse array programming models.
Proc. ACM Program. Lang., 2021

Chopping off the Tail: Bounded Non-Determinism for Real-Time Accelerators.
IEEE Comput. Archit. Lett., 2021

Bayesian Optimization with a Prior for the Optimum.
Proceedings of the Machine Learning and Knowledge Discovery in Databases. Research Track, 2021

Capstan: A Vector RDA for Sparsity.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Aurochs: An Architecture for Dataflow Threads.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

SARA: Scaling a Reconfigurable Dataflow Accelerator.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

High performance lattice regression on FPGAs via a high level hardware description language.
Proceedings of the International Conference on Field-Programmable Technology, 2021

"Let the Data Flow!".
Proceedings of the 11th Conference on Innovative Data Systems Research, 2021

2020
Prior-guided Bayesian Optimization.
CoRR, 2020

Taurus: An Intelligent Data Plane.
CoRR, 2020

Gorgon: Accelerating Machine Learning from Relational Data.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

2019
Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark.
ACM SIGOPS Oper. Syst. Rev., 2019

DeepFreak: Learning Crystallography Diffraction Patterns with Automated Machine Learning.
CoRR, 2019

Efficient Multiway Hash Join on Reconfigurable Hardware.
Proceedings of the Performance Evaluation and Benchmarking for the Era of Cloud(s), 2019

Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator.
Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019

HyperMapper: a Practical Design Space Exploration Framework.
Proceedings of the 27th IEEE International Symposium on Modeling, 2019

Practical Design Space Exploration.
Proceedings of the 27th IEEE International Symposium on Modeling, 2019

Scalable interconnects for reconfigurable spatial architectures.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

Polystore++: Accelerated Polystore System for Heterogeneous Workloads.
Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, 2019

TensorFlow to Cloud FPGAs: Tradeoffs for Accelerating Deep Neural Networks.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

Elastic RSS: Co-Scheduling Packets and Cores Using Programmable NICs.
Proceedings of the 3rd Asia-Pacific Workshop on Networking, 2019

2018
Plasticine: A Reconfigurable Accelerator for Parallel Patterns.
IEEE Micro, 2018

High-Accuracy Low-Precision Training.
CoRR, 2018

Exploring the Utility of Developer Exhaust.
Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning, 2018

Spatial: a language and compiler for application accelerators.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data.
Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

LevelHeaded: A Unified Engine for Business Intelligence and Linear Algebra Querying.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

2017
EmptyHeaded: A Relational Engine for Graph Processing.
ACM Trans. Database Syst., 2017

Mind the Gap: Bridging Multi-Domain Query Workloads with EmptyHeaded.
Proc. VLDB Endow., 2017

LevelHeaded: Making Worst-Case Optimal Joins Work in the Common Case.
CoRR, 2017

Flare: Native Compilation for Heterogeneous Workloads in Apache Spark.
CoRR, 2017

Infrastructure for Usable Machine Learning: The Stanford DAWN Project.
CoRR, 2017

Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Plasticine: A Reconfigurable Architecture For Parallel Patterns.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

2016
EmptyHeaded: A Relational Engine for Graph Processing.
Proceedings of the 2016 International Conference on Management of Data, 2016

Automatic Generation of Efficient Accelerators for Reconfigurable Hardware.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Old techniques for new join algorithms: A case study in RDF processing.
Proceedings of the 32nd IEEE International Conference on Data Engineering Workshops, 2016

GraphOps: A Dataflow Library for Graph Analytics Acceleration.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

Have abstraction and eat performance, too: optimized heterogeneous computing with parallel patterns.
Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

Generating Configurable Hardware from Parallel Patterns.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

Scaling Data Analytics with Moore's Law.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
EmptyHeaded: Boolean Algebra Based Graph Processing.
CoRR, 2015

Energy-Efficient Abundant-Data Computing: The N3XT 1,000x.
Computer, 2015

Go Meta! A Case for Generative Programming and DSLs in Performance Critical Systems.
Proceedings of the 1st Summit on Advances in Programming Languages, 2015

Taming the Wild: A Unified Analysis of Hogwild-Style Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Automatic support for multi-module parallelism from computational patterns.
Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

EMEURO: a framework for generating multi-purpose accelerators via deep learning.
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

2014
Delite: A Compiler Architecture for Performance-Oriented Embedded Domain-Specific Languages.
ACM Trans. Embed. Comput. Syst., 2014

Guest Editorial.
Int. J. Parallel Program., 2014

Global Convergence of Stochastic Gradient Descent for Some Nonconvex Matrix Problems.
CoRR, 2014

Beyond parallel programming with domain specific languages.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Surgical precision JIT compilers.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014

Locality-Aware Mapping of Nested Parallel Patterns on GPUs.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Author's retrospective for: improving the performance of speculatively parallel applications on the Hydra CMP.
Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

Hardware system synthesis from Domain-Specific Languages.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

Hardware acceleration of database operations.
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

Simplifying Scalable Graph Processing with a Domain-Specific Language.
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

2013
On fast parallel detection of strongly connected components (SCC) in small-world graphs.
Proceedings of the International Conference for High Performance Computing, 2013

Optimizing data structures in high-level programs: new directions for extensible compilers based on staging.
Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2013

Forge: generating a high performance DSL implementation from a declarative specification.
Proceedings of the Generative Programming: Concepts and Experiences, 2013

Composition and Reuse with Compiled Domain-Specific Languages.
Proceedings of the ECOOP 2013 - Object-Oriented Programming, 2013

2012
Utilizing Static Analysis and Code Generation to Accelerate Neural Networks.
Proceedings of the 29th International Conference on Machine Learning, 2012

High performance embedded domain specific languages.
Proceedings of the ACM SIGPLAN International Conference on Functional Programming, 2012

A case of system-level hardware/software co-design and co-verification of a commodity multi-processor system with custom hardware.
Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis, 2012

Green-Marl: a DSL for easy and efficient graph analysis.
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2011
Implementing Domain-Specific Languages for Heterogeneous Parallel Computing.
IEEE Micro, 2011

Building-Blocks for Performance Oriented DSLs.
Proceedings of the IFIP Working Conference on Domain-Specific Languages, 2011

Accelerating CUDA graph algorithms at maximum warp.
Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

A domain-specific approach to heterogeneous parallelism.
Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

Panel Statement.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning.
Proceedings of the 28th International Conference on Machine Learning, 2011

Runtime automatic speculative parallelization.
Proceedings of the CGO 2011, 2011

Hardware acceleration of transactional memory on commodity systems.
Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

A Heterogeneous Parallel Framework for Domain-Specific Languages.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Ubiquitous Parallel Computing from Berkeley, Illinois, and Stanford.
IEEE Micro, 2010

Implementing and evaluating nested parallel transactions in software transactional memory.
Proceedings of the SPAA 2010: Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2010

A practical concurrent binary search tree.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Transactional predication: high-performance concurrent sets and maps for STM.
Proceedings of the 29th Annual ACM Symposium on Principles of Distributed Computing, 2010

Language virtualization for heterogeneous parallel computing.
Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2010

Chip multiprocessor architecture: A programmability-driven approach.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Eigenbench: A simple exploration tool for orthogonal TM characteristics.
Proceedings of the 2010 IEEE International Symposium on Workload Characterization, 2010

Making nested parallel transactions practical using lightweight hardware support.
Proceedings of the 24th International Conference on Supercomputing, 2010

Implementing and Evaluating a Model Checker for Transactional Memory Systems.
Proceedings of the 15th IEEE International Conference on Engineering of Complex Computer Systems, 2010

Extreme scale computing: Challenges and opportunities.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

FARM: A Prototyping Environment for Tightly-Coupled, Heterogeneous Architectures.
Proceedings of the 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2010

A Large-Scale Architecture for Restricted Boltzmann Machines.
Proceedings of the 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2010

Hardware/software co-design for high performance computing: challenges and opportunities.
Proceedings of the 8th International Conference on Hardware/Software Codesign and System Synthesis, 2010

2009
Feedback-directed barrier optimization in a strongly isolated STM.
Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2009

The Stanford pervasive parallelism lab.
Proceedings of the 2009 IEEE Hot Chips 21 Symposium (HCS), 2009

A highly scalable Restricted Boltzmann Machine FPGA implementation.
Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009

2008
Improving software concurrency with hardware-assisted memory snapshot.
Proceedings of the SPAA 2008: Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2008

Ased: availability, security, and debugging support using transactional memory.
Proceedings of the SPAA 2008: Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2008

STAMP: Stanford Transactional Applications for Multi-Processing.
Proceedings of the 4th International Symposium on Workload Characterization (IISWC 2008), 2008

2007
Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency.
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01720-9, 2007

Transactional Memory: The Hardware-Software Interface.
IEEE Micro, 2007

Towards soft optimization techniques for parallel cognitive applications.
Proceedings of the SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007

Transactional collection classes.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

An effective hybrid transactional memory system with strong isolation guarantees.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

A Scalable, Non-blocking Approach to Transactional Memory.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

A practical FPGA-based framework for novel CMP research.
Proceedings of the ACM/SIGDA 15th International Symposium on Field Programmable Gate Arrays, 2007

ATLAS: a chip-multiprocessor with transactional memory support.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

The OpenTM Transactional Application Programming Interface.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
Executing Java programs with transactional memory.
Sci. Comput. Program., 2006

The Identity Management Kalman Filter (IMKF).
Proceedings of the Robotics: Science and Systems II, 2006

The Atomos transactional programming language.
Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, 2006

Map-Reduce for Machine Learning on Multicore.
Proceedings of the Advances in Neural Information Processing Systems 19, 2006

Architectural Semantics for Practical Transactional Memory.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

The common case transactional behavior of multithreaded programs.
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

Tradeoffs in transactional memory virtualization.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

Testing implementations of transactional memory.
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

2005
A chip prototyping substrate: the flexible architecture for simulation and testing (FAST).
SIGARCH Comput. Archit. News, 2005

The future of microprocessors.
ACM Queue, 2005

Niagara: A 32-Way Multithreaded Sparc Processor.
IEEE Micro, 2005

Exposing speculative thread parallelism in SPEC2000.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

The Information-Form Data Association Filter.
Proceedings of the Advances in Neural Information Processing Systems 18, 2005

An Application Analysis Framework For Polymorphic Chip Multiprocessors.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

TAPE: a transactional application profiling environment.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

A New Approach to Programming and Prototyping Parallel Systems.
Proceedings of the High Performance Computing, 2005

Characterization of TCC on Chip-Multiprocessors.
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

Maximizing CMP Throughput with Mediocre Cores.
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

2004
Transactional Coherence and Consistency: Simplifying Parallel Hardware and Software.
IEEE Micro, 2004

Transactional Memory Coherence and Consistency.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Programming with transactional coherence and consistency (TCC).
Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

2003
The Jrpm System for Dynamically Parallelizing Sequential Java Programs.
IEEE Micro, 2003

Using thread-level speculation to simplify manual parallelization.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2003

The Jrpm System for Dynamically Parallelizing Java Programs.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

TEST: A Tracer for Extracting Speculative Threads.
Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003

2002
Targeting Dynamic Compilation for Embedded Environments.
Proceedings of the 2nd Java Virtual Machine Research and Technology Symposium, 2002

Efficient state representation for symbolic simulation.
Proceedings of the 39th Design Automation Conference, 2002

2001
High Bandwidth On-Chip Cache Design.
IEEE Trans. Computers, 2001

2000
The Stanford Hydra CMP.
IEEE Micro, 2000

1999
Improving the performance of speculatively parallel applications on the Hydra CMP.
Proceedings of the 13th international conference on Supercomputing, 1999

JMTP: an architecture for exploiting concurrency in embedded Java applications with real-time considerations.
Proceedings of the 1999 IEEE/ACM International Conference on Computer-Aided Design, 1999

1998
DCP: an algorithm for datapath/control partitioning of synthesizable RTL models.
Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors, 1998

REMARC: Reconfigurable Multimedia Array Coprocessor (Abstract).
Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays, 1998

A Quantitative Analysis of Reconfigurable Coprocessors for Multimedia Applications.
Proceedings of the 6th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '98), 1998

Digital System Simulation: Methodologies and Examples.
Proceedings of the 35th Conference on Design Automation, 1998

Data Speculation Support for a Chip Multiprocessor.
Proceedings of the ASPLOS-VIII Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, 1998

Exploiting Method-Level Parallelism in Single-Threaded Java Programs.
Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998

1997
Multilevel Optimization of Pipelined Caches.
IEEE Trans. Computers, 1997

A Single-Chip Multiprocessor.
Computer, 1997

Designing High Bandwidth On-Chip Caches.
Proceedings of the 24th International Symposium on Computer Architecture, 1997

Verifying correct pipeline implementation for microprocessors.
Proceedings of the 1997 IEEE/ACM International Conference on Computer-Aided Design, 1997

Java as a specification language for hardware-software systems.
Proceedings of the 1997 IEEE/ACM International Conference on Computer-Aided Design, 1997

The Hierarchical Multi-Bank DRAM: A High-Performance Architecture for Memory Integrated with Processors.
Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97), 1997

1996
Increasing Cache Port Efficiency for Dynamic Superscalar Microprocessors.
Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

Evaluation of Design Alternatives for a Multiprocessor Microprocessor.
Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

The Impact of Shared-Cache Clustering in Small-Scale Shared-Memory Multiprocessors.
Proceedings of the Second International Symposium on High-Performance Computer Architecture, 1996

A Scalable Formal Verification Methodology for Pipelined Microprocessors.
Proceedings of the 33rd Conference on Design Automation, 1996

The Case for a Single-Chip Multiprocessor.
Proceedings of the ASPLOS-VII Proceedings, 1996

1995
The Benefits of Clustering in Shared Address Space Multiprocessors: An Applications-Driven Investigation.
Proceedings of Supercomputing '95, San Diego, CA, USA, December 4-8, 1995

A General Method for Compiling Event-Driven Simulations.
Proceedings of the 32nd Conference on Design Automation, 1995

1994
A software-hardware cosynthesis approach to digital system simulation.
IEEE Micro, 1994

Exploring the Design Space for a Shared-Cache Multiprocessor.
Proceedings of the 21st Annual International Symposium on Computer Architecture. Chicago, 1994

1992
Analysis and design of latch-controlled synchronous digital circuits.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1992

Performance Optimization of Pipelined Primary Caches.
Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992

1991
Technology-organization tradeoffs in the architecture of a high-performance processor.
PhD thesis, 1991

The Design of a Microsupercomputer.
Computer, 1991

Implementing a Cache for a High-Performance GaAs Microprocessor.
Proceedings of the 18th Annual International Symposium on Computer Architecture. Toronto, 1991

1990
Hierarchical Gate-Array Routing on a Hypercube Multiprocessor.
J. Parallel Distributed Comput., 1990

checkTc and minTc: Timing Verification and Optimal Clocking of Synchronous Digital Circuits.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 1990

1987
A Preliminary Investigation into Parallel Routing on a Hypercube Computer.
Proceedings of the 24th ACM/IEEE Design Automation Conference. Miami Beach, FL, USA, June 28, 1987

