Alexandru Nicolau

Marco Bodrato

ACM Trans. Math. Softw., 2011

Improving accuracy for matrix multiplications on GPUs.

[BibT_eX]

[DOI]

Matthew Badin

Michael B. Dillencourt

Sci. Program., 2011

Improving the Accuracy of High Performance BLAS Implementations Using Adaptive Blocked Algorithms.

[BibT_eX]

[DOI]

Matthew Badin

Michael B. Dillencourt

Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing, 2011

Pruning hardware evaluation space via correlation-driven application similarity analysis.

[BibT_eX]

[DOI]

Proceedings of the 8th Conference on Computing Frontiers, 2011

2010

On the efficacy of call graph-level thread-level speculation.

[BibT_eX]

[DOI]

Proceedings of the first joint WOSP/SIPEW International Conference on Performance Engineering, 2010

How Many Threads to Spawn during Program Multithreading?

[BibT_eX]

[DOI]

Proceedings of the Languages and Compilers for Parallel Computing, 2010

Pretty Good Accuracy in Matrix Multiplication with GPUs.

[BibT_eX]

[DOI]

Matthew Badin

Michael B. Dillencourt

Proceedings of the Ninth International Symposium on Parallel and Distributed Computing, 2010

Exploitation of nested thread-level speculative parallelism on multi-core systems.

[BibT_eX]

[DOI]

Proceedings of the 7th Conference on Computing Frontiers, 2010

2009

Adaptive Winograd's matrix multiplications.

[BibT_eX]

[DOI]

ACM Trans. Math. Softw., 2009

On the exploitation of loop-level parallelism in embedded applications.

[BibT_eX]

[DOI]

ACM Trans. Embed. Comput. Syst., 2009

A configurable simulation environment for the efficient simulation of large-scale spiking neural networks on graphics processors.

[BibT_eX]

[DOI]

Jeffrey L. Krichmar

Neural Networks, 2009

Brain Derived Vision Algorithm on High Performance Architectures.

[BibT_eX]

[DOI]

Int. J. Parallel Program., 2009

Optimizing control flow in loops using interval and dependence analysis.

[BibT_eX]

[DOI]

Des. Autom. Embed. Syst., 2009

Cache-aware partitioning of multi-dimensional iteration spaces.

[BibT_eX]

[DOI]

Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference 2009, 2009

Performance Characterization of Itanium® 2-Based Montecito Processor.

[BibT_eX]

[DOI]

Cameron McNairy

Proceedings of the Computer Performance Evaluation and Benchmarking, 2009

Techniques for efficient placement of synchronization primitives.

[BibT_eX]

[DOI]

Guangqiang Li

Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Efficient simulation of large-scale Spiking Neural Networks using CUDA graphics processors.

[BibT_eX]

[DOI]

Jeffrey L. Krichmar

Proceedings of the International Joint Conference on Neural Networks, 2009

Synchronization optimizations for efficient execution on multi-cores.

[BibT_eX]

[DOI]

Guangqiang Li

Proceedings of the 23rd international conference on Supercomputing, 2009

Efficient Scheduling of Nested Parallel Loops on Multi-Core Systems.

[BibT_eX]

[DOI]

Proceedings of the ICPP 2009, 2009

2008

[BibT_eX]

[DOI]

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2008

Comparative architectural characterization of SPEC CPU2000 and CPU2006 benchmarks on the Intel® Core™ 2 Duo processor.

[BibT_eX]

[DOI]

Proceedings of the 2008 International Conference on Embedded Computer Systems: Architectures, 2008

Cache-aware iteration space partitioning.

[BibT_eX]

[DOI]

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Impact of JVM superoperators on energy consumption in resource-constrained embedded systems.

[BibT_eX]

[DOI]

Carmen Badea

Proceedings of the 2008 ACM SIGPLAN/SIGBED Conference on Languages, 2008

Control flow optimization in loops using interval analysis.

[BibT_eX]

[DOI]

Proceedings of the 2008 International Conference on Compilers, 2008

2007

A predictive decode filter cache for reducing power consumption in embedded processors.

[BibT_eX]

[DOI]

ACM Trans. Design Autom. Electr. Syst., 2007

Automatic Design Space Exploration of Register Bypasses in Embedded Processors.

[BibT_eX]

[DOI]

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2007

DYNAMO: A Cross-Layer Framework for End-to-End QoS and Energy Optimization in Mobile Handheld Devices.

[BibT_eX]

[DOI]

Shivajit Mohapatra

IEEE J. Sel. Areas Commun., 2007

R-Kleene: A High-Performance Divide-and-Conquer Algorithm for the All-Pair Shortest Path for Densely Connected Networks.

[BibT_eX]

[DOI]

Algorithmica, 2007

Comparative characterization of SPEC CPU2000 and CPU2006 on Itanium architecture.

[BibT_eX]

[DOI]

Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2007

Tight analysis of the performance potential of thread speculation using SPEC CPU 2006.

[BibT_eX]

[DOI]

Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Novel Brain-Derived Algorithms Scale Linearly with Number of Processing Elements.

[BibT_eX]

Jeff Furlong

Andrew Felch

Ashok Chandrashekar

Richard Granger

Proceedings of the Parallel Computing: Architectures, 2007

Annotation Integration and Trade-off Analysis for Multimedia Applications.

[BibT_eX]

[DOI]

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Adaptive Strassen's matrix multiplication.

[BibT_eX]

[DOI]

Proceedings of the 21st Annual International Conference on Supercomputing, 2007

A simplified Java bytecode compilation system for resource-constrained embedded processors.

[BibT_eX]

[DOI]

Carmen Badea

Proceedings of the 2007 International Conference on Compilers, 2007

Short-Circuit Compiler Transformation: Optimizing Conditional Blocks.

[BibT_eX]

[DOI]

Proceedings of the 12th Conference on Asia South Pacific Design Automation, 2007

2006

Retargetable pipeline hazard detection for partially bypassed processors.

[BibT_eX]

[DOI]

IEEE Trans. Very Large Scale Integr. Syst., 2006

Energy efficient watermarking on mobile devices using proxy-based partitioning.

[BibT_eX]

[DOI]

IEEE Trans. Very Large Scale Integr. Syst., 2006

Expression equivalence checking using interval analysis.

[BibT_eX]

[DOI]

IEEE Trans. Very Large Scale Integr. Syst., 2006

Compilation framework for code size reduction using reduced bit-width ISAs (rISAs).

[BibT_eX]

[DOI]

ACM Trans. Design Autom. Electr. Syst., 2006

PBPAIR: an energy-efficient error-resilient encoding using probability based power aware intra refresh.

[BibT_eX]

[DOI]

ACM SIGMOBILE Mob. Comput. Commun. Rev., 2006

A general approach for partitioning N-dimensional parallel nested loops with conditionals.

[BibT_eX]

[DOI]

Proceedings of the SPAA 2006: Proceedings of the 18th Annual ACM Symposium on Parallelism in Algorithms and Architectures, Cambridge, Massachusetts, USA, July 30, 2006

Bypass aware instruction scheduling for register file power reduction.

[BibT_eX]

[DOI]

Proceedings of the 2006 ACM SIGPLAN/SIGBED Conference on Languages, 2006

Video Stream Annotations for Energy Trade-offs in Multimedia Applications.

[BibT_eX]

[DOI]

Proceedings of the 5th International Symposium on Parallel and Distributed Computing (ISPDC 2006), 2006

On the performance potential of different types of speculative thread-level parallelism: The DL version of this paper includes corrections that were not made available in the printed proceedings.

[BibT_eX]

[DOI]

Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Lightweight lock-free synchronization methods for multithreading.

[BibT_eX]

[DOI]

Proceedings of the 20th Annual International Conference on Supercomputing, 2006

History-aware Self-Scheduling.

[BibT_eX]

[DOI]

Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

Probabilistic Self-Scheduling.

[BibT_eX]

[DOI]

Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Annotation Based Multimedia Streaming Over Wireless Networks.

[BibT_eX]

[DOI]

Proceedings of the 2006 4th Workshop on Embedded Systems for Real-Time Multimedia, 2006

Automatic generation of operation tables for fast exploration of bypasses in embedded processors.

[BibT_eX]

[DOI]

Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Software annotations for power optimization on mobile devices.

[BibT_eX]

[DOI]

Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Challenges in exploitation of loop parallelism in embedded applications.

[BibT_eX]

[DOI]

Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, 2006

2005

Line Size Adaptivity Analysis of Parameterized Loop Nests for Direct Mapped Data Cache.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 2005

A novel approach for partitioning iteration spaces with variable densities.

[BibT_eX]

[DOI]

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

An Efficient Approach for Self-scheduling Parallel Loops on Multiprogrammed Parallel Computers.

[BibT_eX]

[DOI]

Proceedings of the Languages and Compilers for Parallel Computing, 2005

An Efficient Load Balancing Scheme for Grid-based High Performance Scientific Computing.

[BibT_eX]

[DOI]

Proceedings of the 4th International Symposium on Parallel and Distributed Computing (ISPDC 2005), 2005

Using a Way Cache to Improve Performance of Set-Associative Caches.

[BibT_eX]

[DOI]

Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Enhanced Loop Coalescing: A Compiler Technique for Transforming Non-uniform Iteration Spaces.

[BibT_eX]

[DOI]

Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Using Recursion to Boost ATLAS's Performance.

[BibT_eX]

[DOI]

Proceedings of the High-Performance Computing - 6th International Symposium, 2005

A Cross-Layer Approach for Power-Performance Optimization in Distributed Mobile Systems.

[BibT_eX]

[DOI]

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Probability Based Power Aware Error Resilient Coding.

[BibT_eX]

[DOI]

Proceedings of the 25th International Conference on Distributed Computing Systems Workshops (ICDCS 2005 Workshops), 2005

Energy Analysis of Multimedia Watermarking on Mobile Handheld Devices.

[BibT_eX]

[DOI]

Proceedings of the 2005 3rd Workshop on Embedded Systems for Real-Time Multimedia, 2005

High performance annotation-aware JVM for Java cards.

[BibT_eX]

[DOI]

Ana Azevedo

Proceedings of the EMSOFT 2005, 2005

PBExplore: A Framework for Compiler-in-the-Loop Exploration of Partial Bypassing in Embedded Processors.

[BibT_eX]

[DOI]

Proceedings of the 2005 Design, 2005

Aggregating processor free time for energy reduction.

[BibT_eX]

[DOI]

Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

Equivalence checking of arithmetic expressions using fast evaluation.

[BibT_eX]

[DOI]

Proceedings of the 2005 International Conference on Compilers, 2005

2004

Coordinated parallelizing compiler optimizations and high-level synthesis.

[BibT_eX]

[DOI]

ACM Trans. Design Autom. Electr. Syst., 2004

Using global code motions to improve the quality of results for high-level synthesis.

[BibT_eX]

[DOI]

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2004

Caching Values in the Load Store Queue.

[BibT_eX]

[DOI]

Proceedings of the 12th International Workshop on Modeling, 2004

A Geometric Approach for Partitioning N-Dimensional Non-rectangular Iteration Spaces.

[BibT_eX]

[DOI]

Proceedings of the Languages and Compilers for High Performance Computing, 2004

JuliusC: A Practical Approach for the Analysis of Divide-and-Conquer Algorithms.

[BibT_eX]

[DOI]

Proceedings of the Languages and Compilers for High Performance Computing, 2004

Interconnect-Aware Mapping of Applications to Coarse-Grain Reconfigurable Architectures.

[BibT_eX]

[DOI]

Proceedings of the Field Programmable Logic and Application, 2004

Loop Shifting and Compaction for the High-Level Synthesis of Designs with Complex Control Flow.

[BibT_eX]

[DOI]

Proceedings of the 2004 Design, 2004

Network Topology Exploration of Mesh-Based Coarse-Grain Reconfigurable Architectures.

[BibT_eX]

[DOI]

Proceedings of the 2004 Design, 2004

Proxy-based task partitioning of watermarking algorithms for reducing energy consumption in mobile devices.

[BibT_eX]

[DOI]

Proceedings of the 41st Design Automation Conference, 2004

Operation tables for scheduling in the presence of incomplete bypassing.

[BibT_eX]

[DOI]

Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2004

2003

RTGEN-an algorithm for automatic generation of reservation tables from architectural descriptions.

[BibT_eX]

[DOI]

IEEE Trans. Very Large Scale Integr. Syst., 2003

Access pattern-based memory and connectivity architecture exploration.

[BibT_eX]

[DOI]

ACM Trans. Embed. Comput. Syst., 2003

SPARK: A High-Level Synthesis Framework For Applying Parallelizing Compiler Transformations.

[BibT_eX]

[DOI]

Proceedings of the 16th International Conference on VLSI Design (VLSI Design 2003), 2003

Integrated power management for video streaming to mobile handheld devices.

[BibT_eX]

[DOI]

Proceedings of the Eleventh ACM International Conference on Multimedia, 2003

A Data Cache with Dynamic Mapping.

[BibT_eX]

[DOI]

Proceedings of the Languages and Compilers for Parallel Computing, 2003

Reducing data cache energy consumption via cached load/store queue.

[BibT_eX]

[DOI]

Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

FORGE: A Framework for Optimization of Distributed Embedded Systems Software.

[BibT_eX]

[DOI]

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Interface Synthesis using Memory Mapping for an FPGA Platform.

[BibT_eX]

[DOI]

Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

Reducing Power Consumption for High-Associativity Data Caches in Embedded Processors.

[BibT_eX]

[DOI]

Proceedings of the 2003 Design, 2003

Dynamic Conditional Branch Balancing during the High-Level Synthesis of Control-Intensive Designs.

[BibT_eX]

[DOI]

Proceedings of the 2003 Design, 2003

Low Energy Associative Data Caches for Embedded Systems.

[BibT_eX]

[DOI]

Proceedings of the Embedded Software for SoC, 2003

Memory architecture exploration for programmable embedded systems.

[BibT_eX]

Kluwer, ISBN: 978-1-4020-7324-3, 2003

2002

Automatic Modeling and Validation of Pipeline Specifications Driven by an Architecture Description Language.

[BibT_eX]

[DOI]

Proceedings of the 7th Asia and South Pacific Design Automation Conference (ASP-DAC 2002), 2002

A Design Space Exploration Framework for Reduced Bit-Width Instruction Set Architecture (rISA) Design.

[BibT_eX]

[DOI]

Proceedings of the 15th International Symposium on System Synthesis (ISSS 2002), 2002

Dynamic Common Sub-Expression Elimination during Scheduling in High-Level Synthesis.

[BibT_eX]

[DOI]

Proceedings of the 15th International Symposium on System Synthesis (ISSS 2002), 2002

Integrated I-cache Way Predictor and Branch Target Buffer to Reduce Energy Consumption.

[BibT_eX]

[DOI]

Proceedings of the High Performance Computing, 4th International Symposium, 2002

Power Savings in Embedded Processors through Decode Filter Cache.

[BibT_eX]

[DOI]

Proceedings of the 2002 Design, 2002

Automatic Verification of In-Order Execution In Microprocessors with Fragmented Pipelines and Multicycle Functional Units.

[BibT_eX]

[DOI]

Proceedings of the 2002 Design, 2002

An Efficient Compiler Technique for Code Size Reduction Using Reduced Bit-Width ISAs.

[BibT_eX]

[DOI]

Proceedings of the 2002 Design, 2002

Memory System Connectivity Exploration.

[BibT_eX]

[DOI]

Proceedings of the 2002 Design, 2002

Profile-Based Dynamic Voltage Scheduling Using Program Checkpoints.

[BibT_eX]

[DOI]

Proceedings of the 2002 Design, 2002

Coordinated transformations for high-level synthesis of high performance microprocessor blocks.

[BibT_eX]

[DOI]

Proceedings of the 39th Design Automation Conference, 2002

2001

V-SAT: A visual specification and analysis tool for system-on-chip exploration.

[BibT_eX]

[DOI]

J. Syst. Archit., 2001

Data Memory Organization and Optimizations in Application-Specific Systems.

[BibT_eX]

[DOI]

IEEE Des. Test Comput., 2001

Fractal Matrix Multiplication: A Case Study on Portability of Cache Performance.

[BibT_eX]

[DOI]

Gianfranco Bilardi

Proceedings of the Algorithm Engineering, 2001

Processor-Memory Co-Exploration driven by a Memory-Aware Architecture Description Language.

[BibT_eX]

[DOI]

Proceedings of the 14th International Conference on VLSI Design (VLSI Design 2001), 2001

Functional abstraction driven design space exploration of heterogeneous programmable architectures.

[BibT_eX]

[DOI]

Prabhat Mishra

Proceedings of the 14th International Symposium on Systems Synthesis, 2001

Conditional speculation and its effects on performance and area for high-level synthesis.

[BibT_eX]

[DOI]

Proceedings of the 14th International Symposium on Systems Synthesis, 2001

APEX: Access Pattern Based Memory Architecture Exploration.

[BibT_eX]

[DOI]

Proceedings of the 14th International Symposium on Systems Synthesis, 2001

Design of a Predictive Filter Cache for Energy Savings in High Performance Processor Architectures.

[BibT_eX]

[DOI]

Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

Automatic validation of pipeline specifications.

[BibT_eX]

[DOI]

Prabhat Mishra

Proceedings of the Sixth IEEE International High-Level Design Validation and Test Workshop 2001, 2001

Access pattern based local memory customization for low power embedded systems.

[BibT_eX]

[DOI]

Proceedings of the Conference on Design, Automation and Test in Europe, 2001

Speculation Techniques for High Level Synthesis of Control Intensive Designs.

[BibT_eX]

[DOI]

Proceedings of the 38th Design Automation Conference, 2001

New directions in compiler technology for embedded systems (embedded tutorial).

[BibT_eX]

[DOI]

Proceedings of ASP-DAC 2001, 2001

2000

On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems.

[BibT_eX]

[DOI]

ACM Trans. Design Autom. Electr. Syst., 2000

The Design of the PROMIS Compiler-Towards Multi-Level Parallelization.

[BibT_eX]

[DOI]

Hideki Saito

Nicholas Stavrakos

Int. J. Parallel Program., 2000

An annotation-aware Java virtual machine implementation.

[BibT_eX]

[DOI]

Ana Azevedo

Concurr. Pract. Exp., 2000

Compiler-Directed Cache Assist Adaptivity.

[BibT_eX]

[DOI]

Xiaomei Ji

Proceedings of the High Performance Computing, Third International Symposium, 2000

Compiler-Directed Cache Line Size Adaptivity.

[BibT_eX]

[DOI]

Xiaomei Ji

Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

Aggressive Memory-Aware Compilation.

[BibT_eX]

[DOI]

Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

Customizing Software Toolkits for Embedded Systems-On-Chip.

[BibT_eX]

Ashok Halambi

Proceedings of the Architecture and Design of Distributed Embedded Systems, 2000

Using profiling to reduce branch misprediction costs on a dynamically scheduled processor.

[BibT_eX]

[DOI]

Srinivas Mantripragada

Proceedings of the 14th international conference on Supercomputing, 2000

MIST: An Algorithm for Memory Miss Traffic Management.

[BibT_eX]

[DOI]

Proceedings of the 2000 IEEE/ACM International Conference on Computer-Aided Design, 2000

Architecture Exploration of Parameterizable EPIC SOC Architectures.

[BibT_eX]

[DOI]

Proceedings of the 2000 Design, 2000

Memory aware compilation through accurate timing extraction.

[BibT_eX]

[DOI]

Proceedings of the 37th Conference on Design Automation, 2000

1999

Local memory exploration and optimization in embedded systems.

[BibT_eX]

[DOI]

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1999

Augmenting Loop Tiling with Data Alignment for Improved Cache Performance.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1999

Symbolic Analysis in the PROMIS Compiler.

[BibT_eX]

[DOI]

Nicholas Stavrakos

Steven Carroll

Hideki Saito

Proceedings of the Languages and Compilers for Parallel Computing, 1999

Java Annotation-Aware Just-in-Time (AJIT) Compilation System.

[BibT_eX]

[DOI]

Ana Azevedo

Proceedings of the ACM 1999 Conference on Java Grande, JAVA '99, San Francisco, CA, USA, 1999

Adapting cache line size to application behavior.

[BibT_eX]

[DOI]

Proceedings of the 13th international conference on Supercomputing, 1999

EXPRESSION: A Language for Architecture Exploration through Compiler/Simulator Retargetability.

[BibT_eX]

[DOI]

Proceedings of the 1999 Design, 1999

The Design of the PROMIS Compiler.

[BibT_eX]

[DOI]

Hideki Saito

Nicholas Stavrakos

Steven Carroll

Proceedings of the Compiler Construction, 8th International Conference, 1999

1998

Incorporating DRAM access modes into high-level synthesis.

[BibT_eX]

[DOI]

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1998

Editor's Announcement.

[BibT_eX]

[DOI]

Int. J. Parallel Program., 1998

Copy Elimination for Parallelizing Compilers.

[BibT_eX]

[DOI]

Proceedings of the Languages and Compilers for Parallel Computing, 1998

Analyzing the Individual/Combined Effects of Speculative and Guarded Execution on a Superscalar Architecture.

[BibT_eX]

[DOI]

M. Srinivas

Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

Data Cache Sizing for Embedded Processor Applications.

[BibT_eX]

[DOI]

Proceedings of the 1998 Design, 1998

1997

Memory data organization for improved cache performance in embedded processor applications.

[BibT_eX]

[DOI]

ACM Trans. Design Autom. Electr. Syst., 1997

Annotating the Java Bytecodes in Support of Optimization.

[BibT_eX]

[DOI]

Concurr. Pract. Exp., 1997

Resource Directed Loop Pipelining: Exposing Just Enough Parallelism.

[BibT_eX]

[DOI]

Comput. J., 1997

A Systematic Approach to Branch Speculation.

[BibT_eX]

[DOI]

Gianfranco Bilardi

Proceedings of the Languages and Compilers for Parallel Computing, 1997

Architectural Exploration and Optimization of Local Memory in Embedded Systems.

[BibT_eX]

[DOI]

Proceedings of the 10th International Symposium on System Synthesis, 1997

Achieving Multi-level Parallelization.

[BibT_eX]

[DOI]

Carrie J. Brownhill

Proceedings of the High Performance Computing, International Symposium, 1997

Improving Cache Performance Through Tiling and Data Alignment.

[BibT_eX]

[DOI]

Proceedings of the Solving Irregularly Structured Problems in Parallel, 1997

A Data Alignment Technique for Improving Cache Performance.

[BibT_eX]

[DOI]

Proceedings of the 1997 International Conference on Computer Design: VLSI in Computers & Processors, 1997

Exploiting off-chip memory access modes in high-level synthesis.

[BibT_eX]

[DOI]

Proceedings of the 1997 IEEE/ACM International Conference on Computer-Aided Design, 1997

Efficient utilization of scratch-pad memory in embedded processor applications.

[BibT_eX]

[DOI]

Proceedings of the European Design and Test Conference, 1997

The PROMIS Compiler Prototype.

[BibT_eX]

[DOI]

Carrie J. Brownhill

Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997

1996

Computing Programs Containing Band Linear Recurrences on Vector Supercomputers.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 1996

Optimal register assignment to loops for embedded code generation.

[BibT_eX]

[DOI]

ACM Trans. Design Autom. Electr. Syst., 1996

Elimination of redundant memory traffic in high-level synthesis.

[BibT_eX]

[DOI]

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1996

The Strict Time Lower Bound and Optimal Schedules for Parallel Prefix with Resource Constraints.

[BibT_eX]

[DOI]

Kai-Yeung Siu

IEEE Trans. Computers, 1996

Resource-Directed Loop Pipelining.

[BibT_eX]

[DOI]

Proceedings of the Languages and Compilers for Parallel Computing, 1996

Memory Organization for Improved Data Cache Performance in Embedded Processors.

[BibT_eX]

[DOI]

Proceedings of the 9th International Symposium on System Synthesis, 1996

A Method for Register Allocation to Loops in Multiple Register File Architectures.

[BibT_eX]

[DOI]

Proceedings of IPPS '96, 1996

An efficient, global resource-directed approach to exploiting instruction-level parallelism.

[BibT_eX]

[DOI]

Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, 1996

1995

Performance evaluation for application-specific architectures.

[BibT_eX]

[DOI]

Jie Gong

Daniel D. Gajski

IEEE Trans. Very Large Scale Integr. Syst., 1995

Resource-Constrained Software Pipelining.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 1995

A hierarchical approach to instruction-level parallelization.

[BibT_eX]

[DOI]

Int. J. Parallel Program., 1995

A hypergraph-based model for port allocation on multiple-register-file VLIW architectures.

[BibT_eX]

[DOI]

Int. J. Parallel Program., 1995

Path Collection and Dependence Testing in the Presence of Dynamic, Pointer-Based Data Structures.

[BibT_eX]

[DOI]

Proceedings of the Languages, 1995

A Simple Mechanism for Improving the Accuracy and Efficiency of Instruction-Level Disambiguation.

[BibT_eX]

[DOI]

Proceedings of the Languages and Compilers for Parallel Computing, 1995

Incorporating compiler feedback into the design of ASIPs.

[BibT_eX]

[DOI]

Frederick Onion

Proceedings of the 1995 European Design and Test Conference, 1995

1994

From the guest editors.

[BibT_eX]

[DOI]

Wen-mei W. Hwu

Int. J. Parallel Program., 1994

Editors' introduction.

[BibT_eX]

[DOI]

Int. J. Parallel Program., 1994

Ultra Fine-Grain Template-Driven Synthesis.

[BibT_eX]

[DOI]

Proceedings of the Seventh International Conference on VLSI Design, 1994

A General Data Dependence Test for Dynamic, Pointer-Based Data Structures.

[BibT_eX]

[DOI]

Proceedings of the ACM SIGPLAN'94 Conference on Programming Language Design and Implementation (PLDI), 1994

Mutation Scheduling: A Unified Approach to Compiling for Fine-Grain Parallelism.

[BibT_eX]

[DOI]

Proceedings of the Languages and Compilers for Parallel Computing, 1994

Scalable Techniques for Computing Band Linear Recurrences on Massively Parallel and Vector Supercomputers.

[BibT_eX]

[DOI]

Proceedings of the 8th International Symposium on Parallel Processing, 1994

A Language for Conveying the Aliasing Properties of Dynamic, Pointer-Based Data Structures.

[BibT_eX]

[DOI]

Proceedings of the 8th International Symposium on Parallel Processing, 1994

An Approach to Combine Predicated/Speculative Execution for Programs with Unpredictable Branches.

[BibT_eX]

[DOI]

Mantipragada Srinivas

Vicki H. Allan

Proceedings of the Parallel Architectures and Compilation Techniques, 1994

Partitioning of Variables for Multiple-Register-File Architectures via Hypergraph Coloring.

[BibT_eX]

[DOI]

Proceedings of the Parallel Architectures and Compilation Techniques, 1994

A Framework for Data Dependence Testing in the Presence of Pointers.

[BibT_eX]

[DOI]

Proceedings of the 1994 International Conference on Parallel Processing, 1994

Partitioning of Variables for Multiple-Register-File VLIW Architectures.

[BibT_eX]

[DOI]

Proceedings of the 1994 International Conference on Parallel Processing, 1994

Integrating program transformations in the memory-based synthesis of image and video algorithms.

[BibT_eX]

[DOI]

Proceedings of the 1994 IEEE/ACM International Conference on Computer-Aided Design, 1994

A performance evaluator for parameterized ASIC architectures.

[BibT_eX]

[DOI]

Jie Gong

Daniel D. Gajski

Proceedings of EURO-DAC'94, 1994

A Unified code generation approach using mutation scheduling.

[BibT_eX]

Proceedings of the Code Generation for Embedded Processors [Dagstuhl Workshop, Dagstuhl, Germany, August 31, 1994

Minimization of Memory Traffic in High-Level Synthesis.

[BibT_eX]

[DOI]

Proceedings of the 31st Conference on Design Automation, 1994

1993

Automatic program parallelization.

[BibT_eX]

[DOI]

Proc. IEEE, 1993

A Mapping Strategy for MIMD Computers.

[BibT_eX]

[DOI]

Jiyuan Yang

Int. J. High Speed Comput., 1993

Massive Parallelism and Fine-Grain Parallelism: are They Incompatible?

[BibT_eX]

[DOI]

Int. J. High Speed Comput., 1993

Harmonic Scheduling: A Technique for Scheduling Beyond Loop-Carried Dependencies.

[BibT_eX]

[DOI]

Proceedings of the Sixth International Conference on VLSI Design, 1993

VISTA: The Visual Interface for Scheduling Transformations and Analysis.

[BibT_eX]

[DOI]

Proceedings of the Languages and Compilers for Parallel Computing, 1993

Trailblazing: A Hierarchical Approach to Percolation Scheduling.

[BibT_eX]

[DOI]

Proceedings of the 1993 International Conference on Parallel Processing, 1993

Regular schedules for scalable design of IIR filters.

[BibT_eX]

[DOI]

Proceedings of the European Design Automation Conference 1993, 1993

High-Level Synthesis of Scalable Architectures for IIR Filters using Multichip Modules.

[BibT_eX]

[DOI]

Proceedings of the 30th Design Automation Conference. Dallas, 1993

1992

Abstract Description of Pointer Data Structures: An Approach for Improving the Analysis and Optimization of Imperative Programs.

[BibT_eX]

[DOI]

LOPLAS, 1992

Abstractions for Recursive Pointer Data Structures: Improving the Analysis of Imperative Programs.

[BibT_eX]

[DOI]

Proceedings of the ACM SIGPLAN'92 Conference on Programming Language Design and Implementation (PLDI), 1992

Partitioned register files for VLIWs: a preliminary analysis of tradeoffs.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual International Symposium on Microarchitecture, 1992

A Hierarchical Parallelizing Compiler for VLIW/MIMD Machines.

[BibT_eX]

[DOI]

Carrie J. Brownhill

Proceedings of the Languages and Compilers for Parallel Computing, 1992

Speedup of band linear recurrences in the presence of resource constraints.

[BibT_eX]

[DOI]

Proceedings of the 6th international conference on Supercomputing, 1992

An Efficient Global Resource Constrained Technique for Exploiting Instruction Level Parallelism.

[BibT_eX]

Proceedings of the 1992 International Conference on Parallel Processing, 1992

Applying an Abstract Data Structure Description Approach to Parallelizing Scientific Pointer Programs.

[BibT_eX]

Proceedings of the 1992 International Conference on Parallel Processing, 1992

Harmonic scheduling of linear recurrences for digital filter design.

[BibT_eX]

[DOI]

Proceedings of the conference on European design automation, 1992

1991

Optimal Schedules for Parallel Prefix Computation with Bounded Resources.

[BibT_eX]

[DOI]

Proceedings of the Third ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1991

A New Technique for Induction Variable Removal.

[BibT_eX]

[DOI]

Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture, 1991

[BibT_eX]

[DOI]

Proceedings of the Languages and Compilers for Parallel Computing, 1991

Parallelizing Tightly Nested Loops.

[BibT_eX]

[DOI]

Ki-Chang Kim

Proceedings of the Fifth International Parallel Processing Symposium, Proceedings, Anaheim, California, USA, April 30, 1991

How Do We Make Parallel Processing a Reality? Bridging the Gap Between Theory and Practice.

[BibT_eX]

Proceedings of the Fifth International Parallel Processing Symposium, Proceedings, Anaheim, California, USA, April 30, 1991

A Percolation Based VLIW Architecture.

[BibT_eX]

Proceedings of the International Conference on Parallel Processing, 1991

Incremental Tree Height Reduction for High Level Synthesis.

[BibT_eX]

[DOI]

Proceedings of the 28th Design Automation Conference, 1991

1990

Parallelizing Programs with Recursive Data Structures.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 1990

Static Scheduling for Dynamic Dataflow Machines.

[BibT_eX]

[DOI]

Micah Beck

Keshav Pingali

J. Parallel Distributed Comput., 1990

Realistic scheduling: compaction for pipelined architectures.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Workshop and Symposium on Microprogramming and Microarchitecture, 1990

Parallelizing Non-Vectorizable Loops for MIMD Machines.

[BibT_eX]

Ki-Chang Kim

Proceedings of the 1990 International Conference on Parallel Processing, 1990

Percolation Based Synthesis.

[BibT_eX]

[DOI]

Proceedings of the 27th ACM/IEEE Design Automation Conference. Orlando, 1990

1989

Run-Time Disambiguation: Coping with Statically Unpredictable Dependencies.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1989

Adaptive Bitonic Sorting: An Optimal Parallel Algorithm for Shared-Memory Machines.

[BibT_eX]

[DOI]

Gianfranco Bilardi

SIAM J. Comput., 1989

Interference analysis tools for parallelizing programs with recursive data structures.

[BibT_eX]

[DOI]

Proceedings of the 3rd international conference on Supercomputing, 1989

A global resource-constrained parallelization technique.

[BibT_eX]

[DOI]

Kemal Ebcioglu

Proceedings of the 3rd international conference on Supercomputing, 1989

1988

A Development Environment for Horizontal Microcode.

[BibT_eX]

[DOI]

IEEE Trans. Software Eng., 1988

Fine-grain compilation for pipelined machines.

[BibT_eX]

[DOI]

Keshav Pingali

J. Supercomput., 1988

Loop Quantization: A Generalized Loop Unwinding Technique.

[BibT_eX]

[DOI]

J. Parallel Distributed Comput., 1988

Optimal Loop Parallelization.

[BibT_eX]

[DOI]

Proceedings of the ACM SIGPLAN'88 Conference on Programming Language Design and Implementation (PLDI), 1988

Perfect Pipelining: A New Loop Parallelization Technique.

[BibT_eX]

[DOI]

Proceedings of the ESOP '88, 1988

1987

Loop Quantization or Unwinding Done Right.

[BibT_eX]

[DOI]

Proceedings of the Supercomputing, 1987

1986

A development environment for horizontal microcode programs.

[BibT_eX]

[DOI]

Proceedings of the 19th annual workshop on Microprogramming, 1986

Getting High Performance with Slow Memory.

[BibT_eX]

Kevin Karplus

Proceedings of the Spring COMPCON'86, 1986

1985

Efficient hardware for multiway jumps and pre-fetches.

[BibT_eX]

[DOI]

Kevin Karplus

Proceedings of the 18th annual workshop on Microprogramming, 1985

Uniform Parallelism Exploitation in Ordinary Programs.

[BibT_eX]

Proceedings of the International Conference on Parallel Processing, 1985

1984

Measuring the Parallelism Available for Very Long Instruction Word Architectures.

[BibT_eX]

[DOI]

Joseph A. Fisher

IEEE Trans. Computers, 1984

Parallel processing: a smart compiler and a dumb machine.

[BibT_eX]

[DOI]

Proceedings of the 1984 SIGPLAN Symposium on Compiler Construction, 1984

Parallel processing: a smart compiler and a dumb machine (with retrospective)

[BibT_eX]

[DOI]

Proceedings of the 20 Years of the ACM SIGPLAN Conference on Programming Language Design and Implementation 1979-1999, 1984

1983

Comparison of Compacting Algorithms for Garbage Collection.

[BibT_eX]

[DOI]

Jacques Cohen

ACM Trans. Program. Lang. Syst., 1983

1981

Using an oracle to measure potential parallelism in single instruction stream programs.

[BibT_eX]

[DOI]