Alexandru Nicolau

Orcid: 0009-0003-9833-8455

Affiliations:
  • University of California, Irvine, USA


According to our database, Alexandru Nicolau authored at least 284 papers between 1981 and 2024.

Awards

IEEE Fellow 2015, "For contributions to compiler technology and electronic design automation".

Bibliography

2024
Hyperdimensional computing: a framework for stochastic computation and symbolic AI.
J. Big Data, December, 2024

NGLIC: A Nonaligned-Row Legalization Approach for 3-D Interdie Connection.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., February, 2024

Convolution and Cross-Correlation of Count Sketches Enables Fast Cardinality Estimation of Multi-Join Queries.
Proc. ACM Manag. Data, 2024

Always-Sparse Training by Growing Connections with Guided Stochastic Exploration.
CoRR, 2024

Molecular Classification Using Hyperdimensional Graph Classification.
Proceedings of the International Joint Conference on Neural Networks, 2024

Enhanced Detection of Transdermal Alcohol Levels Using Hyperdimensional Computing on Embedded Devices.
Proceedings of the International Joint Conference on Neural Networks, 2024

2023
Torchhd: An Open Source Python Library to Support Research on Hyperdimensional Computing and Vector Symbolic Architectures.
J. Mach. Learn. Res., 2023

Enhancing the Privacy of Machine Learning via faster arithmetic over Torus FHE.
IACR Cryptol. ePrint Arch., 2023

HDCC: A Hyperdimensional Computing compiler for classification on embedded systems and high-performance computing.
CoRR, 2023

Accelerating Permute and N-Gram Operations for Hyperdimensional Learning in Embedded Systems.
Proceedings of the 29th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2023

DotHash: Estimating Set Similarity Metrics for Link Prediction and Document Deduplication.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

Using Hyperdimensional Computing to Extract Features for the Detection of Type 2 Diabetes.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

RefineHD: Accurate and Efficient Single-Pass Adaptive Learning Using Hyperdimensional Computing.
Proceedings of the IEEE International Conference on Rebooting Computing, 2023

An Extension to Basis-Hypervectors for Learning from Circular Data in Hyperdimensional Computing.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022
Torchhd: An Open-Source Python Library to Support Hyperdimensional Computing Research.
CoRR, 2022

A Heterogeneous Solution to the All-pairs Shortest Path Problem using FPGAs.
Proceedings of the 23rd International Symposium on Quality Electronic Design, 2022

GraphHD: Efficient graph classification using hyperdimensional computing.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

Hyperdimensional hashing: a robust and efficient dynamic hash table.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

2021
Detecting COVID-19 Related Pneumonia On CT Scans Using Hyperdimensional Computing.
Proceedings of the 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2021

Class-Modeling of Septic Shock With Hyperdimensional Computing.
Proceedings of the 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2021

Lightning Talks of EduHPC 2021.
Proceedings of the 9th IEEE/ACM Workshop on Education for High Performance Computing, 2021

2020
NumbaSummarizer: A Python Library for Simplified Vectorization Reports.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

2019
MCompiler: A Synergistic Compilation Framework.
CoRR, 2019

Teaching Parallel Computing and Dependence Analysis with Python.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

AFFIX: Automatic Acceleration Framework for FPGA Implementation of OpenVX Vision Algorithms.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

2018
An empirical study of the effect of source-level loop transformations on compiler stability.
Proc. ACM Program. Lang., 2018

OpenCV.js: computer vision processing for the open web platform.
Proceedings of the 9th ACM Multimedia Systems Conference, 2018

Towards an Achievable Performance for the Loop Nests.
Proceedings of the Languages and Compilers for Parallel Computing, 2018

New Opportunities for Compilers in Computer Security.
Proceedings of the Languages and Compilers for Parallel Computing, 2018

Acceleration Framework for FPGA Implementation of OpenVX Graph Pipelines.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

2017
CAMFAS: A Compiler Approach to Mitigate Fault Attacks via Enhanced SIMDization.
IACR Cryptol. ePrint Arch., 2017

Using Hardware Counters to Predict Vectorization.
Proceedings of the Languages and Compilers for Parallel Computing, 2017

LORE: A loop repository for the evaluation of compilers.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

2016
Is computer science dying?
Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

Data-rate-aware FPGA-based acceleration framework for streaming applications.
Proceedings of the International Conference on ReConFigurable Computing and FPGAs, 2016

Polygonal Iteration Space Partitioning.
Proceedings of the Languages and Compilers for Parallel Computing, 2016

SIMD-based soft error detection.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

2015
ViPZonE: Hardware Power Variability-Aware Virtual Memory Management for Energy Savings.
IEEE Trans. Computers, 2015

DPCS: Dynamic Power/Capacity Scaling for SRAM Caches in the Nanoscale Era.
ACM Trans. Archit. Code Optim., 2015

Fault Tolerant Scheduling for Parallel Loops on Shared Memory Systems.
J. Inf. Sci. Eng., 2015

NSF expedition on variability-aware software: Recent results and contributions.
it Inf. Technol., 2015

Software fault tolerance for FPUs via vectorization.
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

WebRTCbench: a benchmark for performance assessment of webRTC implementations.
Proceedings of the 13th IEEE Symposium on Embedded Systems For Real-time Multimedia, 2015

Cyberphysical-system-on-chip (CPSoC): a self-aware MPSoC paradigm with cross-layer virtual sensing and actuation.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

SmartBalance: a sensing-driven Linux load balancer for energy efficiency of heterogeneous MPSoCs.
Proceedings of the 52nd Annual Design Automation Conference, 2015

2014
Acknowledgment to Reviewers.
Int. J. Parallel Program., 2014

A Compilation and Run-Time Framework for Maximizing Performance of Self-scheduling Algorithms.
Proceedings of the Network and Parallel Computing, 2014

Author retrospective for a global resource-constrained parallelization technique.
Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

Power / Capacity Scaling: Energy Savings With Simple Fault-Tolerant Caches.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

Multi-Layer Memory Resiliency.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

On-chip self-awareness using Cyberphysical-Systems-on-Chip (CPSoC).
Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis, 2014

2013
Underdesigned and Opportunistic Computing in Presence of Hardware Variability.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2013

Effective Evaluation of Multi-core Based Systems.
Proceedings of the IEEE 12th International Symposium on Parallel and Distributed Computing, 2013

Improving numerical accuracy for non-negative matrix multiplication on GPUs using recursive algorithms.
Proceedings of the International Conference on Supercomputing, 2013

On the Determination of Inlining Vectors for Program Optimization.
Proceedings of the Compiler Construction - 22nd International Conference, 2013

Variability-aware memory management for nanoscale computing.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

Optimizing Program Performance via Similarity, Using a Feature-Agnostic Approach.
Proceedings of the Advanced Parallel Processing Technologies, 2013

2012
Just in Time Load Balancing.
Proceedings of the Languages and Compilers for Parallel Computing, 2012

A fault tolerant self-scheduling scheme for parallel loops on shared memory systems.
Proceedings of the 19th International Conference on High Performance Computing, 2012

AVid: Annotation driven video decoding for hybrid memories.
Proceedings of the IEEE 10th Symposium on Embedded Systems for Real-time Multimedia, 2012

VaMV: Variability-aware Memory Virtualization.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

ViPZonE: OS-level memory variability-driven physical address zoning for energy savings.
Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis, 2012

Selective search of inlining vectors for program optimization.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

2011
Modulo Scheduling and Loop Pipelining.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Exploiting parallelism in matrix-computation kernels for symmetric multiprocessor systems: Matrix-multiplication and matrix-addition algorithm optimizations by software pipelining and threads allocation.
ACM Trans. Math. Softw., 2011

Improving accuracy for matrix multiplications on GPUs.
Sci. Program., 2011

Improving the Accuracy of High Performance BLAS Implementations Using Adaptive Blocked Algorithms.
Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing, 2011

Pruning hardware evaluation space via correlation-driven application similarity analysis.
Proceedings of the 8th Conference on Computing Frontiers, 2011

2010
On the efficacy of call graph-level thread-level speculation.
Proceedings of the first joint WOSP/SIPEW International Conference on Performance Engineering, 2010

How Many Threads to Spawn during Program Multithreading?
Proceedings of the Languages and Compilers for Parallel Computing, 2010

Pretty Good Accuracy in Matrix Multiplication with GPUs.
Proceedings of the Ninth International Symposium on Parallel and Distributed Computing, 2010

Exploitation of nested thread-level speculative parallelism on multi-core systems.
Proceedings of the 7th Conference on Computing Frontiers, 2010

2009
Adaptive Winograd's matrix multiplications.
ACM Trans. Math. Softw., 2009

On the exploitation of loop-level parallelism in embedded applications.
ACM Trans. Embed. Comput. Syst., 2009

A configurable simulation environment for the efficient simulation of large-scale spiking neural networks on graphics processors.
Neural Networks, 2009

Brain Derived Vision Algorithm on High Performance Architectures.
Int. J. Parallel Program., 2009

Optimizing control flow in loops using interval and dependence analysis.
Des. Autom. Embed. Syst., 2009

Cache-aware partitioning of multi-dimensional iteration spaces.
Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference 2009, 2009

Performance Characterization of Itanium® 2-Based Montecito Processor.
Proceedings of the Computer Performance Evaluation and Benchmarking, 2009

Techniques for efficient placement of synchronization primitives.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Efficient simulation of large-scale Spiking Neural Networks using CUDA graphics processors.
Proceedings of the International Joint Conference on Neural Networks, 2009

Synchronization optimizations for efficient execution on multi-cores.
Proceedings of the 23rd international conference on Supercomputing, 2009

Efficient Scheduling of Nested Parallel Loops on Multi-Core Systems.
Proceedings of the ICPP 2009, 2009

2008
Register File Power Reduction Using Bypass Sensitive Compiler.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2008

Comparative architectural characterization of SPEC CPU2000 and CPU2006 benchmarks on the Intel® Core™ 2 Duo processor.
Proceedings of the 2008 International Conference on Embedded Computer Systems: Architectures, 2008

Cache-aware iteration space partitioning.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Impact of JVM superoperators on energy consumption in resource-constrained embedded systems.
Proceedings of the 2008 ACM SIGPLAN/SIGBED Conference on Languages, 2008

Control flow optimization in loops using interval analysis.
Proceedings of the 2008 International Conference on Compilers, 2008

2007
A predictive decode filter cache for reducing power consumption in embedded processors.
ACM Trans. Design Autom. Electr. Syst., 2007

Automatic Design Space Exploration of Register Bypasses in Embedded Processors.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2007

DYNAMO: A Cross-Layer Framework for End-to-End QoS and Energy Optimization in Mobile Handheld Devices.
IEEE J. Sel. Areas Commun., 2007

R-Kleene: A High-Performance Divide-and-Conquer Algorithm for the All-Pair Shortest Path for Densely Connected Networks.
Algorithmica, 2007

Comparative characterization of SPEC CPU2000 and CPU2006 on Itanium architecture.
Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2007

Tight analysis of the performance potential of thread speculation using SPEC CPU 2006.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Novel Brain-Derived Algorithms Scale Linearly with Number of Processing Elements.
Proceedings of the Parallel Computing: Architectures, 2007

Annotation Integration and Trade-off Analysis for Multimedia Applications.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Adaptive Strassen's matrix multiplication.
Proceedings of the 21st Annual International Conference on Supercomputing, 2007

A simplified java bytecode compilation system for resource-constrained embedded processors.
Proceedings of the 2007 International Conference on Compilers, 2007

Short-Circuit Compiler Transformation: Optimizing Conditional Blocks.
Proceedings of the 12th Conference on Asia South Pacific Design Automation, 2007

2006
Retargetable pipeline hazard detection for partially bypassed processors.
IEEE Trans. Very Large Scale Integr. Syst., 2006

Energy efficient watermarking on mobile devices using proxy-based partitioning.
IEEE Trans. Very Large Scale Integr. Syst., 2006

Expression equivalence checking using interval analysis.
IEEE Trans. Very Large Scale Integr. Syst., 2006

Compilation framework for code size reduction using reduced bit-width ISAs (rISAs).
ACM Trans. Design Autom. Electr. Syst., 2006

PBPAIR: an energy-efficient error-resilient encoding using probability based power aware intra refresh.
ACM SIGMOBILE Mob. Comput. Commun. Rev., 2006

A general approach for partitioning N-dimensional parallel nested loops with conditionals.
Proceedings of SPAA 2006: the 18th Annual ACM Symposium on Parallelism in Algorithms and Architectures, Cambridge, Massachusetts, USA, July 30, 2006

Bypass aware instruction scheduling for register file power reduction.
Proceedings of the 2006 ACM SIGPLAN/SIGBED Conference on Languages, 2006

Video Stream Annotations for Energy Trade-offs in Multimedia Applications.
Proceedings of the 5th International Symposium on Parallel and Distributed Computing (ISPDC 2006), 2006

On the performance potential of different types of speculative thread-level parallelism: The DL version of this paper includes corrections that were not made available in the printed proceedings.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Lightweight lock-free synchronization methods for multithreading.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

History-aware Self-Scheduling.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

Probabilistic Self-Scheduling.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Annotation Based Multimedia Streaming Over Wireless Networks.
Proceedings of the 2006 4th Workshop on Embedded Systems for Real-Time Multimedia, 2006

Automatic generation of operation tables for fast exploration of bypasses in embedded processors.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Software annotations for power optimization on mobile devices.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Challenges in exploitation of loop parallelism in embedded applications.
Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, 2006

2005
Line Size Adaptivity Analysis of Parameterized Loop Nests for Direct Mapped Data Cache.
IEEE Trans. Computers, 2005

A novel approach for partitioning iteration spaces with variable densities.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

An Efficient Approach for Self-scheduling Parallel Loops on Multiprogrammed Parallel Computers.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

An Efficient Load Balancing Scheme for Grid-based High Performance Scientific Computing.
Proceedings of the 4th International Symposium on Parallel and Distributed Computing (ISPDC 2005), 2005

Using a Way Cache to Improve Performance of Set-Associative Caches.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Enhanced Loop Coalescing: A Compiler Technique for Transforming Non-uniform Iteration Spaces.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Using Recursion to Boost ATLAS's Performance.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005

A Cross-Layer Approach for Power-Performance Optimization in Distributed Mobile Systems.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Probability Based Power Aware Error Resilient Coding.
Proceedings of the 25th International Conference on Distributed Computing Systems Workshops (ICDCS 2005 Workshops), 2005

Energy Analysis of Multimedia Watermarking on Mobile Handheld Devices.
Proceedings of the 2005 3rd Workshop on Embedded Systems for Real-Time Multimedia, 2005

High performance annotation-aware JVM for Java cards.
Proceedings of the EMSOFT 2005, 2005

PBExplore: A Framework for Compiler-in-the-Loop Exploration of Partial Bypassing in Embedded Processors.
Proceedings of the 2005 Design, 2005

Aggregating processor free time for energy reduction.
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

Equivalence checking of arithmetic expressions using fast evaluation.
Proceedings of the 2005 International Conference on Compilers, 2005

2004
Coordinated parallelizing compiler optimizations and high-level synthesis.
ACM Trans. Design Autom. Electr. Syst., 2004

Using global code motions to improve the quality of results for high-level synthesis.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2004

Caching Values in the Load Store Queue.
Proceedings of the 12th International Workshop on Modeling, 2004

A Geometric Approach for Partitioning N-Dimensional Non-rectangular Iteration Spaces.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

JuliusC: A Practical Approach for the Analysis of Divide-and-Conquer Algorithms.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

Interconnect-Aware Mapping of Applications to Coarse-Grain Reconfigurable Architectures.
Proceedings of the Field Programmable Logic and Application, 2004

Loop Shifting and Compaction for the High-Level Synthesis of Designs with Complex Control Flow.
Proceedings of the 2004 Design, 2004

Network Topology Exploration of Mesh-Based Coarse-Grain Reconfigurable Architectures.
Proceedings of the 2004 Design, 2004

Proxy-based task partitioning of watermarking algorithms for reducing energy consumption in mobile devices.
Proceedings of the 41st Design Automation Conference, 2004

Operation tables for scheduling in the presence of incomplete bypassing.
Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2004

2003
RTGEN-an algorithm for automatic generation of reservation tables from architectural descriptions.
IEEE Trans. Very Large Scale Integr. Syst., 2003

Access pattern-based memory and connectivity architecture exploration.
ACM Trans. Embed. Comput. Syst., 2003

SPARK: A High-Level Synthesis Framework For Applying Parallelizing Compiler Transformations.
Proceedings of the 16th International Conference on VLSI Design (VLSI Design 2003), 2003

Integrated power management for video streaming to mobile handheld devices.
Proceedings of the Eleventh ACM International Conference on Multimedia, 2003

A Data Cache with Dynamic Mapping.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

Reducing data cache energy consumption via cached load/store queue.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

FORGE: A Framework for Optimization of Distributed Embedded Systems Software.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Interface Synthesis using Memory Mapping for an FPGA Platform.
Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

Reducing Power Consumption for High-Associativity Data Caches in Embedded Processors.
Proceedings of the 2003 Design, 2003

Dynamic Conditional Branch Balancing during the High-Level Synthesis of Control-Intensive Designs.
Proceedings of the 2003 Design, 2003

Low Energy Associative Data Caches for Embedded Systems.
Proceedings of the Embedded Software for SoC, 2003

Memory architecture exploration for programmable embedded systems.
Kluwer, ISBN: 978-1-4020-7324-3, 2003

2002
Automatic Modeling and Validation of Pipeline Specifications Driven by an Architecture Description Language.
Proceedings of the 7th Asia and South Pacific Design Automation Conference (ASP-DAC 2002), 2002

A Design Space Exploration Framework for Reduced Bit-Width Instruction Set Architecture (rISA) Design.
Proceedings of the 15th International Symposium on System Synthesis (ISSS 2002), 2002

Dynamic Common Sub-Expression Elimination during Scheduling in High-Level Synthesis.
Proceedings of the 15th International Symposium on System Synthesis (ISSS 2002), 2002

Integrated I-cache Way Predictor and Branch Target Buffer to Reduce Energy Consumption.
Proceedings of the High Performance Computing, 4th International Symposium, 2002

Power Savings in Embedded Processors through Decode Filter Cache.
Proceedings of the 2002 Design, 2002

Automatic Verification of In-Order Execution In Microprocessors with Fragmented Pipelines and Multicycle Functional Units.
Proceedings of the 2002 Design, 2002

An Efficient Compiler Technique for Code Size Reduction Using Reduced Bit-Width ISAs.
Proceedings of the 2002 Design, 2002

Memory System Connectivity Exploration.
Proceedings of the 2002 Design, 2002

Profile-Based Dynamic Voltage Scheduling Using Program Checkpoints.
Proceedings of the 2002 Design, 2002

Coordinated transformations for high-level synthesis of high performance microprocessor blocks.
Proceedings of the 39th Design Automation Conference, 2002

2001
V-SAT: A visual specification and analysis tool for system-on-chip exploration.
J. Syst. Archit., 2001

Data Memory Organization and Optimizations in Application-Specific Systems.
IEEE Des. Test Comput., 2001

Fractal Matrix Multiplication: A Case Study on Portability of Cache Performance.
Proceedings of the Algorithm Engineering, 2001

Processor-Memory Co-Exploration driven by a Memory-Aware Architecture Description Language.
Proceedings of the 14th International Conference on VLSI Design (VLSI Design 2001), 2001

Functional abstraction driven design space exploration of heterogeneous programmable architectures.
Proceedings of the 14th International Symposium on Systems Synthesis, 2001

Conditional speculation and its effects on performance and area for high-level synthesis.
Proceedings of the 14th International Symposium on Systems Synthesis, 2001

APEX: Access Pattern Based Memory Architecture Exploration.
Proceedings of the 14th International Symposium on Systems Synthesis, 2001

Design of a Predictive Filter Cache for Energy Savings in High Performance Processor Architectures.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

Automatic validation of pipeline specifications.
Proceedings of the Sixth IEEE International High-Level Design Validation and Test Workshop 2001, 2001

Access pattern based local memory customization for low power embedded systems.
Proceedings of the Conference on Design, Automation and Test in Europe, 2001

Speculation Techniques for High Level Synthesis of Control Intensive Designs.
Proceedings of the 38th Design Automation Conference, 2001

New directions in compiler technology for embedded systems (embedded tutorial).
Proceedings of ASP-DAC 2001, 2001

2000
On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems.
ACM Trans. Design Autom. Electr. Syst., 2000

The Design of the PROMIS Compiler-Towards Multi-Level Parallelization.
Int. J. Parallel Program., 2000

An annotation-aware Java virtual machine implementation.
Concurr. Pract. Exp., 2000

Compiler-Directed Cache Assist Adaptivity.
Proceedings of the High Performance Computing, Third International Symposium, 2000

Compiler-Directed Cache Line Size Adaptivity.
Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

Aggressive Memory-Aware Compilation.
Proceedings of the Intelligent Memory Systems, Second International Workshop, 2000

Customizing Software Toolkits for Embedded Systems-On-Chip.
Proceedings of the Architecture and Design of Distributed Embedded Systems, 2000

Using profiling to reduce branch misprediction costs on a dynamically scheduled processor.
Proceedings of the 14th international conference on Supercomputing, 2000

MIST: An Algorithm for Memory Miss Traffic Management.
Proceedings of the 2000 IEEE/ACM International Conference on Computer-Aided Design, 2000

Architecture Exploration of Parameterizable EPIC SOC Architectures.
Proceedings of the 2000 Design, 2000

Memory aware compilation through accurate timing extraction.
Proceedings of the 37th Conference on Design Automation, 2000

1999
Local memory exploration and optimization in embedded systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1999

Augmenting Loop Tiling with Data Alignment for Improved Cache Performance.
IEEE Trans. Computers, 1999

Symbolic Analysis in the PROMIS Compiler.
Proceedings of the Languages and Compilers for Parallel Computing, 1999

Java Annotation-Aware Just-in-Time (AJIT) Compilation System.
Proceedings of the ACM 1999 Conference on Java Grande, JAVA '99, San Francisco, CA, USA, 1999

Adapting cache line size to application behavior.
Proceedings of the 13th international conference on Supercomputing, 1999

EXPRESSION: A Language for Architecture Exploration through Compiler/Simulator Retargetability.
Proceedings of the 1999 Design, 1999

The Design of the PROMIS Compiler.
Proceedings of the Compiler Construction, 8th International Conference, 1999

1998
Incorporating DRAM access modes into high-level synthesis.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1998

Editor's Announcement.
Int. J. Parallel Program., 1998

Copy Elimination for Parallelizing Compilers.
Proceedings of the Languages and Compilers for Parallel Computing, 1998

Analyzing the Individual/Combined Effects of Speculative and Guarded Execution on a Superscalar Architecture.
Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

Data Cache Sizing for Embedded Processor Applications.
Proceedings of the 1998 Design, 1998

1997
Memory data organization for improved cache performance in embedded processor applications.
ACM Trans. Design Autom. Electr. Syst., 1997

Annotating the Java Bytecodes in Support of Optimization.
Concurr. Pract. Exp., 1997

Resource Directed Loop Pipelining: Exposing Just Enough Parallelism.
Comput. J., 1997

A Systematic Approach to Branch Speculation.
Proceedings of the Languages and Compilers for Parallel Computing, 1997

Architectural Exploration and Optimization of Local Memory in Embedded Systems.
Proceedings of the 10th International Symposium on System Synthesis, 1997

Achieving Multi-level Parallelization.
Proceedings of the High Performance Computing, International Symposium, 1997

Improving Cache Performance Through Tiling and Data Alignment.
Proceedings of the Solving Irregularly Structured Problems in Parallel, 1997

A Data Alignment Technique for Improving Cache Performance.
Proceedings of the 1997 International Conference on Computer Design: VLSI in Computers & Processors, 1997

Exploiting off-chip memory access modes in high-level synthesis.
Proceedings of the 1997 IEEE/ACM International Conference on Computer-Aided Design, 1997

Efficient utilization of scratch-pad memory in embedded processor applications.
Proceedings of the European Design and Test Conference, 1997

The PROMIS Compiler Prototype.
Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997

1996
Computing Programs Containing Band Linear Recurrences on Vector Supercomputers.
IEEE Trans. Parallel Distributed Syst., 1996

Optimal register assignment to loops for embedded code generation.
ACM Trans. Design Autom. Electr. Syst., 1996

Elimination of redundant memory traffic in high-level synthesis.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1996

The Strict Time Lower Bound and Optimal Schedules for Parallel Prefix with Resource Constraints.
IEEE Trans. Computers, 1996

Resource-Directed Loop Pipelining.
Proceedings of the Languages and Compilers for Parallel Computing, 1996

Memory Organization for Improved Data Cache Performance in Embedded Processors.
Proceedings of the 9th International Symposium on System Synthesis, 1996

A Method for Register Allocation to Loops in Multiple Register File Architectures.
Proceedings of IPPS '96, 1996

An efficient, global resource-directed approach to exploiting instruction-level parallelism.
Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, 1996

1995
Performance evaluation for application-specific architectures.
IEEE Trans. Very Large Scale Integr. Syst., 1995

Resource-Constrained Software Pipelining.
IEEE Trans. Parallel Distributed Syst., 1995

A hierarchical approach to instruction-level parallelization.
Int. J. Parallel Program., 1995

A hypergraph-based model for port allocation on multiple-register-file VLIW architectures.
Int. J. Parallel Program., 1995

Path Collection and Dependence Testing in the Presence of Dynamic, Pointer-Based Data Structures.
Proceedings of the Languages, 1995

A Simple Mechanism for Improving the Accuracy and Efficiency of Instruction-Level Disambiguation.
Proceedings of the Languages and Compilers for Parallel Computing, 1995

Incorporating compiler feedback into the design of ASIPs.
Proceedings of the 1995 European Design and Test Conference, 1995

1994
From the guest editors.
Int. J. Parallel Program., 1994

Editors' introduction.
Int. J. Parallel Program., 1994

Ultra Fine-Grain Template-Driven Synthesis.
Proceedings of the Seventh International Conference on VLSI Design, 1994

A General Data Dependence Test for Dynamic, Pointer-Based Data Structures.
Proceedings of the ACM SIGPLAN'94 Conference on Programming Language Design and Implementation (PLDI), 1994

Mutation Scheduling: A Unified Approach to Compiling for Fine-Grain Parallelism.
Proceedings of the Languages and Compilers for Parallel Computing, 1994

Scalable Techniques for Computing Band Linear Recurrences on Massively Parallel and Vector Supercomputers.
Proceedings of the 8th International Symposium on Parallel Processing, 1994

A Language for Conveying the Aliasing Properties of Dynamic, Pointer-Based Data Structures.
Proceedings of the 8th International Symposium on Parallel Processing, 1994

An Approach to Combine Predicated/Speculative Execution for Programs with Unpredictable Branches.
Proceedings of the Parallel Architectures and Compilation Techniques, 1994

Partitioning of Variables for Multiple-Register-File Architectures via Hypergraph Coloring.
Proceedings of the Parallel Architectures and Compilation Techniques, 1994

A Framework for Data Dependence Testing in the Presence of Pointers.
Proceedings of the 1994 International Conference on Parallel Processing, 1994

Partitioning of Variables for Multiple-Register-File VLIW Architectures.
Proceedings of the 1994 International Conference on Parallel Processing, 1994

Integrating program transformations in the memory-based synthesis of image and video algorithms.
Proceedings of the 1994 IEEE/ACM International Conference on Computer-Aided Design, 1994

A performance evaluator for parameterized ASIC architectures.
Proceedings of EURO-DAC'94, 1994

A Unified code generation approach using mutation scheduling.
Proceedings of the Code Generation for Embedded Processors [Dagstuhl Workshop, Dagstuhl, Germany, August 31, 1994

Minimization of Memory Traffic in High-Level Synthesis.
Proceedings of the 31st Conference on Design Automation, 1994

1993
Automatic program parallelization.
Proc. IEEE, 1993

A Mapping Strategy for MIMD Computers.
Int. J. High Speed Comput., 1993

Massive Parallelism and Fine-Grain Parallelism: are They Incompatible?
Int. J. High Speed Comput., 1993

Harmonic Scheduling: A Technique for Scheduling Beyond Loop-Carried Dependencies.
Proceedings of the Sixth International Conference on VLSI Design, 1993

VISTA: The Visual Interface for Scheduling Transformations and Analysis.
Proceedings of the Languages and Compilers for Parallel Computing, 1993

Trailblazing: A Hierarchical Approach to Percolation Scheduling.
Proceedings of the 1993 International Conference on Parallel Processing, 1993

Regular schedules for scalable design of IIR filters.
Proceedings of the European Design Automation Conference 1993, 1993

High-Level Synthesis of Scalable Architectures for IIR Filters using Multichip Modules.
Proceedings of the 30th Design Automation Conference. Dallas, 1993

1992
Abstract Description of Pointer Data Structures: An Approach for Improving the Analysis and Optimization of Imperative Programs.
LOPLAS, 1992

Abstractions for Recursive Pointer Data Structures: Improving the Analysis of Imperative Programs.
Proceedings of the ACM SIGPLAN'92 Conference on Programming Language Design and Implementation (PLDI), 1992

Partitioned register files for VLIWs: a preliminary analysis of tradeoffs.
Proceedings of the 25th Annual International Symposium on Microarchitecture, 1992

A Hierarchical Parallelizing Compiler for VLIW/MIMD Machines.
Proceedings of the Languages and Compilers for Parallel Computing, 1992

Speedup of band linear recurrences in the presence of resource constraints.
Proceedings of the 6th international conference on Supercomputing, 1992

An Efficient Global Resource Constrained Technique for Exploiting Instruction Level Parallelism.
Proceedings of the 1992 International Conference on Parallel Processing, 1992

Applying an Abstract Data Structure Description Approach to Parallelizing Scientific Pointer Programs.
Proceedings of the 1992 International Conference on Parallel Processing, 1992

Harmonic scheduling of linear recurrences for digital filter design.
Proceedings of the conference on European design automation, 1992

1991
Optimal Schedules for Parallel Prefix Computation with Bounded Resources.
Proceedings of the Third ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1991

A New Technique for Induction Variable Removal.
Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture, 1991

Register Allocation, Renaming and Their Impact on Fine-Grain Parallelism.
Proceedings of the Languages and Compilers for Parallel Computing, 1991

Parallelizing Tightly Nested Loops.
Proceedings of the Fifth International Parallel Processing Symposium, Anaheim, California, USA, April 30, 1991

How Do We Make Parallel Processing a Reality? Bridging the Gap Between Theory and Practice.
Proceedings of the Fifth International Parallel Processing Symposium, Anaheim, California, USA, April 30, 1991

A Percolation Based VLIW Architecture.
Proceedings of the International Conference on Parallel Processing, 1991

Incremental Tree Height Reduction for High Level Synthesis.
Proceedings of the 28th Design Automation Conference, 1991

1990
Parallelizing Programs with Recursive Data Structures.
IEEE Trans. Parallel Distributed Syst., 1990

Static Scheduling for Dynamic Dataflow Machines.
J. Parallel Distributed Comput., 1990

Realistic scheduling: compaction for pipelined architectures.
Proceedings of the 23rd Annual Workshop and Symposium on Microprogramming and Microarchitecture, 1990

Parallelizing Non-Vectorizable Loops for MIMD Machines.
Proceedings of the 1990 International Conference on Parallel Processing, 1990

Percolation Based Synthesis.
Proceedings of the 27th ACM/IEEE Design Automation Conference. Orlando, 1990

1989
Run-Time Disambiguation: Coping with Statically Unpredictable Dependencies.
IEEE Trans. Computers, 1989

Adaptive Bitonic Sorting: An Optimal Parallel Algorithm for Shared-Memory Machines.
SIAM J. Comput., 1989

Interference analysis tools for parallelizing programs with recursive data structures.
Proceedings of the 3rd international conference on Supercomputing, 1989

A global resource-constrained parallelization technique.
Proceedings of the 3rd international conference on Supercomputing, 1989

1988
A Development Environment for Horizontal Microcode.
IEEE Trans. Software Eng., 1988

Fine-grain compilation for pipelined machines.
J. Supercomput., 1988

Loop Quantization: A Generalized Loop Unwinding Technique.
J. Parallel Distributed Comput., 1988

Optimal Loop Parallelization.
Proceedings of the ACM SIGPLAN'88 Conference on Programming Language Design and Implementation (PLDI), 1988

Perfect Pipelining: A New Loop Parallelization Technique.
Proceedings of the ESOP '88, 1988

1987
Loop Quantization or Unwinding Done Right.
Proceedings of the Supercomputing, 1987

1986
A development environment for horizontal microcode programs.
Proceedings of the 19th annual workshop on Microprogramming, 1986

Getting High Performance with Slow Memory.
Proceedings of the Spring COMPCON'86, 1986

1985
Efficient hardware for multiway jumps and pre-fetches.
Proceedings of the 18th annual workshop on Microprogramming, 1985

Uniform Parallelism Exploitation in Ordinary Programs.
Proceedings of the International Conference on Parallel Processing, 1985

1984
Measuring the Parallelism Available for Very Long Instruction Word Architectures.
IEEE Trans. Computers, 1984

Parallel processing: a smart compiler and a dumb machine.
Proceedings of the 1984 SIGPLAN Symposium on Compiler Construction, 1984

Parallel processing: a smart compiler and a dumb machine (with retrospective).
Proceedings of the 20 Years of the ACM SIGPLAN Conference on Programming Language Design and Implementation 1979-1999, 1984

1983
Comparison of Compacting Algorithms for Garbage Collection.
ACM Trans. Program. Lang. Syst., 1983

1981
Using an oracle to measure potential parallelism in single instruction stream programs.
Proceedings of the 14th annual workshop on Microprogramming, 1981

