Henk Corporaal

ORCID: 0000-0003-4506-5732

Affiliations:
  • Eindhoven University of Technology, Netherlands


According to our database, Henk Corporaal authored at least 379 papers between 1989 and 2024.

Bibliography

2024
R-Blocks: an Energy-Efficient, Flexible, and Programmable CGRA.
ACM Trans. Reconfigurable Technol. Syst., June, 2024

Probabilistic Inference in the Era of Tensor Networks and Differential Programming.
CoRR, 2024

How Much Can We Gain From Tensor Kernel Fusion on GPUs?
IEEE Access, 2024

Invited: Achieving PetaOps/W Edge-AI Processing.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

2023
Delay Prediction for ASIC HLS: Comparing Graph-Based and Nongraph-Based Learning Models.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., April, 2023

Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors.
IEEE Trans. Parallel Distributed Syst., 2023

MTTR reduction of FPGA scrubbing: Exploring SEU sensitivity.
Microprocess. Microsystems, 2023

SPARTA: Spatial Acceleration for Efficient and Scalable Horizontal Diffusion Weather Stencil Computation.
Proceedings of the 37th International Conference on Supercomputing, 2023

BrainTTA: A 28.6 TOPS/W Compiler Programmable Transport-Triggered NN SoC.
Proceedings of the 41st IEEE International Conference on Computer Design, 2023

QMTS: Fixed-point Quantization for Multiple-timescale Spiking Neural Networks.
Proceedings of the Artificial Neural Networks and Machine Learning, 2023

Dependability of Future Edge-AI Processors: Pandora's Box.
Proceedings of the IEEE European Test Symposium, 2023

BOMP-NAS: Bayesian Optimization Mixed Precision NAS.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023


ReMeCo: Reliable Memristor-Based in-Memory Neuromorphic Computation.
Proceedings of the 28th Asia and South Pacific Design Automation Conference, 2023

2022
Accelerating Weather Prediction Using Near-Memory Reconfigurable Fabric.
ACM Trans. Reconfigurable Technol. Syst., 2022

How Flexible is Your Computing System?
ACM Trans. Embed. Comput. Syst., 2022

Blocks: Challenging SIMDs and VLIWs With a Reconfigurable Architecture.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

OCC: An Automated End-to-End Machine Learning Optimizing Compiler for Computing-In-Memory.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

SCWC: Structured channel weight sharing to compress convolutional neural networks.
Inf. Sci., 2022

THOR - A Neuromorphic Processor with 7.29G TSOP$^2$/mm$^2$Js Energy-Throughput Efficiency.
CoRR, 2022

CONVOLVE: Smart and seamless design of smart edge processors.
CoRR, 2022

BrainTTA: A 35 fJ/op Compiler Programmable Mixed-Precision Transport-Triggered NN SoC.
CoRR, 2022

A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers.
CoRR, 2022

LEAPER: Modeling Cloud FPGA-based Systems via Transfer Learning.
CoRR, 2022

Low- and Mixed-Precision Inference Accelerators.
CoRR, 2022

How to train accurate BNNs for embedded systems?
CoRR, 2022

Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numerical Behaviors.
CoRR, 2022

Reduced-Precision Acceleration of Radio-Astronomical Imaging on Reconfigurable Hardware.
IEEE Access, 2022

MoESR: Blind Super-Resolution using Kernel-Aware Mixture of Experts.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

An Efficient FPGA Implementation for Real-Time and Low-Power UAV Object Detection.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2022

Sibyl: adaptive and extensible data placement in hybrid storage systems using online reinforcement learning.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Accelerating Video Object Detection by Exploiting Prior Object Locations.
Proceedings of the Image Analysis and Processing - ICIAP 2022, 2022

LEAPER: Fast and Accurate FPGA-based System Performance Prediction via Transfer Learning.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

Partial Evaluation in Junction Trees.
Proceedings of the 25th Euromicro Conference on Digital System Design, 2022

Prebypass: Software Register File Bypassing for Reduced Interconnection Architectures.
Proceedings of the 25th Euromicro Conference on Digital System Design, 2022

CELR: Cloud Enhanced Local Reconstruction from low-dose sparse Scanning Electron Microscopy images.
Proceedings of the 25th Euromicro Conference on Digital System Design, 2022

Quantization: how far should we go?
Proceedings of the 25th Euromicro Conference on Digital System Design, 2022

DNAsim: Evaluation Framework for Digital Neuromorphic Architectures.
Proceedings of the 25th Euromicro Conference on Digital System Design, 2022

SACA: System-level Analog CIM Accelerators Simulation Framework: Architecture and Cycle-accurate System-to-device Simulator.
Proceedings of the 37th Conference on Design of Circuits and Integrated Systems, 2022

SACA: System-level Analog CIM Accelerators Simulation Framework: Accurate Simulation of Non-Ideal Components.
Proceedings of the 37th Conference on Design of Circuits and Integrated Systems, 2022

SySCIM: SystemC-AMS Simulation of Memristive Computation In-Memory.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

2021
Converter-Free Power Delivery Using Voltage Stacking for Near/Subthreshold Operation.
IEEE Trans. Very Large Scale Integr. Syst., 2021

CGRA-EAM - Rapid Energy and Area Estimation for Coarse-grained Reconfigurable Architectures.
ACM Trans. Reconfigurable Technol. Syst., 2021

Taming the State-space Explosion in the Makespan Optimization of Flexible Manufacturing Systems.
ACM Trans. Cyber Phys. Syst., 2021

Multi-Level Optimization of an Ultra-Low Power BrainWave System for Non-Convulsive Seizure Detection.
IEEE Trans. Biomed. Circuits Syst., 2021

FPGA-Based Near-Memory Acceleration of Modern Data-Intensive Applications.
IEEE Micro, 2021

NERO: Accelerating Weather Prediction using Near-Memory Reconfigurable Fabric.
CoRR, 2021

ConvFusion: A Model for Layer Fusion in Convolutional Neural Networks.
IEEE Access, 2021

DualSR: Zero-Shot Dual Learning for Real-World Super-Resolution.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

NeuroVP: A System-Level Virtual Platform for Integration of Neuromorphic Accelerators.
Proceedings of the 34th IEEE International System-on-Chip Conference, 2021

LoopOpt: Declarative Transformations Made Easy.
Proceedings of the SCOPES '21: 24th International Workshop on Software and Compilers for Embedded Systems, Eindhoven, The Netherlands, November 1, 2021

DominoSearch: Find layer-wise fine-grained N: M sparse schemes from dense neural networks.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

SE1: What Technologies Will Shape the Future of Computing?
Proceedings of the IEEE International Solid-State Circuits Conference, 2021

Characterization of Mems Microphone Sensitivity and Phase Distributions with Applications in Array Processing.
Proceedings of the IEEE International Conference on Acoustics, 2021

Modeling FPGA-Based Systems via Few-Shot Learning.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

NMPO: Near-Memory Computing Profiling and Offloading.
Proceedings of the 24th Euromicro Conference on Digital System Design, 2021

Efficient Tensor Cores support in TVM for Low-Latency Deep learning.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

Hardware- and Situation-Aware Sensing for Robust Closed-Loop Control Systems.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

Progressive Raising in Multi-level IR.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

Hardware Approximation of Exponential Decay for Spiking Neural Networks.
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021

2020
Skeleton-Based Synthesis Flow for Computation-in-Memory Architectures.
IEEE Trans. Emerg. Top. Comput., 2020

Schedule Synthesis for Halide Pipelines on GPUs.
ACM Trans. Archit. Code Optim., 2020

Declarative Loop Tactics for Domain-specific Optimization.
ACM Trans. Archit. Code Optim., 2020

Quantization of deep neural networks for accumulator-constrained processors.
Microprocess. Microsystems, 2020

Approximation-Aware Design of an Image-Based Control System.
IEEE Access, 2020

Real-time audio processing for hearing aids using a model-based bayesian inference framework.
Proceedings of the SCOPES '20: 23rd International Workshop on Software and Compilers for Embedded Systems, 2020

Reviewing inference performance of state-of-the-art deep learning frameworks.
Proceedings of the SCOPES '20: 23rd International Workshop on Software and Compilers for Embedded Systems, 2020

Programming tensor cores from an image processing DSL.
Proceedings of the SCOPES '20: 23rd International Workshop on Software and Compilers for Embedded Systems, 2020

System Simulation of Memristor Based Computation in Memory Platforms.
Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2020

Near Memory Acceleration on High Resolution Radio Astronomy Imaging.
Proceedings of the 9th Mediterranean Conference on Embedded Computing, 2020

BrainWave: an energy-efficient EEG monitoring system - evaluation and trade-offs.
Proceedings of the ISLPED '20: ACM/IEEE International Symposium on Low Power Electronics and Design, 2020

OPTCOMNET: Optimized Neural Networks for Low-Complexity Channel Estimation.
Proceedings of the 2020 IEEE International Conference on Communications, 2020

Approximate Inference by Kullback-Leibler Tensor Belief Propagation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling.
Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, 2020

PET-to-MLIR: A polyhedral front-end for MLIR.
Proceedings of the 23rd Euromicro Conference on Digital System Design, 2020

TDO-CIM: Transparent Detection and Offloading for Computation In-memory.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

Approximation Trade Offs in an Image-Based Control System.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

Automatic Generation of Multi-Objective Polyhedral Compiler Transformations.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Schedule Synthesis for Halide Pipelines through Reuse Analysis.
ACM Trans. Archit. Code Optim., 2019

Near-memory computing: Past, present, and future.
Microprocess. Microsystems, 2019

LocalNorm: Robust Image Classification through Dynamically Regularized Normalization.
CoRR, 2019

Towards Efficient Code Generation for Exposed Datapath Architectures.
Proceedings of the 22nd International Workshop on Software and Compilers for Embedded Systems, 2019

Memory and Parallelism Analysis Using a Platform-Independent Approach.
Proceedings of the 22nd International Workshop on Software and Compilers for Embedded Systems, 2019

CIM-SIM: Computation In Memory SIMulator.
Proceedings of the 22nd International Workshop on Software and Compilers for Embedded Systems, 2019

Automatic Memory-Efficient Scheduling of CNNs.
Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2019

Low Precision Processing for High Order Stencil Computations.
Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2019

IMACS: A Framework for Performance Evaluation of Image Approximation in a Closed-loop System.
Proceedings of the 8th Mediterranean Conference on Embedded Computing, 2019

Bitwise Neural Network Acceleration: Opportunities and Challenges.
Proceedings of the 8th Mediterranean Conference on Embedded Computing, 2019

An Automated Approximation Methodology for Arithmetic Circuits.
Proceedings of the 2019 IEEE/ACM International Symposium on Low Power Electronics and Design, 2019

Robust Bayesian Beamforming for Sources at Different Distances with Applications in Urban Monitoring.
Proceedings of the IEEE International Conference on Acoustics, 2019

Blocks: Redesigning Coarse Grained Reconfigurable Architectures for Energy Efficiency.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

NARMADA: Near-Memory Horizontal Diffusion Accelerator for Scalable Stencil Computations.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

Fault Tolerant FPGAs: Where to Spend the Effort?
Proceedings of the 22nd Euromicro Conference on Digital System Design, 2019

Platform Independent Software Analysis for Near Memory Computing.
Proceedings of the 22nd Euromicro Conference on Digital System Design, 2019

Scatter Scrubbing: A Method to Reduce SEU Repair Time in FPGA Configuration Memory.
Proceedings of the 2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, 2019

NAPEL: Near-Memory Computing Application Performance Prediction via Ensemble Learning.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

2018
Analytic Multi-Core Processor Model for Fast Design-Space Exploration.
IEEE Trans. Computers, 2018

Exploiting Specification Modularity to Prune the Optimization-Space of Manufacturing Systems.
Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems, 2018

AivoTTA: an energy efficient programmable accelerator for CNN-based object recognition.
Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, 2018

Cross-Domain Modeling and Optimization of High-Speed Visual Servo Systems.
Proceedings of the 15th International Conference on Control, 2018

Datawidth-Aware Energy-Efficient Multipliers: A Case for Going Sign Magnitude.
Proceedings of the 21st Euromicro Conference on Digital System Design, 2018

A Review of Near-Memory Computing Architectures: Opportunities and Challenges.
Proceedings of the 21st Euromicro Conference on Digital System Design, 2018

A Generic Methodology to Compute Design Sensitivity to SEU in SRAM-Based FPGA.
Proceedings of the 21st Euromicro Conference on Digital System Design, 2018

Designing Energy Efficient Approximate Multipliers for Neural Acceleration.
Proceedings of the 21st Euromicro Conference on Digital System Design, 2018

Quantization of Constrained Processor Data Paths Applied to Convolutional Neural Networks.
Proceedings of the 21st Euromicro Conference on Digital System Design, 2018

Loop transformations leveraging hardware prefetching.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

2017
Extending Halide to Improve Software Development for Imaging DSPs.
ACM Trans. Archit. Code Optim., 2017

Automatic instruction-set architecture synthesis for VLIW processor cores in the ASAM project.
Microprocess. Microsystems, 2017

Identifying bottlenecks in manufacturing systems using stochastic criticality analysis.
Proceedings of the 2017 Forum on Specification and Design Languages, 2017

Loop Overhead Reduction Techniques for Coarse Grained Reconfigurable Architectures.
Proceedings of the Euromicro Conference on Digital System Design, 2017

MeSAP: A fast analytic power model for DRAM memories.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

Memristor for computing: Myth or reality?
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

Locality-Aware CTA Clustering for Modern GPUs.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016
End-to-End Latency Analysis of Dataflow Scenarios Mapped Onto Shared Heterogeneous Resources.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

Configurable XOR Hash Functions for Banked Scratchpad Memories in GPUs.
IEEE Trans. Computers, 2016

R-GPU: A Reconfigurable GPU Architecture.
ACM Trans. Archit. Code Optim., 2016

xCPS: a tool to explore cyber physical systems.
SIGBED Rev., 2016

Feasibility of Contactless Pulse Rate Monitoring of Neonates using Google Glass.
EAI Endorsed Trans. Future Intell. Educ. Environ., 2016

CSDFa: A Model for Exploiting the Trade-Off between Data and Pipeline Parallelism.
Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems, 2016

Coarse grained reconfigurable architectures in the past 25 years: Overview and classification.
Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

A configurable SIMD architecture with explicit datapath for intelligent learning.
Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

Skeleton-based design and simulation flow for Computation-in-Memory architectures.
Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, 2016

X: A Comprehensive Analytic Model for Parallel Machines.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

SFU-Driven Transparent Approximation Acceleration on GPUs.
Proceedings of the 2016 International Conference on Supercomputing, 2016

Robust online face tracking-by-detection.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

A Fast Estimator of Performance with Respect to the Design Parameters of Self Re-Entrant Flowshops.
Proceedings of the 2016 Euromicro Conference on Digital System Design, 2016

Multi-granular Arithmetic in a Coarse-Grain Reconfigurable Architecture.
Proceedings of the 2016 Euromicro Conference on Digital System Design, 2016

MacSim: A MAC-Enabled High-Performance Low-Power SIMD Architecture.
Proceedings of the 2016 Euromicro Conference on Digital System Design, 2016

Code Generation for Reconfigurable Explicit Datapath Architectures with LLVM.
Proceedings of the 2016 Euromicro Conference on Digital System Design, 2016

The neuro vector engine: Flexibility to improve convolutional net efficiency for wearable vision.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Critical points based register-concurrency autotuning for GPUs.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

2015
A Low-Energy Wide SIMD Architecture with Explicit Datapath.
J. Signal Process. Syst., 2015

A Co-Design Framework with OpenCL Support for Low-Energy Wide SIMD Processor.
J. Signal Process. Syst., 2015

Correlation ratio based volume image registration on GPUs.
Microprocess. Microsystems, 2015

Collaborative detection of repetitive behavior by multiple uncalibrated cameras.
Inf. Fusion, 2015

Demystifying the 16 × 16 thread-block for stencils on the GPU.
Concurr. Comput. Pract. Exp., 2015

VLIW Code Generation for a Convolutional Network Accelerator.
Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, 2015

High-level software-pipelining in LLVM.
Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, 2015

Adaptive and transparent cache bypassing for GPUs.
Proceedings of the International Conference for High Performance Computing, 2015

Modeling resource sharing using FSM-SADF.
Proceedings of the 13th ACM/IEEE International Conference on Formal Methods and Models for Codesign, 2015

Fine-Grained Synchronizations and Dataflow Programming on GPUs.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Analytic processor model for fast design-space exploration.
Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

Transit: A Visual Analytical Model for Multithreaded Machines.
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

SPINE: From C loop-nests to highly efficient accelerators using Algorithmic Species.
Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

An automated technique to generate relocatable partial bitstreams for Xilinx FPGAs.
Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015


A Locality Aware Convolutional Neural Networks Accelerator.
Proceedings of the 2015 Euromicro Conference on Digital System Design, 2015

A re-entrant flowshop heuristic for online scheduling of the paper path in a large scale printer.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Inter-tile reuse optimization applied to bandwidth constrained embedded accelerators.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Memristor based computation-in-memory architecture for data-intensive applications.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

(AS)²: accelerator synthesis using algorithmic skeletons for rapid design space exploration.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Online multi-face detection and tracking using detector confidence and structured SVMs.
Proceedings of the 12th IEEE International Conference on Advanced Video and Signal Based Surveillance, 2015

Mixed-length SIMD code generation for VLIW architectures with multiple native vector-widths.
Proceedings of the 26th IEEE International Conference on Application-specific Systems, 2015

2014
Bones: An Automatic Skeleton-Based C-to-CUDA Compiler for GPUs.
ACM Trans. Archit. Code Optim., 2014

Construction and exploitation of VLIW ASIPs with heterogeneous vector-widths.
Microprocess. Microsystems, 2014

An End-to-End Computing Model for the Square Kilometre Array.
Computer, 2014

Instruction-set architecture exploration of VLIW ASIPs using a genetic algorithm.
Proceedings of the 3rd Mediterranean Conference on Embedded Computing, 2014

Construction and exploitation of VLIW asips with multiple vector-widths.
Proceedings of the 3rd Mediterranean Conference on Embedded Computing, 2014

Automatic complex instruction identification for efficient application mapping onto ASIPs.
Proceedings of the IEEE 5th Latin American Symposium on Circuits and Systems, 2014

A framework for automatic custom instruction identification on multi-issue ASIPs.
Proceedings of the 12th IEEE International Conference on Industrial Informatics, 2014

A tool for fast ground truth generation for object detection and tracking from video.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

A detailed GPU cache model based on reuse distance theory.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

A Study of the Potential of Locality-Aware Thread Scheduling for GPUs.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

BuildMaster: Efficient ASIP architecture exploration through compilation and simulation result caching.
Proceedings of the 17th International Symposium on Design and Diagnostics of Electronic Circuits & Systems, 2014

Timing analysis of First-Come First-Served scheduled interval-timed Directed Acyclic Graphs.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Reduction Operator for Wide-SIMDs Reconsidered.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

Symbolic Analysis of Dataflow Applications Mapped onto Shared Heterogeneous Resources.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

2013
Schedule-Extended Synchronous Dataflow Graphs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2013

An energy-efficient method of supporting flexible special instructions in an embedded processor with compact ISA.
ACM Trans. Archit. Code Optim., 2013

Algorithmic species: A classification of affine loop nests for parallel programming.
ACM Trans. Archit. Code Optim., 2013

Efficient communication support in predictable heterogeneous MPSoC designs for streaming applications.
J. Syst. Archit., 2013

GPU-CC: a reconfigurable GPU architecture with communicating cores.
Proceedings of the International Workshop on Software and Compilers for Embedded Systems, 2013

SIMD made explicit.
Proceedings of the 2013 International Conference on Embedded Computer Systems: Architectures, 2013

OpenCL code generation for low energy wide SIMD architectures with explicit datapath.
Proceedings of the 2013 International Conference on Embedded Computer Systems: Architectures, 2013

Throughput-constrained DVFS for scenario-aware dataflow graphs.
Proceedings of the 19th IEEE Real-Time and Embedded Technology and Applications Symposium, 2013

MAMPSx: A design framework for rapid synthesis of predictable heterogeneous MPSoCs.
Proceedings of the 24th IEEE International Symposium on Rapid System Prototyping, 2013

Automated extraction of scenario sequences from disciplined dataflow networks.
Proceedings of the 11th ACM/IEEE International Conference on Formal Methods and Models for Codesign, 2013

Instruction-set architecture exploration strategies for deeply clustered VLIW ASIPs.
Proceedings of the 2nd Mediterranean Conference on Embedded Computing, 2013

RASW: A run-time adaptive sliding window to improve Viola-Jones object detection.
Proceedings of the Seventh International Conference on Distributed Smart Cameras, 2013

Memory-centric accelerator design for Convolutional Neural Networks.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

Simulation and architecture improvements of atomic operations on GPU scratchpad memory.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

A Reconfigurable Ray-Tracing Multi-Processor SoC with Hardware Replication-Aware Instruction Set Extension.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2013

MAMPSX: A demonstration of rapid, predictable HMPSOC synthesis.
Proceedings of the 23rd International Conference on Field Programmable Logic and Applications, 2013

Thermal-aware mapping of streaming applications on 3D Multi-Processor Systems.
Proceedings of the 11th IEEE Symposium on Embedded Systems for Real-time Multimedia, 2013

An Efficient Method for Energy Estimation of Application Specific Instruction-Set Processors.
Proceedings of the 2013 Euromicro Conference on Digital System Design, 2013

Dataflow-Based Multi-ASIP Platform Approach for Digital Control Applications.
Proceedings of the 2013 Euromicro Conference on Digital System Design, 2013

Exploring processor parallelism: Estimation methods and optimization strategies.
Proceedings of the 16th IEEE International Symposium on Design and Diagnostics of Electronic Circuits & Systems, 2013

Future of GPGPU micro-architectural parameters.
Proceedings of the Design, Automation and Test in Europe, 2013

Automatic Skeleton-Based Compilation through Integration with an Algorithm Classification.
Proceedings of the Advanced Parallel Processing Technologies, 2013

2012
The boat hull model: adapting the roofline model to enable performance prediction for parallel computing.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Parametric throughput analysis of scenario-aware dataflow graphs.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

GPU-Vote: A Framework for Accelerating Voting Algorithms on GPU.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

Minimizing Power Consumption of Spatial Division Based Networks-on-Chip Using Multi-path and Frequency Reduction.
Proceedings of the 15th Euromicro Conference on Digital System Design, 2012

Playing games with scenario- and resource-aware SDF graphs through policy iteration.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Scheduling for register file energy minimization in explicit datapath architectures.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Modeling static-order schedules in synchronous dataflow graphs.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Worst-case throughput analysis of real-time dynamic streaming applications.
Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis, 2012

The boat hull model: enabling performance prediction for parallel computing prior to code development.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

Energy efficient special instruction support in an embedded processor with compact isa.
Proceedings of the 15th International Conference on Compilers, 2012

Introducing 'Bones': a parallelizing source-to-source compiler based on algorithmic skeletons.
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, 2012

2011
Fast multidimension multichoice knapsack heuristic for MP-SoC runtime management.
ACM Trans. Embed. Comput. Syst., 2011

From Xetal-II to Xetal-Pro: On the Road Toward an Ultralow-Energy and High-Throughput SIMD Processor.
IEEE Trans. Circuits Syst. Video Technol., 2011

Error computation for predictable real-time software synthesis.
Simul., 2011

Distributed resource management for concurrent execution of multimedia applications on MPSoC platforms.
Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

Skeleton-based automatic parallelization of image processing algorithms for GPUs.
Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

MOVE-Pro: A low power and high code density TTA architecture.
Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

Resource-Efficient Real-Time Scheduling Using Credit-Controlled Static-Priority Arbitration.
Proceedings of the 17th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2011

Bottlenecks and Tradeoffs in High Frame Rate Visual Servoing: A Case Study.
Proceedings of the IAPR Conference on Machine Vision Applications (IAPR MVA 2011), 2011

Analyzing synchronous dataflow scenarios for dynamic software-defined radio applications.
Proceedings of the 2011 International Symposium on System on Chip, 2011

Quantifying the common computational problems in contemporary applications.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Demo: An embedded vision system for high frame rate visual servoing.
Proceedings of the 2011 Fifth ACM/IEEE International Conference on Distributed Smart Cameras, 2011

PhD forum: A cyber-physical system approach to embedded visual servoing.
Proceedings of the 2011 Fifth ACM/IEEE International Conference on Distributed Smart Cameras, 2011

Iteration-Based Trade-Off Analysis of Resource-Aware SDF.
Proceedings of the 14th Euromicro Conference on Digital System Design, 2011

Exploiting Inter and Intra Application Dynamism to Save Energy.
Proceedings of the 14th Euromicro Conference on Digital System Design, 2011

Hybrid Code-Data Prefetch-Aware Multiprocessor Task Graph Scheduling.
Proceedings of the 14th Euromicro Conference on Digital System Design, 2011

A 0.964mW digital hearing aid system.
Proceedings of the Design, Automation and Test in Europe, 2011

An Automated Flow to Map Throughput Constrained Applications to a MPSoC.
Proceedings of the Bringing Theory to Practice: Predictability and Performance in Embedded Systems, 2011

Resynchronization of Cyclo-Static Dataflow graphs.
Proceedings of the Design, Automation and Test in Europe, 2011

Parallelization of while loops in nested loop programs for shared-memory multiprocessor systems.
Proceedings of the Design, Automation and Test in Europe, 2011

An MPSoC design approach for multiple use-cases of throughput constrained applications.
Proceedings of the 8th Conference on Computing Frontiers, 2011

High performance predictable histogramming on GPUs: exploring and evaluating algorithm trade-offs.
Proceedings of the 4th Workshop on General Purpose Processing on Graphics Processing Units, 2011

Efficiency Optimization of Trainable Feature Extractors for a Consumer Platform.
Proceedings of the Advanced Concepts for Intelligent Vision Systems, 2011

Feasibility Analysis of Ultra High Frame Rate Visual Servoing on FPGA and SIMD Processor.
Proceedings of the Advanced Concepts for Intelligent Vision Systems, 2011

Fast Hough Transform on GPUs: Exploration of Algorithm Trade-Offs.
Proceedings of the Advanced Concepts for Intelligent Vision Systems, 2011

2010
A Safari Through the MPSoC Run-Time Management Jungle.
J. Signal Process. Syst., 2010

Iterative Probabilistic Performance Prediction for Multi-Application Multiprocessor Systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2010

An Ultra-Low-Energy Multi-Standard JPEG Co-Processor in 65 nm CMOS With Sub/Near Threshold Supply Voltage.
IEEE J. Solid State Circuits, 2010

CA-MPSoC: An automated design flow for predictable multi-processor architectures for multiple applications.
J. Syst. Archit., 2010

Fast Huffman decoding by exploiting data level parallelism.
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

Compile-time GPU memory access optimizations.
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

Thermal-aware scratchpad memory design and allocation.
Proceedings of the 28th International Conference on Computer Design, 2010

Conservative application-level performance analysis through simulation of MPSoCs.
Proceedings of the 8th IEEE Workshop on Embedded Systems for Real-Time Multimedia, 2010

Automated bottleneck-driven design-space exploration of media processing systems.
Proceedings of the Design, Automation and Test in Europe, 2010

Xetal-Pro: an ultra-low energy and high throughput SIMD processor.
Proceedings of the 47th Design Automation Conference, 2010

A predictable communication assist.
Proceedings of the 7th Conference on Computing Frontiers, 2010

2009
Trade-offs in loop transformations.
ACM Trans. Design Autom. Electr. Syst., 2009

System-scenario-based design of dynamic embedded systems.
ACM Trans. Design Autom. Electr. Syst., 2009

Patterns for Automatic Generation of Soft Real-time System Models.
Simul., 2009

Soft reliability: an interdisciplinary approach with a user-system focus.
Qual. Reliab. Eng. Int., 2009

Quality-of-service trade-off analysis for wireless sensor networks.
Perform. Evaluation, 2009

Dictionary-based program compression on customizable processor architectures.
Microprocess. Microsystems, 2009

Dealing with data dependent conditions to enable general global source code transformations.
Int. J. Embed. Syst., 2009

Performance evaluation of concurrently executing parallel applications on multi-processor systems.
Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, 2009

Improving Product Usage Monitoring and Analysis with Semantic Concepts.
Proceedings of the Information Systems: Modeling, 2009

An ultra-low-energy/frame multi-standard JPEG co-processor in 65nm CMOS with sub/near-threshold power supply.
Proceedings of the IEEE International Solid-State Circuits Conference, 2009

QoS Management for Wireless Sensor Networks with a Mobile Sink.
Proceedings of the Wireless Sensor Networks, 6th European Conference, 2009

Exploring trade-offs between performance and resource requirements for synchronous dataflow graphs.
Proceedings of the 7th IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia, 2009

Fast and accurate protocol specific bus modeling using TLM 2.0.
Proceedings of the Design, Automation and Test in Europe, 2009

A tuneable software cache coherence protocol for heterogeneous MPSoCs.
Proceedings of the 7th International Conference on Hardware/Software Codesign and System Synthesis, 2009

Analytics for the internet of things.
Proceedings of the 27th International Conference on Human Factors in Computing Systems, 2009

2008
Scenario Selection and Prediction for DVS-Aware Scheduling of Multimedia Applications.
J. Signal Process. Syst., 2008

Run-Time Management of a MPSoC Containing FPGA Fabric Tiles.
IEEE Trans. Very Large Scale Integr. Syst., 2008

Multiprocessor systems synthesis for multiple use-cases of multiple applications on FPGA.
ACM Trans. Design Autom. Electr. Syst., 2008

Analyzing composability of applications on MPSoC platforms.
J. Syst. Archit., 2008

Application Scenarios in Streaming-Oriented Embedded-System Design.
IEEE Des. Test Comput., 2008

Model Interpretation for Executable Observation Specifications.
Proceedings of the Twentieth International Conference on Software Engineering & Knowledge Engineering (SEKE'2008), 2008

DC-SIMD: Dynamic communication for SIMD processors.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Enabling MPSoC Design Space Exploration on FPGAs.
Proceedings of the Wireless Networks, 2008

Real-time implementations of Hough Transform on SIMD architecture.
Proceedings of the 2008 Second ACM/IEEE International Conference on Distributed Smart Cameras, 2008

Mapping facial expression recognition algorithms on a low-power smart camera.
Proceedings of the 2008 Second ACM/IEEE International Conference on Distributed Smart Cameras, 2008

UML Profile for Modeling Product Observation.
Proceedings of the Forum on Specification and Design Languages, 2008

Intra- and inter-processor hybrid performance modeling for MPSoC architectures.
Proceedings of the 6th International Conference on Hardware/Software Codesign and System Synthesis, 2008

Statistical noise margin estimation for sub-threshold combinational circuits.
Proceedings of the 13th Asia South Pacific Design Automation Conference, 2008

Distributed Smart Camera Calibration Using Blinking LED.
Proceedings of the Advanced Concepts for Intelligent Vision Systems, 2008

Real-Time Hough Transform on 1-D SIMD Processors: Implementation and Architecture Exploration.
Proceedings of the Advanced Concepts for Intelligent Vision Systems, 2008

Specification for User Modeling with Self-Observing Systems.
Proceedings of the First International Conference on Advances in Computer-Human Interaction, 2008

2007
Inter-cluster communication in VLIW architectures.
ACM Trans. Archit. Code Optim., 2007

Predictable real-time software synthesis.
Real Time Syst., 2007

Design-time application mapping and platform exploration for MP-SoC customised run-time management.
IET Comput. Digit. Tech., 2007

A Systematic Approach to Design Low-Power Video Codec Cores.
EURASIP J. Embed. Syst., 2007

Exploiting the Expressiveness of Cyclo-Static Dataflow to Model Multimedia Implementations.
EURASIP J. Adv. Signal Process., 2007

The Impact of Higher Communication Layers on NoC Supported MP-SoCs.
Proceedings of the First International Symposium on Networks-on-Chips, 2007

Analysing QoS trade-offs in wireless sensor networks.
Proceedings of the 10th International Symposium on Modeling Analysis and Simulation of Wireless and Mobile Systems, 2007

Heuristics for Scenario Creation to Enable General Loop Transformations.
Proceedings of the International Symposium on System-on-Chip, 2007

Vt balancing and device sizing towards high yield of sub-threshold static logic gates.
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

Multi-processor System-level Synthesis for Multiple Applications on Platform FPGA.
Proceedings of the FPL 2007, 2007

A Quick Safari Through the MPSoC Run-Time Management Jungle.
Proceedings of the 2007 5th Workshop on Embedded Systems for Real-Time Multimedia, 2007

Very wide register: an asymmetric register file organization for low power embedded processors.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

Interactive presentation: An FPGA design flow for reconfigurable network-based multi-processor systems on chip.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

Multiprocessor Resource Allocation for Throughput-Constrained Synchronous Dataflow Graphs.
Proceedings of the 44th Design Automation Conference, 2007

Introducing the SuperGT Network-on-Chip; SuperGT QoS: more than just GT.
Proceedings of the 44th Design Automation Conference, 2007

A Probabilistic Approach to Model Resource Contention for Performance Estimation of Multi-featured Media Devices.
Proceedings of the 44th Design Automation Conference, 2007

A model-driven design approach for mechatronic systems.
Proceedings of the Seventh International Conference on Application of Concurrency to System Design (ACSD 2007), 2007

2006
Systematic Preprocessing of Data Dependent Constructs for Embedded Systems.
J. Low Power Electron., 2006

RC-SIMD: Reconfigurable communication SIMD architecture for image processing applications.
J. Embed. Comput., 2006

Skeletons and Asynchronous RPC for Embedded Data and Task Parallel Image Processing.
IEICE Trans. Inf. Syst., 2006

Instruction Transfer And Storage Exploration for Low Energy VLIWs.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2006

Pareto-Based Application Specification for MP-SoC Customized Run-Time Management.
Proceedings of the 2006 International Conference on Embedded Computer Systems: Architectures, 2006

Profiling Driven Scenario Detection and Prediction for Multimedia Applications.
Proceedings of the 2006 International Conference on Embedded Computer Systems: Architectures, 2006

Probabilistic Modelling and Evaluation of Soft Real-Time Embedded Systems.
Proceedings of the Embedded Computer Systems: Architectures, 2006

Strengthening Property Preservation in Concurrent Real-Time Systems.
Proceedings of the 12th IEEE Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2006), 2006

Correctness-preserving synthesis for real-time control software.
Proceedings of the Sixth International Conference on Quality Software (QSIC 2006), 2006

Fast Multi-Dimension Multi-Choice Knapsack Heuristic for MP-SoC Run-Time Management.
Proceedings of the International Symposium on System-on-Chip, 2006

Run-time reconfiguration of communication in SIMD architectures.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Algorithmic skeletons for stream programming in embedded heterogeneous parallel image processing applications.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Reusing Real-Time Systems Design Experience.
Proceedings of the Forum on Specification and Design Languages, 2006

Resource Manager for Non-preemptive Heterogeneous Multiprocessor System-on-chip.
Proceedings of the 2006 4th Workshop on Embedded Systems for Real-Time Multimedia, 2006

Exploiting Hierarchical Configuration to Improve Run-Time MPSoC Task Assignment.
Proceedings of the 2006 International Conference on Engineering of Reconfigurable Systems & Algorithms, 2006

Global Analysis of Resource Arbitration for MPSoC.
Proceedings of the Ninth Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD 2006), 30 August, 2006

Branching-Time Property Preservation Between Real-Time Systems.
Proceedings of the Automated Technology for Verification and Analysis, 2006

Dynamic-SIMD for lens distortion compensation.
Proceedings of the 2006 IEEE International Conference on Application-Specific Systems, 2006

2005
Clustered Loop Buffer Organization for Low Energy VLIW Embedded Processors.
IEEE Trans. Computers, 2005

Iterative compilation for energy reduction.
J. Embed. Comput., 2005

Instruction buffering exploration for low energy embedded processors.
J. Embed. Comput., 2005

Evaluation of Speed and Area of Clustered VLIW Processors.
Proceedings of the 18th International Conference on VLSI Design (VLSI Design 2005), 2005

Global Memory Optimisation for Embedded Systems Allowed by Code Duplication.
Proceedings of the 9th International Workshop on Software and Compilers for Embedded Systems, Dallas, Texas, USA, September 29, 2005

Distributed Congestion Control for Packet Switched Networks on Chip.
Proceedings of the Parallel Computing: Current & Future Issues of High-End Computing, 2005

Design-Time Application Exploration for MP-SoC Customized Run-Time Management.
Proceedings of the 2005 International Symposium on System-on-Chip, 2005

Dictionary-based program compression on transport triggered architectures.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005

Synthesis for Unified Control- and Data-Oriented Models.
Proceedings of the Forum on Specification and Design Languages, 2005

Dynamic Time-Slot Allocation for QoS Enabled Networks on Chip.
Proceedings of the 2005 3rd Workshop on Embedded Systems for Real-Time Multimedia, 2005

Combining Data and Instruction Memory Energy Optimizations for Embedded Applications.
Proceedings of the 2005 3rd Workshop on Embedded Systems for Real-Time Multimedia, 2005

Centralized end-to-end flow control in a best-effort network-on-chip.
Proceedings of the EMSOFT 2005, 2005

Automatic scenario detection for improved WCET estimation.
Proceedings of the 42nd Design Automation Conference, 2005

Intra-task scenario-aware voltage scheduling.
Proceedings of the 2005 International Conference on Compilers, 2005

Power Breakdown Analysis for a Heterogeneous NoC Platform Running a Video Application.
Proceedings of the 16th IEEE International Conference on Application-Specific Systems, 2005

Designing Area and Performance Constrained SIMD/VLIW Image Processing Architectures.
Proceedings of the Advanced Concepts for Intelligent Vision Systems, 2005

2004
L0 buffer energy optimization through scheduling and exploration.
Proceedings of the 2004 ACM Symposium on Applied Computing (SAC), 2004

Design Style Case Study for Embedded Multi Media Compute Nodes.
Proceedings of the 25th IEEE Real-Time Systems Symposium (RTSS 2004), 2004

L0 Cluster Synthesis and Operation Shuffling.
Proceedings of the Integrated Circuit and System Design, 2004

A Unified Model for Analysis of Real-Time Properties.
Proceedings of the International Symposium on Leveraging Applications of Formal Methods, 2004

Error Estimation in Model-Driven Development for Real-Time Software.
Proceedings of the Forum on Specification and Design Languages, 2004

Instruction buffering exploration for low energy VLIWs with instruction clusters.
Proceedings of the 2004 Conference on Asia South Pacific Design Automation: Electronic Design and Solution Fair 2004, 2004

2003
Global interconnect trade-off for technology over memory modules to application level: case study.
Proceedings of the 5th International Workshop on System-Level Interconnect Prediction (SLIP 2003), 2003

Limited Address Range Architecture for Reducing Code Size in Embedded Processors.
Proceedings of the Software and Compilers for Embedded Systems, 7th International Workshop, 2003

Advanced copy propagation for arrays.
Proceedings of the 2003 Conference on Languages, 2003

Evaluating Template-Based Instruction Compression on Transport Triggered Architectures.
Proceedings of the 3rd IEEE International Workshop on System-on-Chip for Real-Time Applications (IWSOC'03), 30 June, 2003

Immediate optimization for compressed transport triggered architecture instructions.
Proceedings of the 2003 International Symposium on System-on-Chip, 2003

Inter-Cluster Communication Models for Clustered VLIW Processors.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

Low Power Coarse-Grained Reconfigurable Instruction Set Processor.
Proceedings of the Field Programmable Logic and Application, 13th International Conference, 2003

SDRAM-Energy-Aware Memory Allocation for Dynamic Multi-Media Applications on Multi-Processor Platforms.
Proceedings of the 2003 Design, 2003

Layer Assignment Techniques for Low Energy in Multi-Layered Memory Organisations.
Proceedings of the 2003 Design, 2003

Cluster assignment of global values for clustered VLIW processors.
Proceedings of the International Conference on Compilers, 2003

SDRAM-Energy-Aware Memory Allocation for Dynamic Multi-Media Applications on Multi-Processor Platforms.
Proceedings of the Embedded Software for SoC, 2003

2002
Interconnect exploration for future wire dominated technologies.
Proceedings of the Fourth IEEE/ACM International Workshop on System-Level Interconnect Prediction (SLIP 2002), 2002

A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors.
Proceedings of the Integrated Circuit Design. Power and Timing Modeling, 2002

Global Variable Promotion: Using Registers to Reduce Cache Power Dissipation.
Proceedings of the Compiler Construction, 11th International Conference, 2002

2001
Implementation of encryption algorithms on transport triggered architectures.
Proceedings of the 2001 International Symposium on Circuits and Systems, 2001

Code Positioning for VLIW Architectures.
Proceedings of the High-Performance Computing and Networking, 9th International Conference, 2001

Designing domain-specific processors.
Proceedings of the Ninth International Symposium on Hardware/Software Codesign, 2001

2000
Computation in the Context of Transport Triggered Architectures.
Int. J. Parallel Program., 2000

Link-time effective whole-program optimizations.
Future Gener. Comput. Syst., 2000

Hashed Addressed Caches for Embedded Pointer Based Codes (Research Note).
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

Automated Design of an ASIP for Image Processing Applications (Research Note).
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

Automatic SIMD Parallelization of Embedded Applications Based on Pattern Recognition.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

1999
TTAs: Missing the ILP complexity wall.
J. Syst. Archit., 1999

A Linker for effective Whole-Program Optimization.
Proceedings of the High-Performance Computing and Networking, 7th International Conference, 1999

Transforming and Parallelizing ANSI C Programs using Pattern Recognition.
Proceedings of the High-Performance Computing and Networking, 7th International Conference, 1999

Automatic detection of recurring operation patterns.
Proceedings of the Seventh International Workshop on Hardware/Software Codesign, 1999

Floating Point to Fixed Point Conversion of C Code.
Proceedings of the Compiler Construction, 8th International Conference, 1999

A Programmable ANSI C Transformation Engine.
Proceedings of the Compiler Construction, 8th International Conference, 1999

1998
Using Transport Triggered Architectures for Embedded Processor Design.
Integr. Comput. Aided Eng., 1998

Overcoming the limitations of the traditional loop parallelization.
Future Gener. Comput. Syst., 1998

Design Space Exploration Algorithm for Heterogeneous Multi-Processor Embedded System Design.
Proceedings of the 35th Conference on Design Automation, 1998

Exploiting Fine- and Coarse-Grain Parallelism in Embedded Programs.
Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998

1997
Making Graphs Reducible with Controlled Node Splitting.
ACM Trans. Program. Lang. Syst., 1997

The Potential of Exploiting Coarse-Grain Task Parallelism from Sequential Programs.
Proceedings of the High-Performance Computing and Networking, 1997

FP-map-an approach to the functional pipelining of embedded programs.
Proceedings of the Fourth International Conference on High-Performance Computing, 1997

A different approach to high performance computing.
Proceedings of the Fourth International Conference on High-Performance Computing, 1997

ADVISE: Performance Evaluation of Parallel VHDL Simulation.
Proceedings of the 30th Annual Simulation Symposium (SS '97), April 7-9, 1997

Design of Heterogenous Multi-Processor Embedded Systems: Applying Functional Pipelining.
Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997

Microprocessor architectures - from VLIW to TTA.
Wiley, ISBN: 978-0-471-97157-3, 1997

1996
Controlled Node Splitting.
Proceedings of the Compiler Construction, 6th International Conference, 1996

1995
Partitioned register file for TTAs.
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

1994
Register file port requirements of transport triggered architectures.
Proceedings of the 27th Annual International Symposium on Microarchitecture, San Jose, California, USA, November 30, 1994

Design of transport triggered architectures.
Proceedings of the Fourth Great Lakes Symposium on Design Automation of High Performance VLSI Systems, 1994

Application Driven MIMD Communication Processor Design.
Proceedings of the Massively Parallel Processing Applications and Development, 1994

A new flexible VHDL simulator.
Proceedings of EURO-DAC'94, 1994

Code generation for transport triggered architectures.
Proceedings of the Code Generation for Embedded Processors, Dagstuhl Workshop, Dagstuhl, Germany, August 31, 1994

Transport-Triggering versus Operation-Triggering.
Proceedings of the Compiler Construction, 5th International Conference, 1994

1993
Move32int, a sea of gates realization of a high performance transport triggered architecture.
Microprocess. Microprogramming, 1993

Evaluating transport triggered architectures for scalar applications.
Microprocess. Microprogramming, 1993

The OSI Model Applied to MIMD Communication Processor Design.
Proceedings of the Parallel Computing: Trends and Applications, 1993

1992
Comparing Software Pipelining for an Operation-Triggered and a Transport-Triggered Architecture.
Proceedings of the Compiler Construction, 1992

1991
MOVE: a framework for high-performance processor design.
Proceedings of Supercomputing '91, 1991

Software Pipelining for Transport-Triggered Architectures.
Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture, 1991

A Scalable Communication Processor Design supporting Systolic Communication.
Proceedings of the Distributed Memory Computing, 2nd European Conference, 1991

Distributed Heapmanagement using reference weights.
Proceedings of the Distributed Memory Computing, 2nd European Conference, 1991

1989
DOAS: an object oriented architecture supporting secure languages.
Proceedings of the 22nd Annual Workshop and Symposium on Microprogramming and Microarchitecture, 1989

