Ahmed Hemani

ORCID: 0000-0003-0565-9376

According to our database, Ahmed Hemani authored at least 163 papers between 1990 and 2024.

Bibliography

2024
Automating functional unit and register binding for synchoros CGRA platform.
Des. Autom. Embed. Syst., June, 2024

Modeling Cycle-to-Cycle Variation in Memristors for In-Situ Unsupervised Trace-STDP Learning.
IEEE Trans. Circuits Syst. II Express Briefs, February, 2024

DUDE: Decryption, Unpacking, Deobfuscation, and Endian Conversion Framework for Embedded Devices Firmware.
IEEE Trans. Dependable Secur. Comput., 2024

CIS: Composable Instruction Set for Streaming Applications: Design, Modeling, and Scheduling.
CoRR, 2024

Application Level Synthesis: Creating Matrix-Matrix Multiplication Library: A Case Study.
IEEE Access, 2024

Integer Linear Programming-Based Simultaneous Scheduling and Binding for SiLago Framework.
IEEE Access, 2024

Exploration of Custom Floating-Point Formats: A Systematic Approach.
Proceedings of the 27th Euromicro Conference on Digital System Design, 2024

FPGA-Based HPC for Associative Memory System.
Proceedings of the 29th Asia and South Pacific Design Automation Conference, 2024

2023
A Memristor-Based Learning Engine for Synaptic Trace-Based Online Learning.
IEEE Trans. Biomed. Circuits Syst., October, 2023

Clock tree generation by abutment in synchoros VLSI design.
Microprocess. Microsystems, 2023

DRRA-based Reconfigurable Architecture for Mixed-Radix FFT.
Proceedings of the 36th International Conference on VLSI Design and 2023 22nd International Conference on Embedded Systems, 2023

Implementation of Image Averaging on DRRA and DiMArch Architectures.
Proceedings of the 36th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design, 2023

Optoelectronic Memristor Model for Optical Synaptic Circuit of Spiking Neural Networks.
Proceedings of the 21st IEEE Interregional NEWCAS Conference, 2023

Optimizing Self-Organizing Maps for Bacterial Genome Identification on Parallel Ultra-Low-Power Platforms.
Proceedings of the 30th IEEE International Conference on Electronics, Circuits and Systems, 2023

Efficient Implementation of 2-D Convolution on DRRA and DiMArch Architectures.
Proceedings of the 13th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, 2023

Implementation of Sobel Edge Detection on DRRA and DiMArch Architectures.
Proceedings of the 26th Euromicro Conference on Digital System Design, 2023

2022
MOHAQ: Multi-Objective Hardware-Aware Quantization of recurrent neural networks.
J. Syst. Archit., 2022

Vesyla-II: An Algorithm Library Development Tool for Synchoros VLSI Design Style.
CoRR, 2022

Reducing the Configuration Overhead of the Distributed Two-level Control System.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

Memristor-Based In-Circuit Computation for Trace-Based STDP.
Proceedings of the 4th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2022

2021
Refresh Triggered Computation: Improving the Energy Efficiency of Convolutional Neural Network Accelerators.
ACM Trans. Archit. Code Optim., 2021

Multi-objective Recurrent Neural Networks Optimization for the Edge - a Quantization-based Approach.
CoRR, 2021

Design and Implementation of Optimized Register File for Streaming Applications.
Proceedings of the 25th International Symposium on VLSI Design and Test, 2021

Synthesis of predictable global NoC by abutment in synchoros VLSI design.
Proceedings of the NOCS '21: International Symposium on Networks-on-Chip, 2021

Scheduling Persistent and Fully Cooperative Instructions.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

Approximate computation of post-synaptic spikes reduces bandwidth to synaptic storage in a model of cortex.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

A Memristor Model with Concise Window Function for Spiking Brain-Inspired Computation.
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021

2020
eBrainII: a 3 kW Realtime Custom 3D DRAM Integrated ASIC Implementation of a Biologically Plausible Model of a Human Scale Cortex.
J. Signal Process. Syst., 2020

Guest Editorial: Special Issue on Architectures and Design Methods for Neural Networks.
J. Signal Process. Syst., 2020

A FPGA-based Hardware Accelerator for Bayesian Confidence Propagation Neural Network.
Proceedings of the IEEE Nordic Circuits and Systems Conference, NorCAS 2020, Oslo, 2020

NACU: A Non-Linear Arithmetic Unit for Neural Networks.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019
Configurable FFT Processor Using Dynamically Reconfigurable Resource Arrays.
J. Signal Process. Syst., 2019

eBrainII: A 3 kW Realtime Custom 3D DRAM integrated ASIC implementation of a Biologically Plausible Model of a Human Scale Cortex.
CoRR, 2019

Regional Clock Tree Generation by Abutment in Synchoros VLSI Design.
CoRR, 2019

Approximate Computing Applied to Bacterial Genome Identification using Self-Organizing Maps.
Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019

2018
RiBoSOM: rapid bacterial genome identification using self-organizing map implemented on the synchoros SiLago platform.
Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, 2018

2017
A Customized Many-Core Hardware Acceleration Platform for Short Read Mapping Problems Using Distributed Memory Interface with 3D-Stacked Architecture.
J. Signal Process. Syst., 2017

Quality-of-service-aware adaptation scheme for multi-core protocol processing architecture.
Microprocess. Microsystems, 2017

3D-Stacked Many-Core Architecture for Biological Sequence Analysis Problems.
Int. J. Parallel Program., 2017

Can a reconfigurable architecture beat ASIC as a CNN accelerator?
Proceedings of the 2017 International Conference on Embedded Computer Systems: Architectures, 2017

Synchoricity and NOCs could make Billion Gate Custom Hardware Centric SOCs Affordable.
Proceedings of the Eleventh IEEE/ACM International Symposium on Networks-on-Chip, 2017

SiLago-CoG: Coarse-Grained Grid-Based Design for Near Tape-Out Power Estimation Accuracy at High Level.
Proceedings of the 2017 IEEE Computer Society Annual Symposium on VLSI, 2017

MOCHA: Morphable Locality and Compression Aware Architecture for Convolutional Neural Networks.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

MTP-Caffe: Memory, Timing, and Power aware tool for mapping CNNs to GPUs.
Proceedings of the 8th Workshop and 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms, 2017

SPEED: Open-Source Framework to Accelerate Speech Recognition on Embedded GPUs.
Proceedings of the Euromicro Conference on Digital System Design, 2017

2016
Polymorphic Configuration Architecture for CGRAs.
IEEE Trans. Very Large Scale Integr. Syst., 2016

TransMap: Transformation Based Remapping and Parallelism for High Utilization and Energy Efficiency in CGRAs.
IEEE Trans. Computers, 2016

Gaussian Random Number Generation: A Survey on Hardware Architectures.
ACM Comput. Surv., 2016

Service-Guaranteed Multi-port Packet Memory for Parallel Protocol Processing Architecture.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

TransMem: A memory architecture to support dynamic remapping and parallelism in low power high performance CGRAs.
Proceedings of the 26th International Workshop on Power and Timing Modeling, 2016

The SiLago Method: Next Generation VLSI Architectures and Design Methods.
Proceedings of the Fourth ACM International Workshop on Many-core Embedded Systems, 2016

Elastic Management and QoS Provisioning Scheme for Adaptable Multi-core Protocol Processing Architecture.
Proceedings of the 2016 Euromicro Conference on Digital System Design, 2016

2015
Revisiting Central Limit Theorem: Accurate Gaussian Random Number Generation in VLSI.
IEEE Trans. Very Large Scale Integr. Syst., 2015

TEA: Timing and Energy Aware compression architecture for Efficient Configuration in CGRAs.
Microprocess. Microsystems, 2015

Architecture and Implementation of Dynamic Parallelism, Voltage and Frequency Scaling (PVFS) on CGRAs.
ACM J. Emerg. Technol. Comput. Syst., 2015

DyMeP: An Infrastructure to Support Dynamic Memory Binding for Runtime Mapping in CGRAs.
Proceedings of the 28th International Conference on VLSI Design, 2015

3D-stacked many-core architecture for biological sequence analysis problems.
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

Physical design aware system level synthesis of hardware.
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

FIST: A Framework to Interleave Spiking Neural Networks on CGRAs.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

Atomic stream computation unit based on micro-thread level parallelism.
Proceedings of the 26th IEEE International Conference on Application-specific Systems, 2015

2014
Low-Latency Maximal-Throughput Communication Interfaces for Rationally Related Clock Domains.
IEEE Trans. Very Large Scale Integr. Syst., 2014

Parallel distributed scalable runtime address generation scheme for a coarse grain reconfigurable computation and storage fabric.
Microprocess. Microsystems, 2014

Special issue on many-core embedded systems.
Microprocess. Microsystems, 2014

Private reliability environments for efficient fault-tolerance in CGRAs.
Des. Autom. Embed. Syst., 2014

RuRot: Run-time rotatable-expandable partitions for efficient mapping in CGRAs.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

Customization methodology of a Coarse Grained Reconfigurable architecture.
Proceedings of the 2014 NORCHIP, Tampere, Finland, October 27-28, 2014

A many-core hardware acceleration platform for short read mapping problem using distributed memory interface with 3D-stacked architecture.
Proceedings of the 2014 International Symposium on System-on-Chip, 2014

Exploring Spiking Neural Network on Coarse-Grain Reconfigurable Architectures.
Proceedings of the 2nd International Workshop on Many-core Embedded Systems, 2014

NeuroCGRA: A CGRA with support for neural networks.
Proceedings of the International Conference on High Performance Computing & Simulation, 2014

TransPar: Transformation based dynamic Parallelism for low power CGRAs.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

Accurate and Efficient Three Level Design Space Exploration Based on Constraints Satisfaction Optimization Problem Solver.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

Customizable Compression Architecture for Efficient Configuration in CGRAs.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

Three-Dimensional Design Space Exploration for System Level Synthesis.
Proceedings of the 17th Euromicro Conference on Digital System Design, 2014

Morphable Compression Architecture for Efficient Configuration in CGRAs.
Proceedings of the 17th Euromicro Conference on Digital System Design, 2014

Case Study: Constraint Programming in a System Level Synthesis Framework.
Proceedings of the Principles and Practice of Constraint Programming, 2014

Spiking brain models: Computation, memory and communication constraints for custom hardware implementation.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

A scalable custom simulation machine for the Bayesian Confidence Propagation Neural Network model of the brain.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

Customizable coarse-grained energy-efficient reconfigurable packet processing architecture.
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

2013
Power-aware dynamic memory management on many-core platforms utilizing DVFS.
ACM Trans. Embed. Comput. Syst., 2013

Energy-aware fault-tolerant network-on-chips for addressing multiple traffic classes.
Microprocess. Microsystems, 2013

Energy-aware-task-parallelism for efficient dynamic voltage, and frequency scaling, in CGRAs.
Proceedings of the 2013 International Conference on Embedded Computer Systems: Architectures, 2013

Memory allocation and optimization in system-level architectural synthesis.
Proceedings of the 2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC), 2013

Implementation and evaluation of configuration scrubbing on CGRAs: A case study.
Proceedings of the 2013 International Symposium on System on Chip, 2013

Energy-aware coarse-grained reconfigurable architectures using dynamically reconfigurable isolation cells.
Proceedings of the International Symposium on Quality Electronic Design, 2013

High performance 3D-FFT implementation.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

39.9 GOPs/watt multi-mode CGRA accelerator for a multi-standard basestation.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

A code generation method for system-level synthesis on ASIC, FPGA and manycore CGRA.
Proceedings of the 1st International Workshop on Many-core Embedded Systems 2013, 2013

Global Control and Storage Synthesis for a System Level Synthesis Approach.
Proceedings of the 21st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2013

Global Interconnect and Control Synthesis in System Level Architectural Synthesis Framework.
Proceedings of the 2013 Euromicro Conference on Digital System Design, 2013

Energy-Aware Fault-Tolerant CGRAs Addressing Application with Different Reliability Needs.
Proceedings of the 2013 Euromicro Conference on Digital System Design, 2013

Distributed Runtime Computation of Constraints for Multiple Inner Loops.
Proceedings of the 2013 Euromicro Conference on Digital System Design, 2013

System level synthesis of hardware for DSP applications using pre-characterized function implementations.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2013

Private configuration environments (PCE) for efficient reconfiguration, in CGRAs.
Proceedings of the 24th International Conference on Application-Specific Systems, 2013

Unifying CORDIC and Box-Muller algorithms: An accurate and efficient Gaussian Random Number generator.
Proceedings of the 24th International Conference on Application-Specific Systems, 2013

2012
Effort, resources, and abstraction vs performance in high-level synthesis: finding new answers to an old question.
SIGARCH Comput. Archit. News, 2012

Low-Latency No-Handshake GALS Interfaces for Fast-Receiver Links.
Proceedings of the 25th International Conference on VLSI Design, 2012

A pragma based approach for mapping MATLAB applications on a coarse grained reconfigurable architecture.
Proceedings of the 25th Symposium on Integrated Circuits and Systems Design, 2012

Self-adaptive NoC Power Management with Dual-level Agents - Architecture and Implementation.
Proceedings of the PECCS 2012, 2012

Segmented Bus Based Path Setup Scheme for a Distributed Memory Architecture.
Proceedings of the IEEE 6th International Symposium on Embedded Multicore/Manycore SoCs, 2012

Classification of Massively Parallel Computer Architectures.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Improved Bioinformatics Processing Unit for Multiple Applications.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

2011
Addressing Dynamic Issues in Information Security Management.
Inf. Manag. Comput. Secur., 2011

NoC Based Distributed Partitionable Memory System for a Coarse Grain Reconfigurable Architecture.
Proceedings of the VLSI Design 2011: 24th International Conference on VLSI Design, 2011

A Library Development Framework for a Coarse Grain Reconfigurable Architecture.
Proceedings of the VLSI Design 2011: 24th International Conference on VLSI Design, 2011

A Reconfigurable Processor for Phylogenetic Inference.
Proceedings of the VLSI Design 2011: 24th International Conference on VLSI Design, 2011

Generating high tail accuracy Gaussian Random Numbers in hardware using central limit theorem.
Proceedings of the IEEE/IFIP 19th International Conference on VLSI and System-on-Chip, 2011

A Coarse-Grained Reconfigurable Processor for Sequencing and Phylogenetic Algorithms in Bioinformatics.
Proceedings of the 2011 International Conference on Reconfigurable Computing and FPGAs, 2011

Synchronizing distributed state machines in a coarse grain reconfigurable architecture.
Proceedings of the 2011 International Symposium on System on Chip, 2011

A coarse-grained reconfigurable protocol processor.
Proceedings of the 2011 International Symposium on System on Chip, 2011

An efficient hardware implementation of high quality AWGN generator using Box-Muller method.
Proceedings of the 11th International Symposium on Communications and Information Technologies, 2011

Compression Based Efficient and Agile Configuration Mechanism for Coarse Grained Reconfigurable Architectures.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

A GALS Network-on-Chip based on rationally-related frequencies.
Proceedings of the IEEE 29th International Conference on Computer Design, 2011

Compact generic intermediate representation (CGIR) to enable late binding in coarse grained reconfigurable architectures.
Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011

Optimal Selection of Function Implementation in a Hierarchical Configware Synthesis Method for a Coarse Grain Reconfigurable Architecture.
Proceedings of the 14th Euromicro Conference on Digital System Design, 2011

Low-Latency and Low-Overhead Mesochronous and Plesiochronous Synchronizers.
Proceedings of the 14th Euromicro Conference on Digital System Design, 2011

Predicting bus contention effects on energy and performance in multi-processor SoCs.
Proceedings of the Design, Automation and Test in Europe, 2011

Address generation scheme for a coarse grain reconfigurable architecture.
Proceedings of the 22nd IEEE International Conference on Application-specific Systems, 2011

2010
Inferring Energy and Performance Cost of RTOS in Priority-Driven Scheduling.
Proceedings of the IEEE Fifth International Symposium on Industrial Embedded Systems, 2010

Control Scheme for a CGRA.
Proceedings of the 22nd International Symposium on Computer Architecture and High Performance Computing, 2010

A Coarse Grain Reconfigurable Architecture for sequence alignment problems in bio-informatics.
Proceedings of the IEEE 8th Symposium on Application Specific Processors, 2010



Improving performance of fading channel simulators by use of uniformly distributed random numbers.
Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, 2010

Distributed DVFS using rationally-related frequencies and discrete voltage levels.
Proceedings of the 2010 International Symposium on Low Power Electronics and Design, 2010

Lowering the latency of interfaces for rationally-related frequencies.
Proceedings of the 28th International Conference on Computer Design, 2010

Predicting energy and performance overhead of Real-Time Operating Systems.
Proceedings of the Design, Automation and Test in Europe, 2010

2009
A General Approach to High-Level Energy and Performance Estimation in System-on-Chip Architectures.
J. Low Power Electron., 2009

A General Approach to High-Level Energy and Performance Estimation in SoCs.
Proceedings of the VLSI Design 2009: Improving Productivity through Higher Abstraction, 2009

Morphable DPU: Smart and efficient data path for signal processing applications.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2009

Adaptability infrastructure for bridging IT security evaluation and options theory.
Proceedings of the 2nd International Conference on Security of Information and Networks, 2009

A flexible communication scheme for rationally-related clock frequencies.
Proceedings of the 27th International Conference on Computer Design, 2009

Option Based Evaluation: Security Evaluation of IT Products Based on Options Theory.
Proceedings of the First IEEE Eastern European Conference on the Engineering of Computer Based Systems, 2009

Energy and Performance Model of a SPARC Leon3 Processor.
Proceedings of the 12th Euromicro Conference on Digital System Design, 2009

2002
Dynamic memory management methodology applied to embedded telecom network systems.
IEEE Trans. Very Large Scale Integr. Syst., 2002

A Network on Chip Architecture and Design Methodology.
Proceedings of the 2002 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2002), 2002

2001
System-level data-format exploration for dynamically allocated datastructures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001

Grammar-based design of embedded systems.
J. Syst. Archit., 2001

Hardware Software Codesign of DSP System Using Grammar Based Approach.
Proceedings of the 14th International Conference on VLSI Design (VLSI Design 2001), 2001

Test Strategies on Functionally Partitioned Module-Based Programmable Architecture for Base-Band Processing.
Proceedings of the Euromicro Symposium on Digital Systems Design 2001 (Euro-DSD 2001), 2001

2000
Grammar-based hardware synthesis from port-size independent specifications.
IEEE Trans. Very Large Scale Integr. Syst., 2000

A Metamodel for Studying Concepts in Electronic System Design.
IEEE Des. Test Comput., 2000

System Level Virtual Prototyping of DSP SOCs Using Grammar Based Approach.
Des. Autom. Embed. Syst., 2000

Reconfigurable Architecture for Base-band Data Processing.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2000

Development of Programmable Architecture for Base-Band Processing.
Proceedings of the 26th EUROMICRO 2000 Conference, 2000

System-level data format exploration for dynamically allocated data structures.
Proceedings of the 37th Conference on Design Automation, 2000

1999
System Level Virtual Prototyping of DSP ASICs Using Grammar Based Approach.
Proceedings of the Tenth IEEE International Workshop on Rapid System Prototyping (RSP 1999), 1999

Globally asynchronous locally synchronous architecture for large high-performance ASICs.
Proceedings of the 1999 International Symposium on Circuits and Systems, ISCAS 1999, Orlando, Florida, USA, May 30, 1999

Exploiting Data Transfer Locality in Memory Mapping.
Proceedings of the 25th EUROMICRO '99 Conference, 1999

The Rugby Model: A Conceptual Frame for the Study of Modelling, Analysis and Synthesis Concepts of Electronic Systems.
Proceedings of the 1999 Design, 1999

Lowering Power Consumption in Clock by Using Globally Asynchronous Locally Synchronous Design Style.
Proceedings of the 36th Conference on Design Automation, 1999

1998
A Methodology and Algorithms for Efficient Interprocess Communication Synthesis from System Description in SDL.
Proceedings of the 11th International Conference on VLSI Design (VLSI Design 1998), 1998

Specification of Exception Handling in Grammar-Based Hardware Synthesis.
Proceedings of the 24th EUROMICRO '98 Conference, 1998

Scheduling of Outputs in Grammar-based Hardware Synthesis of Data Communication Protocols.
Proceedings of the 1998 Design, 1998

1997
System oriented VLSI curriculum at KTH.
Proceedings of the 1997 IEEE International Conference on Microelectronic Systems Education, 1997

1996
A Novel Allocation Strategy for Control and Memory Intensive Telecommunication Circuits.
Proceedings of the 9th International Conference on VLSI Design (VLSI Design 1996), 1996

A Rule-Based Approach for Improving Allocation of Filter Structures in HLS.
Proceedings of the 9th International Conference on VLSI Design (VLSI Design 1996), 1996

Grammar-Based Hardware Synthesis of Data Communication Protocols.
Proceedings of the 9th International Symposium on System Synthesis, 1996

Flexible Codesign Target Architecture for Early Prototyping of CMIST Systems.
Proceedings of the Field-Programmable Logic, 1996

1994
Application of High-Level Synthesis in an Industrial Project.
Proceedings of the Seventh International Conference on VLSI Design, 1994

Hardware/software partitioning and minimizing memory interface traffic.
Proceedings of EURO-DAC'94, 1994

1993
Self-Organization and its Application to Binding.
Proceedings of the Sixth International Conference on VLSI Design, 1993

1990
Cell placement by self-organisation.
Neural Networks, 1990

A neural net based self organising scheduling algorithm.
Proceedings of the European Design Automation Conference, 1990

