Ehsan Atoofian

ORCID: 0000-0002-1662-5334

According to our database, Ehsan Atoofian authored at least 66 papers between 2003 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Transient Fault Detection in Tensor Cores for Modern GPUs.
ACM Trans. Embed. Comput. Syst., September, 2024

Improving Energy-Efficiency of Capsule Networks on Modern GPUs.
IEEE Comput. Archit. Lett., 2024

Inexact Quantum Square Root Circuit for NISQ Devices.
IEEE Access, 2024

Hardened-TC: A Low-cost Reliability Solution for CNNs Run by Modern GPUs.
Proceedings of the 37th IEEE International System-on-Chip Conference, 2024

Low-Power Register File for Tensor Cores.
Proceedings of the 15th IEEE International Green and Sustainable Computing Conference, 2024

PCTC: Hardware and Software Co-design for Pruned Capsule Networks on Tensor Cores.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024

EAM: Ensemble of approximate multipliers for robust DNNs.
Microprocess. Microsystems, April, 2023

PTTS: Power-aware tensor cores using two-sided sparsity.
J. Parallel Distributed Comput., March, 2023

NISQ-Friendly Non-Linear Activation Functions for Quantum Neural Networks.
Proceedings of the IEEE International Conference on Networking, Architecture and Storage, 2022

Increasing Robustness against Adversarial Attacks through Ensemble of Approximate Multipliers.
Proceedings of the IEEE International Conference on Networking, Architecture and Storage, 2022

Practical approximate quantum multipliers for NISQ devices.
Proceedings of the CF '22: 19th ACM International Conference on Computing Frontiers, Turin, Italy, May 17, 2022

Adaptive Computation Reuse for Energy-Efficient Training of Deep Neural Networks.
ACM Trans. Embed. Comput. Syst., 2021

Reducing Energy in GPGPUs through Approximate Trivial Bypassing.
ACM Trans. Embed. Comput. Syst., 2021

Trivial Bypassing in GPGPUs.
IEEE Embed. Syst. Lett., 2021

Sparsity-aware Power Gating for Tensor Cores.
Proceedings of the 33rd IEEE International Symposium on Computer Architecture and High Performance Computing, 2021

Approximate Cache in GPGPUs.
ACM Trans. Embed. Comput. Syst., 2020

Energy Efficient On-Demand Dynamic Branch Prediction Models.
IEEE Trans. Computers, 2020

Approximate trivial instructions.
Proceedings of the 17th ACM International Conference on Computing Frontiers, 2020

Data-type specific cache compression in GPGPUs.
J. Supercomput., 2018

TELEPORT: Hardware/software alternative to CUDA shared memory programming.
Microprocess. Microsystems, 2018

Improving performance of transactional memory through machine learning.
Concurr. Comput. Pract. Exp., 2018

Loop Perforation in OpenACC.
Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2018

Mitigating Critical Path Decompression Latency in Compressed L1 Data Caches Via Prefetching.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Reducing Power of Memory Hierarchy in General Purpose Graphics Processing Units.
J. Low Power Electron., 2017

An efficient racetrack memory for L2 cache in GPGPUs.
Comput. Syst. Sci. Eng., 2017

Many-Thread Aware Compression in GPGPUs.
Proceedings of the 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, 2016

Temperature-Aware Register Mapping in GPGPUs.
Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, 2016

A low power STT-RAM based register file for GPGPUs.
Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016

Improving Performance of Transactional Applications through Adaptive Transactional Memory.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

Compressed L1 data cache and L2 cache in GPGPUs.
Proceedings of the 27th IEEE International Conference on Application-specific Systems, 2016

TurboLock: increasing associativity of lock table in transactional memory.
Computing, 2015

Workshop Preview of the 2nd International Workshop on Software for Parallel Systems (SEPS 2015).
Companion Proceedings of the 2015 ACM SIGPLAN International Conference on Systems, 2015

Shift-aware racetrack memory.
Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

Automatic Optimization of Software Transactional Memory Through Linear Regression and Decision Tree.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

Reducing shift penalty in Domain Wall Memory through register locality.
Proceedings of the 2015 International Conference on Compilers, 2015

Boosting performance of transactional memory through O-GEHL predictors.
Microprocess. Microsystems, 2014

Acceleration of Software Transactional Memory through Hardware Clock.
Proceedings of the 2nd International Workshop on Many-core Embedded Systems, 2014

Reducing Static and Dynamic Power of L1 Data Caches in GPGPUs.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Power-Aware L1 and L2 Caches for GPGPUs.
Proceedings of the Euro-Par 2014 Parallel Processing, 2014

Improving Power of Cache and Register File through Critical Path Instructions.
Proceedings of the 17th Euromicro Conference on Digital System Design, 2014

Improving performance of software transactional memory through contention locality.
J. Supercomput., 2013

ARV-ALA: Improving performance of software transactional memory through adaptive read and write policies.
Sci. Comput. Program., 2013

TxSnoop: Power-Aware Transactional Snoop.
Proceedings of the 12th IEEE International Conference on Trust, 2013

Consistency Check through O-GEHL Predictors.
Proceedings of the 21st Euromicro International Conference on Parallel, 2013

Read-Write Lock Allocation in Software Transactional Memory.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

VGTS: Variable Granularity Transactional Snoop.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

AGC: adaptive global clock in software transactional memory.
Proceedings of the 2012 PPOPP International Workshop on Programming Models and Applications for Multicores and Manycores, 2012

ArTA: Adaptive Granularity in Transactional Applications.
Proceedings of the 20th Euromicro International Conference on Parallel, 2012

TRT: Transactional Read Tracking.
Proceedings of the 13th International Conference on Parallel and Distributed Computing, 2012

Maintaining Consistency in Software Transactional Memory through Dynamic Versioning Tuning.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2012

Speculative Versioning through Perceptron Predictors.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

Speculative Contention Avoidance in Software Transactional Memory.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Using supplier locality in power-aware interconnects and caches in chip multiprocessors.
J. Syst. Archit., 2008

Exploiting program cyclic behavior to reduce memory latency in embedded processors.
Proceedings of the 2008 ACM Symposium on Applied Computing (SAC), 2008

Adaptive Read Validation in Time-Based Software Transactional Memory.
Proceedings of the Euro-Par 2008 Workshops, 2008

Speculative trivialization point advancing in high-performance processors.
J. Syst. Archit., 2007

Exploiting Speculation Cost Prediction in Power-Aware Applications.
J. Low Power Electron., 2007

A Power-Aware Prediction-Based Cache Coherence Protocol for Chip Multiprocessors.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Speculative supplier identification for reducing power of interconnects in snoopy cache coherence protocols.
Proceedings of the 4th Conference on Computing Frontiers, 2007

Computational and storage power optimizations for the O-GEHL branch predictor.
Proceedings of the 4th Conference on Computing Frontiers, 2007

A Test Approach for Look-Up Table Based FPGAs.
J. Comput. Sci. Technol., 2006

A low-power scan-path architecture.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005

Improving Energy-Efficiency by Bypassing Trivial Computations.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Low-power prediction based data transfer architecture.
Proceedings of the IEEE 2005 Custom Integrated Circuits Conference, 2005

A Low Power BIST Architecture for FPGA Look-Up Table Testing.
Proceedings of the IFIP VLSI-SoC 2003, 2003

A BIST Architecture for FPGA Look-Up Table Testing Reduces Reconfigurations.
Proceedings of the 12th Asian Test Symposium (ATS 2003), 17-19 November 2003, Xian, China, 2003
