Osman S. Unsal

Orcid: 0000-0002-0544-9697

According to our database, Osman S. Unsal authored at least 214 papers between 2000 and 2024.

Collaborative distances:
  • Dijkstra number of three.
  • Erdős number of two.

Bibliography

2024
Enhancing Fault Tolerance in High-Performance Computing: A Real Hardware Case Study on a RISC-V Vector Processing Unit.
IEEE Open J. Comput. Soc., 2024

DRAM Errors and Cosmic Rays: Space Invaders or Science Fiction?
CoRR, 2024

QUETZAL: Vector Acceleration Framework for Modern Genome Sequence Analysis Algorithms.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning: (Practical Experience Report).
Proceedings of the 19th European Dependable Computing Conference, 2024

2023
Vitruvius+: An Area-Efficient RISC-V Decoupled Vector Coprocessor for High Performance Computing Applications.
ACM Trans. Archit. Code Optim., June, 2023

Efficient thread-to-core mapping alternatives for application-level redundant multithreading.
Concurr. Comput. Pract. Exp., 2023

Mix-GEMM: An efficient HW-SW Architecture for Mixed-Precision Quantized Deep Neural Networks Inference on Edge Devices.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

VAQUERO: A Scratchpad-based Vector Accelerator for Query Processing.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023


2022
MoRS: An Approximate Fault Modeling Framework for Reduced-Voltage SRAMs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Can We Trust Undervolting in FPGA-Based Deep Learning Designs at Harsh Conditions?
IEEE Micro, 2022

Adaptable Register File Organization for Vector Processors.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022


BiSon-e: a lightweight and high-performance accelerator for narrow integer linear algebra computing on the edge.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
Efficient selective replication of critical code regions for SDC mitigation leveraging redundant multithreading.
J. Supercomput., 2021

MoRS: An Approximate Fault Modelling Framework for Reduced-Voltage SRAMs.
CoRR, 2021

Scrooge Attack: Undervolting ARM Processors for Profit.
CoRR, 2021

Scrooge Attack: Undervolting ARM Processors for Profit: Practical experience report.
Proceedings of the 40th International Symposium on Reliable Distributed Systems, 2021

FPGA Checkpointing for Scientific Computing.
Proceedings of the 27th IEEE International Symposium on On-Line Testing and Robust System Design, 2021

VIA: A Smart Scratchpad for Vector Units with Application to Sparse Matrix Computations.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Understanding Power Consumption and Reliability of High-Bandwidth Memory with Voltage Underscaling.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

2020
A RISC-V Simulator and Benchmark Suite for Designing and Evaluating Vector Architectures.
ACM Trans. Archit. Code Optim., 2020

Exceeding Conservative Limits: A Consolidated Analysis on Modern Hardware Margins.
CoRR, 2020

Power and Accuracy of Multi-Layer Perceptrons (MLPs) under Reduced-voltage FPGA BRAMs Operation.
CoRR, 2020

On the Resilience of Deep Learning for Reduced-voltage FPGAs.
Proceedings of the 28th Euromicro International Conference on Parallel, 2020

An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration.
Proceedings of the 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2020



Checkpoint Restart Support for Heterogeneous HPC Applications.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020

2019
LEGaTO: Low-Energy, Secure, and Resilient Toolset for Heterogeneous Computing.
CoRR, 2019

Hardware Versus Software Fault Injection of Modern Undervolted SRAMs.
CoRR, 2019

TauRieL: Targeting Traveling Salesman Problem with a deep reinforcement learning inspired architecture.
CoRR, 2019

Evaluating Built-In ECC of FPGA On-Chip Memories for the Mitigation of Undervolting Faults.
Proceedings of the 27th Euromicro International Conference on Parallel, 2019

Modern Hardware Margins: CPUs, GPUs, FPGAs Recent System-Level Studies.
Proceedings of the 25th IEEE International Symposium on On-Line Testing and Robust System Design, 2019

A Novel FPGA-Based High Throughput Accelerator For Binary Search Trees.
Proceedings of the 17th International Conference on High Performance Computing & Simulation, 2019

Ground-Truth Prediction to Accelerate Soft-Error Impact Analysis for Iterative Methods.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

2018
Memory Controller for Vector Processor.
J. Signal Process. Syst., 2018

Vector Processing-Aware Advanced Clock-Gating Techniques for Low-Power Fused Multiply-Add.
IEEE Trans. Very Large Scale Integr. Syst., 2018

Exploring the capabilities of support vector machines in detecting silent data corruptions.
Sustain. Comput. Informatics Syst., 2018

A General Guide to Applying Machine Learning to Computer Architecture.
Supercomput. Front. Innov., 2018

Unified fault-tolerance framework for hybrid task-parallel message-passing applications.
Int. J. High Perform. Comput. Appl., 2018

Performance Study of Non-volatile Memories on a High-End Supercomputer.
Proceedings of the High Performance Computing, 2018

Towards Ad Hoc Recovery for Soft Errors.
Proceedings of the IEEE/ACM 8th Workshop on Fault Tolerance for HPC at eXtreme Scale, 2018

Approximating a Multi-Grid Solver.
Proceedings of the 2018 IEEE/ACM Performance Modeling, 2018

On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018


Comprehensive Evaluation of Supply Voltage Underscaling in FPGA on-Chip Memories.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Characterization of the Impact of Soft Errors on Iterative Methods.
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

A Demo of FPGA Aggressive Voltage Downscaling: Power and Reliability Tradeoffs.
Proceedings of the 28th International Conference on Field Programmable Logic and Applications, 2018

Fault Characterization Through FPGA Undervolting.
Proceedings of the 28th International Conference on Field Programmable Logic and Applications, 2018

Comparative analysis of soft-error detection strategies: a case study with iterative methods.
Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018


2017
An Integrated Vector-Scalar Design on an In-Order ARM Core.
ACM Trans. Archit. Code Optim., 2017

Determinism at Standard-Library Level in TM-Based Applications.
Int. J. Parallel Program., 2017

Automatic Risk-based Selective Redundancy for Fault-tolerant Task-parallel HPC Applications.
Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware, 2017

A Machine Learning Approach for Performance Prediction and Scheduling on Heterogeneous CPUs.
Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

RETHINK big: European roadmap for hardware and networking optimizations for big data.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

MACORD: Online Adaptive Machine Learning Framework for Silent Error Detection.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Designing and Modelling Selective Replication for Fault-tolerant HPC Applications.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

A Deep Learning Mapper (DLM) for Scheduling on Heterogeneous Systems.
Proceedings of the High Performance Computing - 4th Latin American Conference, 2017

2016
Range Translations for Fast Virtual Memory.
IEEE Micro, 2016

Architectural support for efficient message passing on shared memory multi-cores.
J. Parallel Distributed Comput., 2016

Unprotected computing: a large-scale study of DRAM raw error rate on a supercomputer.
Proceedings of the International Conference for High Performance Computing, 2016

Implications of non-volatile memory as primary storage for database management systems.
Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

Exploring Energy Reduction in Future Technology Nodes via Voltage Scaling with Application to 10nm.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

A Fully Parameterizable Low Power Design of Vector Fused Multiply-Add Using Active Clock-Gating Techniques.
Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016

Future Vector Microprocessor Extensions for Data Aggregations.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

CRC-Based Memory Reliability for Task-Parallel HPC Applications.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Energy-efficient address translation.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Energy minimization at all layers of the data center: The ParaDIME project.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

A Runtime Heuristic to Selectively Replicate Tasks for Application-Specific Reliability Targets.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

Towards low-power embedded vector processor.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Spatial Support Vector Regression to Detect Silent Errors in the Exascale Era.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

Accelerating Hash-Based Query Processing Operations on FPGAs by a Hash Table Caching Technique.
Proceedings of the High Performance Computing - Third Latin American Conference, 2016

POSTER: An Integrated Vector-Scalar Design on an In-order ARM Core.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
TRADE: Precise Dynamic Race Detection for Scalable Transactional Memory Systems.
ACM Trans. Parallel Comput., 2015

DaSH: A benchmark suite for hybrid dataflow and shared memory programming models.
Parallel Comput., 2015

Reimagining Heterogeneous Computing: A Functional Instruction-Set Architecture Computing Model.
IEEE Micro, 2015

Kernel-to-User-Mode Transition-Aware Hardware Scheduling.
IEEE Micro, 2015

ParaDIME: Parallel Distributed Infrastructure for Minimization of Energy for data centers.
Microprocess. Microsystems, 2015

Thread Lock Section-Aware Scheduling on Asymmetric Single-ISA Multi-Core.
IEEE Comput. Archit. Lett., 2015

Chapter One - An Overview of Architecture-Level Power- and Energy-Efficient Design Techniques.
Adv. Comput., 2015

FAcET: Fast and accurate power/energy estimation tool for CPU-GPU platforms at architectural-level.
Proceedings of the 28th IEEE International System-on-Chip Conference, 2015

Performance and Energy Efficient Hardware-Based Scheduler for Symmetric/Asymmetric CMPs.
Proceedings of the 27th International Symposium on Computer Architecture and High Performance Computing, 2015

Tidy Cache: Improving Data Placement in Die-Stacked DRAM Caches.
Proceedings of the 27th International Symposium on Computer Architecture and High Performance Computing, 2015

Imposing coarse-grained reconfiguration to general purpose processors.
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

NanoCheckpoints: A Task-Based Asynchronous Dataflow Framework for Efficient and Scalable Checkpoint/Restart.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

NEMsCAM: A novel CAM cell based on nano-electro-mechanical switch and CMOS for energy efficient TLBs.
Proceedings of the 2015 IEEE/ACM International Symposium on Nanoscale Architectures, 2015

Joint Circuit-System Design Space Exploration of Multiplier Unit Structure for Energy-Efficient Vector Processors.
Proceedings of the 2015 IEEE Computer Society Annual Symposium on VLSI, 2015

JSRAM: A Circuit-Level Technique for Trading-Off Robustness and Capacity in Cache Memories.
Proceedings of the 2015 IEEE Computer Society Annual Symposium on VLSI, 2015

Redundant memory mappings for fast access to large memories.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

DiMP: Architectural Support for Direct Message Passing on Shared Memory Multi-cores.
Proceedings of the 44th International Conference on Parallel Processing, 2015

VPM: Virtual power meter tool for low-power many-core/heterogeneous data center prototypes.
Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

Marriage Between Coordinated and Uncoordinated Checkpointing for the Exascale Era.
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

VSR sort: A novel vectorised sorting algorithm & architecture extensions for future microprocessors.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Trigeneous Platforms for Energy Efficient Computing of HPC Applications.
Proceedings of the 22nd IEEE International Conference on High Performance Computing, 2015

High Level Synthesis Based Hardware Accelerator Design for Processing SQL Queries.
Proceedings of the 12th FPGAworld Conference 2015, 2015

Accelerating Complete Decision Support Queries Through High-Level Synthesis Technology (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

Heterogeneous Platform to Accelerate Compute Intensive Applications.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

High-Level Debugging and Verification for FPGA-Based Multicore Architectures.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

Hardware Round-Robin Scheduler for Single-ISA Asymmetric Multi-core.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015


Transactional Memory for Reliability.
Proceedings of the Transactional Memory. Foundations, Algorithms, Tools, and Applications, 2015

Verification Tools for Transactional Programs.
Proceedings of the Transactional Memory. Foundations, Algorithms, Tools, and Applications, 2015

An energy efficient hybrid FPGA-GPU based embedded platform to accelerate face recognition application.
Proceedings of the 2015 IEEE Symposium in Low-Power and High-Speed Chips, 2015

Fault-Tolerant Protocol for Hybrid Task-Parallel Message-Passing Applications.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Programmer-directed partial redundancy for resilient HPC.
Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

2014
Exploiting Existing Comparators for Fine-Grained Low-Cost Error Detection.
ACM Trans. Archit. Code Optim., 2014

Bit Impact Factor: Towards making fair vulnerability comparison.
Microprocess. Microsystems, 2014

Using Dynamic Runtime Testing for Rapid Development of Architectural Simulators.
Int. J. Parallel Program., 2014

DESSERT: DESign Space ExploRation Tool based on power and energy at System-Level.
Proceedings of the 27th IEEE International System-on-Chip Conference, 2014

Neighbor-cell assisted error correction for MLC NAND flash memories.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2014

DeTrans: Deterministic and Parallel execution of Transactions.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

A Case Study of Hybrid Dataflow and Shared-Memory Programming Models: Dependency-Based Parallel Game Engine.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Dynamic-vector execution on a general purpose EDGE chip multiprocessor.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

Dynamic Verification for Hybrid Concurrent Programming Models.
Proceedings of the Runtime Verification - 5th International Conference, 2014

PAMS: Pattern Aware Memory System for embedded systems.
Proceedings of the 2014 International Conference on ReConFigurable Computing and FPGAs, 2014

System-level power estimation tool for embedded processor based platforms.
Proceedings of the 2014 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, 2014

Combining Error Detection and Transactional Memory for Energy-Efficient Computing below Safe Operation Margins.
Proceedings of the 22nd Euromicro International Conference on Parallel, 2014

VPPET: Virtual platform power and energy estimation tool for heterogeneous MPSoC based FPGA platforms.
Proceedings of the 24th International Workshop on Power and Timing Modeling, 2014

System-Level Power and Energy Estimation Methodology for Open Multimedia Applications Platforms.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2014

Physical vs. Physically-Aware Estimation Flow: Case Study of Design Space Exploration of Adders.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2014

PETS: Power and energy estimation tool at system-level.
Proceedings of the Fifteenth International Symposium on Quality Electronic Design, 2014

Exploiting a fast and simple ECC for scaling supply voltage in level-1 caches.
Proceedings of the 2014 IEEE 20th International On-Line Testing Symposium, 2014

Performance analysis of the memory management unit under scale-out workloads.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

A dynamically reconfigurable architecture for emergency and disaster management in ITS.
Proceedings of the International Conference on Connected Vehicles and Expo, 2014

Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi.
Proceedings of the International Conference on High Performance Computing & Simulation, 2014

Advanced Pattern based Memory Controller for FPGA based HPC applications.
Proceedings of the International Conference on High Performance Computing & Simulation, 2014

AMMC: Advanced Multi-Core Memory Controller.
Proceedings of the 2014 International Conference on Field-Programmable Technology, 2014

MAPC: Memory access pattern based controller.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

Power estimation tool for system on programmable chip based platforms (abstract only).
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

APMC: advanced pattern based memory controller (abstract only).
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

T-Rex: a dynamic race detection tool for C/C++ transactional memory applications.
Proceedings of the Ninth Eurosys Conference 2014, 2014

System-level power & energy estimation methodology and optimization techniques for CPU-GPU based mobile platforms.
Proceedings of the 12th IEEE Symposium on Embedded Systems for Real-time Multimedia, 2014

ParaDIME: Parallel Distributed Infrastructure for Minimization of Energy.
Proceedings of the 17th Euromicro Conference on Digital System Design, 2014

EVX: Vector execution on low power EDGE cores.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Dynamic transaction coalescing.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

VALib and SimpleVector: tools for rapid initial research on vector architectures.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

DaSH: a benchmark suite for hybrid dataflow and shared memory programming models: with comparative evaluation of three hybrid dataflow models.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

Flexicache: Highly Reliable and Low Power Cache under Supply Voltage Scaling.
Proceedings of the High Performance Computing - First HPCLATAM, 2014

PVMC: Programmable Vector Memory Controller.
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

Stand-Alone Memory Controller for Graphics System.
Proceedings of the Reconfigurable Computing: Architectures, Tools, and Applications, 2014

2013
Profile-guided transaction coalescing - lowering transactional overheads by merging transactions.
ACM Trans. Archit. Code Optim., 2013

Techniques to improve performance in requester-wins hardware transactional memory.
ACM Trans. Archit. Code Optim., 2013

On the selection of adder unit in energy efficient vector processing.
Proceedings of the International Symposium on Quality Electronic Design, 2013

TM-dietlibc: A TM-aware Real-World System Library.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

EcoTM: Conflict-Aware Economical Unbounded Hardware Transactional Memory.
Proceedings of the International Conference on Computational Science, 2013

HARP: Adaptive abort recurrence prediction for Hardware Transactional Memory.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

Circuit design of a novel adaptable and reliable L1 data cache.
Proceedings of the Great Lakes Symposium on VLSI 2013 (part of ECRC), 2013

FaulTM: error detection and recovery using hardware transactional memory.
Proceedings of the Design, Automation and Test in Europe, 2013

Improving the energy efficiency of hardware-assisted watchpoint systems.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Fault tolerance for multi-threaded applications by leveraging hardware transactional memory.
Proceedings of the Computing Frontiers Conference, 2013

2012
Hardware transactional memory with software-defined conflicts.
ACM Trans. Archit. Code Optim., 2012

Resource-bounded multicore emulation using Beefarm.
Microprocess. Microsystems, 2012

Circuit design of a dual-versioning L1 data cache.
Integr., 2012

Profiling and Optimizing Transactional Memory Applications.
Int. J. Parallel Program., 2012

Integrating Dataflow Abstractions into the Shared Memory Model.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

PaRV: Parallelizing Runtime Detection and Prevention of Concurrency Errors.
Proceedings of the Runtime Verification, Third International Conference, 2012

Novel SRAM bias control circuits for a low power L1 data cache.
Proceedings of the NORCHIP 2012, Copenhagen, Denmark, November 12-13, 2012, 2012

Vector Extensions for Decision Support DBMS Acceleration.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Enhancing the performance of assisted execution runtime systems through hardware/software techniques.
Proceedings of the International Conference on Supercomputing, 2012

Flash correct-and-refresh: Retention-aware error management for increased flash memory lifetime.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

A Low-Overhead Profiling and Visualization Framework for Hybrid Transactional Memory.
Proceedings of the 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, 2012

TagTM - accelerating STMs with hardware tags for fast meta-data access.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Transactional prefetching: narrowing the window of contention in hardware transactional memory.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Supporting stateful tasks in a dataflow graph.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
RMS-TM: a comprehensive benchmark suite for transactional memory systems (abstracts only).
SIGMETRICS Perform. Evaluation Rev., 2011

Hybrid Transactional Memory with Pessimistic Concurrency Control.
Int. J. Parallel Program., 2011

RMS-TM: a comprehensive benchmark suite for transactional memory systems.
Proceedings of the ICPE'11, 2011

Rapid Development of Error-Free Architectural Simulators Using Dynamic Runtime Testing.
Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing, 2011

FIMSIM: A fault injection infrastructure for microarchitectural simulators.
Proceedings of the IEEE 29th International Conference on Computer Design, 2011

Circuit design of a dual-versioning L1 data cache for optimistic concurrency.
Proceedings of the 21st ACM Great Lakes Symposium on VLSI 2010, 2011

TMbox: A Flexible and Reconfigurable 16-Core Hybrid Transactional Memory System.
Proceedings of the IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, 2011

From Plasma to BeeFarm: Design Experience of an FPGA-Based Multicore Prototype.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2011

SymptomTM: Symptom-Based Error Detection and Recovery Using Hardware Transactional Memory.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

STM2: A Parallel STM for High Performance Simultaneous Multithreading Systems.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

Using a Reconfigurable L1 Data Cache for Efficient Version Management in Hardware Transactional Memory.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
The Velox Transactional Memory Stack.
IEEE Micro, 2010

Debugging programs that use atomic blocks and transactional memory.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Architectural Support for Fair Reader-Writer Locking.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Dynamic filtering: multi-purpose architecture support for language runtime systems.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

Exploiting Inactive Rename Slots for Detecting Soft Errors.
Proceedings of the Architecture of Computing Systems, 2010

Discovering and understanding performance bottlenecks in transactional applications.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Reducing Soft Errors through Operand Width Aware Policies.
IEEE Trans. Dependable Secur. Comput., 2009

Atomic quake: using transactional memory in an interactive multiplayer game server.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Turbocharging boosted transactions or: how i learnt to stop worrying and love longer transactions.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

EazyHTM: eager-lazy hardware transactional memory.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Taking the heat off transactions: Dynamic selection of pessimistic concurrency control.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Clock gate on abort: Towards energy-efficient hardware Transactional Memory.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

QuakeTM: parallelizing a complex sequential application using transactional memory.
Proceedings of the 23rd international conference on Supercomputing, 2009

Dynamically Filtering Thread-Local Variables in Lazy-Lazy Hardware Transactional Memory.
Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

2008
Refueling: Preventing Wire Degradation due to Electromigration.
IEEE Micro, 2008

Nebelung: Execution Environment for Transactional OpenMP.
Int. J. Parallel Program., 2008

WormBench: a configurable workload for evaluating transactional memory systems.
Proceedings of the 9th workshop on MEmory performance, 2008

The limits of software transactional memory (STM): dissecting Haskell STM applications on a many-core environment.
Proceedings of the 5th Conference on Computing Frontiers, 2008

2007
Transactional Memory: An Overview.
IEEE Micro, 2007

unreadTVar: Extending Haskell Software Transactional Memory for Performance.
Proceedings of the Eighth Symposium on Trends in Functional Programming, 2007

Multithreaded software transactional memory and OpenMP.
Proceedings of the 2007 workshop on MEmory performance, 2007

Transactional Memory and OpenMP.
Proceedings of the A Practical Programming Model for the Multi-Core Era, 2007

Fuse: A Technique to Anticipate Failures due to Degradation in ALUs.
Proceedings of the 13th IEEE International On-Line Testing Symposium (IOLTS 2007), 2007

Hardware Transactional Memory with Operating System Support, HTMOS.
Proceedings of the Euro-Par 2007 Workshops: Parallel Processing, 2007

2006
Impact of Parameter Variations on Circuits and Microarchitecture.
IEEE Micro, 2006

Exploiting Narrow Values for Soft Error Tolerance.
IEEE Comput. Archit. Lett., 2006

Empowering a helper cluster through data-width aware instruction selection policies.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

2004
Combining compiler and runtime IPC predictions to reduce energy in next generation architectures.
Proceedings of the First Conference on Computing Frontiers, 2004

Cool-Fetch: A Compiler-Enabled IPC Estimation Based Framework for Energy Reduction.
Proceedings of the 8th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-8 2004), 2004

2003
Cool-Cache: A compiler-enabled energy efficient data caching framework for embedded/multimedia processors.
ACM Trans. Embed. Comput. Syst., 2003

System-level power-aware design techniques in real-time systems.
Proc. IEEE, 2003

2002
Cool-Fetch: Compiler-Enabled Power-Aware Fetch Throttling.
IEEE Comput. Archit. Lett., 2002

Towards energy-aware software-based fault tolerance in real-time systems.
Proceedings of the 2002 International Symposium on Low Power Electronics and Design, 2002

The Minimax Cache: An Energy-Efficient Framework for Media Processors.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

2001
Cool-cache for hot multimedia.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

2000
Power-Aware Replication of Data Structures in Distributed Embedded Real-Time Systems.
Proceedings of the Parallel and Distributed Processing, 2000

