Amir Yazdanbakhsh

Orcid: 0000-0001-8199-7671

Affiliations:
  • Google


According to our database, Amir Yazdanbakhsh authored at least 67 papers between 2010 and 2024.

Bibliography

2024
TAO: Re-Thinking DL-based Microarchitecture Simulation.
Proc. ACM Meas. Anal. Comput. Syst., 2024

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization.
CoRR, 2024

Effective Interplay between Sparsity and Quantization: From Theory to Practice.
CoRR, 2024

SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs.
CoRR, 2024

Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers.
CoRR, 2024

Exploiting Intel Advanced Matrix Extensions (AMX) for Large Language Model Inference.
IEEE Comput. Archit. Lett., 2024

DACAPO: Accelerating Continuous Learning in Autonomous Systems for Video Analytics.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Learning Performance-Improving Code Edits.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

USM-Lite: Quantization and Sparsity Aware Fine-Tuning for Speech Recognition with Universal Speech Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

In-Storage Domain-Specific Acceleration for Serverless Computing.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Tandem Processor: Grappling with Emerging Operators in Neural Networks.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
JaxPruner: A concise library for sparsity research.
CoRR, 2023

Self-Refine: Iterative Refinement with Self-Feedback.
CoRR, 2023

Domain-Specific Computational Storage for Serverless Computing.
CoRR, 2023

Learning Performance-Improving Code Edits.
CoRR, 2023

Self-Refine: Iterative Refinement with Self-Feedback.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

MESA: Microarchitecture Extensions for Spatial Architecture Generation.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition.
Proceedings of the International Conference on Machine Learning, 2023

What Makes Chain-of-Thought Prompting Effective? A Counterfactual Study.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Architecture 2.0: Challenges and Opportunities.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango.
CoRR, 2022

Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask.
CoRR, 2022

Towards the Co-design of Neural Networks and Accelerators.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Accelerating attention through gradient-based learned runtime pruning.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

GRANITE: A Graph Neural Network Model for Basic Block Throughput Estimation.
Proceedings of the IEEE International Symposium on Workload Characterization, 2022

An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks.
Proceedings of the IEEE International Symposium on Workload Characterization, 2022

Data-Driven Offline Optimization for Architecting Hardware Accelerators.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Rethinking Co-design of Neural Architectures and Hardware Accelerators.
CoRR, 2021

Apollo: Transferable Architecture Exploration.
CoRR, 2021

2020
ReLeQ: A Reinforcement Learning Approach for Automatic Deep Quantization of Neural Networks.
IEEE Micro, 2020

Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation.
Proceedings of the 8th International Conference on Learning Representations, 2020

Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic.
CoRR, 2019

AxMemo: hardware-compiler co-design for approximate code memoization.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

Towards Breaking the Memory Bandwidth Wall Using Approximate Value Prediction.
Proceedings of the Approximate Circuits, Methodologies and CAD, 2019

2018
Neuro-general computing: an acceleration-approximation approach.
PhD thesis, 2018

SiMul: An Algorithm-Driven Approximate Multiplier Design for Machine Learning.
IEEE Micro, 2018

ReLeQ: A Reinforcement Learning Approach for Deep Quantization of Neural Networks.
CoRR, 2018

GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks.
CoRR, 2018

GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

FlexiGAN: An End-to-End Solution for FPGA Acceleration of Generative Adversarial Networks.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

In-DRAM near-data approximate acceleration for GPUs.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
AxBench: A Multiplatform Benchmark Suite for Approximate Computing.
IEEE Des. Test, 2017

2016
RFVP: Rollback-Free Value Prediction with Safe-to-Approximate Loads.
ACM Trans. Archit. Code Optim., 2016

Mitigating the Memory Bottleneck With Approximate Load Value Prediction.
IEEE Des. Test, 2016

Towards Statistical Guarantees in Controlling Quality Tradeoffs for Approximate Acceleration.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

TABLA: A unified template-based framework for accelerating statistical machine learning.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Grater: An approximation workflow for exploiting data-level parallelism in FPGA acceleration.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

2015
Comprehensive Circuit Failure Prediction for Logic and SRAM Using Virtual Aging.
IEEE Micro, 2015

Axilog: Abstractions for Approximate Hardware Design and Reuse.
IEEE Micro, 2015

Neural acceleration for GPU throughput processors.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors.
Proceedings of the 25th edition on Great Lakes Symposium on VLSI, GLVLSI 2015, Pittsburgh, PA, USA, May 20, 2015

Axilog: language support for approximate hardware design.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

2014
Customized pipeline and instruction set architecture for embedded processing engines.
J. Supercomput., 2014

Implementation-aware selection of the custom instruction set for extensible processors.
Microprocess. Microsystems, 2014

General-purpose code acceleration with limited-precision analog computation.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Rollback-free value prediction with approximate loads.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
A new merit function for custom instruction selection under an area budget constraint.
Des. Autom. Embed. Syst., 2013

2012
Instruction set architectural guidelines for embedded packet-processing engines.
J. Syst. Archit., 2012

2011
Dynamic Soft Error Hardening via Joint Body Biasing and Dynamic Voltage Scaling.
Proceedings of the 14th Euromicro Conference on Digital System Design, 2011

2010
Energy-aware design space exploration of registerfile for extensible processors.
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

Instruction reliability analysis for embedded processors.
Proceedings of the 13th IEEE International Symposium on Design and Diagnostics of Electronic Circuits and Systems, 2010

