Angshuman Parashar

Orcid: 0000-0001-9936-6501

According to our database1, Angshuman Parashar authored at least 30 papers between 2004 and 2024.

Collaborative distances:
  • Dijkstra number2 of four.
  • Erdős number3 of four.

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

On csauthors.net:

Bibliography

2024
Mind the Gap: Attainable Data Movement and Operational Intensity Bounds for Tensor Algorithms.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

2023
HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

LoopTree: Enabling Exploration of Fused-layer Dataflow Accelerators.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

2022
Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators.
ACM Trans. Archit. Code Optim., 2022

A Formalism of DNN Accelerator Flexibility.
Proc. ACM Meas. Anal. Comput. Syst., 2022

Enabling Flexibility for Sparse Tensor Acceleration via Heterogeneity.
CoRR, 2022

Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Ruby: Improving Hardware Efficiency for Tensor Algebra Accelerators Through Imperfect Factorization.
Proceedings of the International IEEE Symposium on Performance Analysis of Systems and Software, 2022

Demystifying Map Space Exploration for NPUs.
Proceedings of the IEEE International Symposium on Workload Characterization, 2022

DiGamma: Domain-aware Genetic Algorithm for HW-Mapping Co-optimization for DNN Accelerators.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

2021
Flexion: A Quantitative Metric for Flexibility in DNN Accelerators.
IEEE Comput. Archit. Lett., 2021

Sparseloop: An Analytical, Energy-Focused Design Space Exploration Methodology for Sparse Tensor Accelerators.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Mind mappings: enabling efficient algorithm-accelerator mapping space search.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators.
Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2020
Data Orchestration in Deep Learning Accelerators
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01767-4, 2020

MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings.
IEEE Micro, 2020

2019
Understanding Reuse, Performance, and Hardware Cost of DNN Dataflow: A Data-Centric Approach.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Timeloop: A Systematic Approach to DNN Accelerator Evaluation.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

2017
SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

2015
Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures.
ACM Trans. Comput. Syst., 2015

2014
Efficient Spatial Processing Element Control via Triggered Instructions.
IEEE Micro, 2014

2013
Triggered instructions: a control paradigm for spatially-programmed architectures.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

A Hierarchical Architectural Framework for Reconfigurable Logic Computing.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

2012
Leveraging latency-insensitivity to ease multiple FPGA design.
Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012

2011
HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Leap scratchpads: automatic memory and cache management for reconfigurable logic.
Proceedings of the ACM/SIGDA 19th International Symposium on Field Programmable Gate Arrays, 2011

2007
Mechanisms for bounding vulnerabilities of processor structures.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

2006
SlicK: slice-based locality exploitation for efficient redundant multithreading.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

2004
An uncalibrated lightfield acquisition system.
Image Vis. Comput., 2004

A Complexity-Effective Approach to ALU Bandwidth Enhancement for Instruction-Level Temporal Redundancy.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004


  Loading...