Tushar Krishna
Orcid: 0000-0001-5738-6942
According to our database, Tushar Krishna authored at least 173 papers between 2008 and 2024.
Bibliography
2024
Towards Efficient Neuro-Symbolic AI: From Workload Characterization to Hardware Architecture.
CoRR, 2024
CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs.
CoRR, 2024
FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models.
CoRR, 2024
PipeOrgan: Efficient Inter-operation Pipelining with Flexible Spatial Organization and Interconnects.
CoRR, 2024
Accurate Low-Degree Polynomial Approximation of Non-polynomial Operators for Fast Private Inference in Homomorphic Encryption.
CoRR, 2024
CoRR, 2024
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM.
CoRR, 2024
CoRR, 2024
CoRR, 2024
Accurate Low-Degree Polynomial Approximation of Non-Polynomial Operators for Fast Private Inference in Homomorphic Encryption.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024
TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024
LIBRA: Enabling Workload-Aware Multi-Dimensional Network Topology Optimization for Distributed Training of Large AI Models.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024
FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
Proceedings of the IEEE International Symposium on Workload Characterization, 2024
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024
CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-training Quantization of ViTs.
Proceedings of the Computer Vision - ECCV 2024, 2024
H3DFact: Heterogeneous 3D Integrated CIM for Factorization with Holographic Perceptual Representations.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024
Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024
Special Session: Neuro-Symbolic Architecture Meets Large Language Models: A Memory-Centric Perspective.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2024
2023
IEEE Des. Test, December, 2023
On Continuing DNN Accelerator Architecture Scaling Using Tightly Coupled Compute-on-Memory 3-D ICs.
IEEE Trans. Very Large Scale Integr. Syst., October, 2023
STIFT: A Spatio-Temporal Integrated Folding Tree for Efficient Reductions in Flexible DNN Accelerators.
ACM J. Emerg. Technol. Comput. Syst., October, 2023
Introduction to the Special Issue on Next-Generation On-Chip and Off-Chip Communication Architectures for Edge, Cloud and HPC.
ACM J. Emerg. Technol. Comput. Syst., October, 2023
TNT: A Modular Approach to Traversing Physically Heterogeneous NOCs at Bare-wire Latency.
ACM Trans. Archit. Code Optim., September, 2023
Hardware-Software Co-Design for Real-Time Latency-Accuracy Navigation in Tiny Machine Learning Applications.
IEEE Micro, 2023
Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces.
CoRR, 2023
CoRR, 2023
CoRR, 2023
CoRR, 2023
IEEE Comput. Archit. Lett., 2023
XRBench: An Extended Reality (XR) Machine Learning Benchmark Suite for the Metaverse.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023
SNATCH: Stealing Neural Network Architecture from ML Accelerator in Intelligent Sensors.
Proceedings of the 2023 IEEE SENSORS, Vienna, Austria, October 29 - Nov. 1, 2023, 2023
VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023
Flexagon: A Multi-dataflow Sparse-Sparse Matrix Multiplication Accelerator for Efficient DNN Processing.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023
Efficient Distributed Inference of Deep Neural Networks via Restructuring and Pruning.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication.
IEEE Trans. Parallel Distributed Syst., 2022
IEEE Trans. Computers, 2022
Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators.
ACM Trans. Archit. Code Optim., 2022
Proc. ACM Meas. Anal. Comput. Syst., 2022
COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training.
CoRR, 2022
DNNFuser: Generative Pre-Trained Transformer as a Generalized Mapper for Layer Fusion in DNN Accelerators.
CoRR, 2022
MicroEdge: a multi-tenant edge cluster system architecture for scalable camera processing.
Proceedings of the Middleware '22: 23rd International Middleware Conference, Quebec, QC, Canada, November 7, 2022
Proceedings of the International IEEE Symposium on Performance Analysis of Systems and Software, 2022
Themis: a network bandwidth-aware collective scheduling policy for distributed training of DL models.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Understanding the Design-Space of Sparse/Dense Multiphase GNN dataflows on Spatial Accelerators.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the IEEE International Symposium on Workload Characterization, 2022
MAGMA: An Optimization Framework for Mapping Multiple DNNs on Multiple Accelerator Cores.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2022
DiGamma: Domain-aware Genetic Algorithm for HW-Mapping Co-optimization for DNN Accelerators.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022
Self adaptive reconfigurable arrays (SARA): learning flexible GEMM accelerator configuration and mapping-space using ML.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
2021
Clock Delivery Network Design and Analysis for Interposer-Based 2.5-D Heterogeneous Systems.
IEEE Trans. Very Large Scale Integr. Syst., 2021
Efficiently Solving Partial Differential Equations in a Partially Reconfigurable Specialized Hardware.
IEEE Trans. Computers, 2021
Exploring Multi-dimensional Hierarchical Network Topologies for Efficient Distributed Training of Trillion Parameter DL Models.
CoRR, 2021
CoRR, 2021
CoRR, 2021
Self-Adaptive Reconfigurable Arrays (SARA): Using ML to Assist Scaling GEMM Acceleration.
CoRR, 2021
STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators.
IEEE Comput. Archit. Lett., 2021
IEEE Comput. Archit. Lett., 2021
Proceedings of the International Conference for High Performance Computing, 2021
A novel network fabric for efficient spatio-temporal reduction in flexible DNN accelerators.
Proceedings of the NOCS '21: International Symposium on Networks-on-Chip, 2021
Proceedings of the NOCS '21: International Symposium on Networks-on-Chip, 2021
Technology-aware Router Architectures for On-Chip-Networks in Heterogeneous Technologies.
Proceedings of the NANOCOM '21: The Eighth Annual ACM International Conference on Nanoscale Computing and Communication, Virtual Event, Italy, September 7, 2021
Architecture, Dataflow and Physical Design Implications of 3D-ICs for DNN-Accelerators.
Proceedings of the 22nd International Symposium on Quality Electronic Design, 2021
E3: A HW/SW Co-design Neuroevolution Platform for Autonomous Learning in Edge Device.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021
Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021
Bridging the Frequency Gap in Heterogeneous 3D SoCs through Technology-Specific NoC Router Architectures.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021
Dataflow-Architecture Co-Design for 2.5D DNN Accelerators using Wireless Network-on-Package.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021
Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators.
Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021
2020
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01767-4, 2020
Architecture, Chip, and Package Codesign Flow for Interposer-Based 2.5-D Chiplet Integration Enabling Heterogeneous IP Reuse.
IEEE Trans. Very Large Scale Integr. Syst., 2020
MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings.
IEEE Micro, 2020
Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference.
CoRR, 2020
Efficient Communication Acceleration for Next-Gen Scale-up Deep Learning Training Platforms.
CoRR, 2020
CoRR, 2020
MARVEL: A Decoupled Model-driven Approach for Efficiently Mapping Convolutions on Spatial DNN Accelerators.
CoRR, 2020
Proceedings of the VLSI-SoC: Design Trends, 2020
Proceedings of the 28th IFIP/IEEE International Conference on Very Large Scale Integration, 2020
ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
A Systematic Methodology for Characterizing Scalability of DNN Accelerators using SCALE-Sim.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020
ASTRA-SIM: Enabling SW/HW Co-Design Exploration for Distributed DL Training Platforms.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020
CLAN: Continuous Learning using Asynchronous Neuroevolution on Commodity Edge Devices.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020
GAMMA: Automating the HW Mapping of DNN Models on Accelerators via Genetic Algorithm.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020
SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020
Scalable Distributed Training of Recommendation Models: An ASTRA-SIM + NS3 case-study with TCP/IP transport.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2020
Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
Kite: A Family of Heterogeneous Interposer Topologies Enabled via Accurate Interconnect Modeling.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
2019
Synchronized Progress in Interconnection Networks (SPIN): A New Theory for Deadlock Freedom.
IEEE Micro, 2019
Proceedings of the 13th IEEE/ACM International Symposium on Networks-on-Chip, 2019
Reinforcement learning based interconnection routing for adaptive traffic optimization.
Proceedings of the 13th IEEE/ACM International Symposium on Networks-on-Chip, 2019
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019
Understanding Reuse, Performance, and Hardware Cost of DNN Dataflow: A Data-Centric Approach.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019
Proceedings of the 12th International Workshop on Network on Chip Architectures, 2019
mRNA: Enabling Efficient Mapping Space Exploration for a Reconfiguration Neural Accelerator.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019
Proceedings of the IEEE International Symposium on Workload Characterization, 2019
Proceedings of the 26th IEEE International Conference on Electronics, Circuits and Systems, 2019
Scaling the Cascades: Interconnect-Aware FPGA Implementation of Machine Learning Problems.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019
Architecture, Chip, and Package Co-design Flow for 2.5D IC Design Enabling Heterogeneous IP Reuse.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
2018
IEEE Micro, 2018
MAESTRO: An Open-source Infrastructure for Modeling Dataflows within Deep Learning Accelerators.
CoRR, 2018
Proceedings of the Twelfth IEEE/ACM International Symposium on Networks-on-Chip, 2018
Proceedings of the Twelfth IEEE/ACM International Symposium on Networks-on-Chip, 2018
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018
FastTrack: Leveraging Heterogeneous FPGA Wires to Design Low-Cost High-Performance Soft NoCs.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018
Proceedings of the 2018 IEEE International Conference on Rebooting Computing, 2018
Proceedings of the 3rd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems, 2018
FastTrack: Exploiting Fast FPGA Wiring for Implementing NoC Shortcuts (Abstract Only).
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018
Optimizing the data placement and transformation for multi-bank CGRA computing system.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018
MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018
2017
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01755-1, 2017
Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks.
IEEE J. Solid State Circuits, 2017
Proceedings of the Eleventh IEEE/ACM International Symposium on Networks-on-Chip, 2017
Proceedings of the Eleventh IEEE/ACM International Symposium on Networks-on-Chip, 2017
Proceedings of the 10th International Workshop on Network on Chip Architectures, 2017
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017
A case for low frequency single cycle multi hop NoCs for energy efficiency and high performance.
Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017
Automatic place-and-route of emerging LED-driven wires within a monolithically-integrated CMOS-III-V process.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017
2016
14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks.
Proceedings of the 2016 IEEE International Solid-State Circuits Conference, 2016
2015
Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures.
ACM Trans. Comput. Syst., 2015
2014
PhD thesis, 2014
IEEE Micro, 2014
Proceedings of the Eighth IEEE/ACM International Symposium on Networks-on-Chip, 2014
SCORPIO: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014
SCORPIO: 36-core shared memory processor demonstrating snoopy coherence on a mesh interconnect.
Proceedings of the 2014 IEEE Hot Chips 26 Symposium (HCS), 2014
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014
2013
SWIFT: A Low-Power Network-On-Chip Implementing the Token Flow Control Router Architecture With Swing-Reduced Interconnects.
IEEE Trans. Very Large Scale Integr. Syst., 2013
Single-Cycle Multihop Asynchronous Repeated Traversal: A SMART Future for Reconfigurable On-Chip Networks.
Computer, 2013
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013
Proceedings of the Design, Automation and Test in Europe, 2013
2012
Approaching the theoretical limits of a mesh NoC with a 16-node chip prototype in 45nm SOI.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012
2011
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011
2010
Physical vs. Virtual Express Topologies with Low-Swing Links for Future Many-Core NoCs.
Proceedings of the NOCS 2010, 2010
Proceedings of the 28th International Conference on Computer Design, 2010
2009
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009
2008
Texture filter memory: a power-efficient and scalable texture memory architecture for mobile graphics processors.
Proceedings of the 2008 International Conference on Computer-Aided Design, 2008
Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008