Kailash Gopalakrishnan
Orcid: 0000-0002-8952-0875
According to our database1,
Kailash Gopalakrishnan
authored at least 40 papers
between 2008 and 2023.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
On csauthors.net:
Bibliography
2023
A Switched-Capacitor Integer Compute Unit with Decoupled Storage and Arithmetic for Cloud AI Inference in 5nm CMOS.
Proceedings of the 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2023
2022
OnSRAM: Efficient Inter-Node On-Chip Scratchpad Management in Deep Learning Accelerators.
ACM Trans. Embed. Comput. Syst., November, 2022
A 7-nm Four-Core Mixed-Precision AI Chip With 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling.
IEEE J. Solid State Circuits, 2022
Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
2021
A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling.
Proceedings of the IEEE International Solid-State Circuits Conference, 2021
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
2020
A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference.
Proceedings of the IEEE Symposium on VLSI Circuits, 2020
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
2019
IEEE Micro, 2019
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019
Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019
Proceedings of the IEEE International Symposium on Workload Characterization, 2019
Proceedings of the 7th International Conference on Learning Representations, 2019
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019
BiScaled-DNN: Quantizing Long-tailed Datastructures with Two Scale Factors for Deep Neural Networks.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
A Compiler for Deep Neural Network Accelerators to Generate Optimized Code for a Wide Range of Data Parameters from a Hand-crafted Computation Kernel.
Proceedings of the IEEE Symposium in Low-Power and High-Speed Chips, 2019
DLFloat: A 16-b Floating Point Format Designed for Deep Learning Training and Inference.
Proceedings of the 26th IEEE Symposium on Computer Arithmetic, 2019
2018
A Scalable Multi- TeraOPS Deep Learning Processor Core for AI Trainina and Inference.
Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
Proceedings of the International Symposium on Low Power Electronics and Design, 2018
Proceedings of the International Symposium on Low Power Electronics and Design, 2018
True Gradient-Based Training of Deep Binary Activated Neural Networks Via Continuous Binarization.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018
AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018
2017
Proceedings of the 54th Annual Design Automation Conference, 2017
POSTER: Design Space Exploration for Performance Optimization of Deep Neural Networks on Shared Memory Accelerators.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017
2016
Energy-Efficient Simultaneous Localization and Mapping via Compounded Approximate Computing.
Proceedings of the 2016 IEEE International Workshop on Signal Processing Systems, 2016
Proceedings of the IEEE International Conference on Rebooting Computing, 2016
2015
Proceedings of the 32nd International Conference on Machine Learning, 2015
2014
2013
ACM J. Emerg. Technol. Comput. Syst., 2013
2008
IBM J. Res. Dev., 2008