Vijayalakshmi Srinivasan

ORCID: 0000-0002-2593-8789

According to our database, Vijayalakshmi Srinivasan authored at least 67 papers between 1997 and 2024.

Bibliography

2022
OnSRAM: Efficient Inter-Node On-Chip Scratchpad Management in Deep Learning Accelerators.
ACM Trans. Embed. Comput. Syst., November, 2022

A 7-nm Four-Core Mixed-Precision AI Chip With 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling.
IEEE J. Solid State Circuits, 2022

Deep Compression of Pre-trained Transformer Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
Efficient Management of Scratch-Pad Memories in Deep Learning Accelerators.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

2020
Efficient AI System Design With Cross-Layer Approximate Computing.
Proc. IEEE, 2020

Ultra-Low Precision 4-bit Training of Deep Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019
DeepTools: Compiler and Execution Runtime Extensions for RaPiD AI Accelerator.
IEEE Micro, 2019

Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Accurate and Efficient 2-bit Quantized Neural Networks.
Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019

Performance-driven Programming of Multi-TFLOP Deep Learning Accelerators.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019

Workload-aware Automatic Parallelization for Multi-GPU DNN Training.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Memory and Interconnect Optimizations for Peta-Scale Deep Learning Systems.
Proceedings of the 26th IEEE International Conference on High Performance Computing, Data, and Analytics, 2019

BiScaled-DNN: Quantizing Long-tailed Datastructures with Two Scale Factors for Deep Neural Networks.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

A Compiler for Deep Neural Network Accelerators to Generate Optimized Code for a Wide Range of Data Parameters from a Hand-crafted Computation Kernel.
Proceedings of the IEEE Symposium in Low-Power and High-Speed Chips, 2019

2018
Bridging the Accuracy Gap for 2-bit Quantized Neural Networks (QNN).
CoRR, 2018

PACT: Parameterized Clipping Activation for Quantized Neural Networks.
CoRR, 2018

Taming the beast: Programming Peta-FLOP class Deep Learning Systems.
Proceedings of the International Symposium on Low Power Electronics and Design, 2018

Exploiting approximate computing for deep learning acceleration.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

Compensated-DNN: energy efficient low-precision deep neural networks by compensating quantization errors.
Proceedings of the 55th Annual Design Automation Conference, 2018

2017
Special Issue on Network and Parallel Computing.
Int. J. Parallel Program., 2017

Needle: Leveraging Program Analysis to Analyze and Extract Accelerators from Whole Programs.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Accelerator Design for Deep Learning Training: Extended Abstract: Invited.
Proceedings of the 54th Annual Design Automation Conference, 2017

POSTER: Design Space Exploration for Performance Optimization of Deep Neural Networks on Shared Memory Accelerators.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Co-designing accelerators and SoC interfaces using gem5-Aladdin.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Peruse and Profit: Estimating the Accelerability of Loops.
Proceedings of the 2016 International Conference on Supercomputing, 2016

Approximate computing: Challenges and opportunities.
Proceedings of the IEEE International Conference on Rebooting Computing, 2016

2015
Self-contained, accurate precomputation prefetching.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

DASX: Hardware Accelerator for Software Data Structures.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

2014
Comparing Implementations of Near-Data Computing with In-Memory MapReduce Workloads.
IEEE Micro, 2014

NDC: Analyzing the impact of 3D-stacked memory+logic devices on MapReduce workloads.
Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

SQRL: hardware accelerator for collecting software data structures.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
RECAP: A region-based cure for the common cold (cache).
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

2012
Programming with relaxed synchronization.
Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability, 2012

Efficient scrub mechanisms for error-prone emerging memories.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

ReCaP: a region-based cure for the common cold cache.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Big Chips.
IEEE Micro, 2011

SPATL: Honey, I Shrunk the Coherence Directory.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
SAFER: Stuck-At-Fault Error Recovery for Memories.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

2009
A tagless coherence directory.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Enhancing lifetime and security of PCM-based main memory with start-gap wear leveling.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Scalable high performance main memory system using phase-change memory technology.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

2008
Analyzing the Cost of a Cache Miss Using Pipeline Spectroscopy.
J. Instr. Level Parallelism, 2008

On the Nature of Cache Miss Behavior: Is It √2?
J. Instr. Level Parallelism, 2008

2007
Pipeline spectroscopy.
Proceedings of the Workshop on Experimental Computer Science, 2007

An analysis of the effects of miss clustering on the cost of a cache miss.
Proceedings of the 4th Conference on Computing Frontiers, 2007

2006
Cache miss behavior: is it sqrt(2)?
Proceedings of the Third Conference on Computing Frontiers, 2006

2005
Exploring the limits of prefetching.
IBM J. Res. Dev., 2005

When prefetching improves/degrades performance.
Proceedings of the Second Conference on Computing Frontiers, 2005

2004
Integrated Analysis of Power and Performance for Pipelined Microprocessors.
IEEE Trans. Computers, 2004

A Prefetch Taxonomy.
IEEE Trans. Computers, 2004

Microarchitectural techniques for power gating of execution units.
Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

2003
New methodology for early-stage, microarchitecture-level power-performance analysis of microprocessors.
IBM J. Res. Dev., 2003

Hot-and-Cold: Using Criticality in the Design of Energy-Efficient Caches.
Proceedings of the Power-Aware Computer Systems, Third International Workshop, 2003

2002
Early-Stage Definition of LPX: A Low Power Issue-Execute Processor.
Proceedings of the Power-Aware Computer Systems, Second International Workshop, 2002

Optimizing pipelines for power and performance.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

2001
Hardware solutions to reduce effective memory access time.
PhD thesis, 2001

Branch History Guided Instruction Prefetching.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

1999
Active Management of Data Caches by Exploiting Reuse Information.
IEEE Trans. Computers, 1999

1998
Evaluating the performance of active cache management schemes.
Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors, 1998

1997
Towards a Communication Characterization Methodology for Parallel Applications.
Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997