Sheng Li

Orcid: 0000-0003-1068-5261

Affiliations:

Google, Mountain View, CA, USA
Intel Labs, Santa Clara, CA, USA (former)
Hewlett-Packard Labs (former)
University of Notre Dame, IN, USA (former)

According to our database, Sheng Li authored at least 36 papers between 2007 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

Reconfigurable Lightwave Fabrics for ML Supercomputers.

Muhammad Mukarram Bin Tariq

Amin Vahdat

Proceedings of the Optical Fiber Communications Conference and Exhibition, 2024

Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs.

Mohamed S. Abdelfattah

Zhiru Zhang

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

Experts Weights Averaging: A New General Training Scheme for Vision Transformers.

CoRR, 2023

FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search.

Mohamed S. Abdelfattah

CoRR, 2023

Lightwave Fabrics: At-Scale Optical Circuit Switching for Datacenter and Machine Learning Systems.

Muhammad Mukarram Bin Tariq

Amin Vahdat

Proceedings of the ACM SIGCOMM 2023 Conference, 2023

TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

TripLe: Revisiting Pretrained Model Reuse and Progressive Learning for Efficient Vision Transformer Scaling and Searching.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Hyperscale Hardware Optimized Neural Architecture Search.

Parthasarathy Ranganathan

Norman P. Jouppi

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2021

The Design Process for Google's Training Chips: TPUv2 and TPUv3.

IEEE Micro, 2021

Ten Lessons From Three Generations Shaped Google's TPUv4i: Industrial Product.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

NeuroMeter: An Integrated Power, Area, and Timing Modeling Framework for Machine Learning Accelerators (Industry Track Paper).

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Searching for Fast Model Families on Datacenter Accelerators.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

A domain-specific supercomputer for training deep neural networks.

Commun. ACM, 2020

Google's Training Chips Revealed: TPUv2 and TPUv3.

Proceedings of the IEEE Hot Chips 32 Symposium, 2020

2019

Parallelizing Word2Vec in Shared and Distributed Memory.

IEEE Trans. Parallel Distributed Syst., 2019

2017

Work as a team or individual: Characterizing the system-level impacts of main memory partitioning.

Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

2016

Full-Stack Architecting to Achieve a Billion-Requests-Per-Second Throughput on a Single Key-Value Store Server Platform.

ACM Trans. Comput. Syst., 2016

Achieving One Billion Key-Value Requests per Second on a Single Server.

IEEE Micro, 2016

Parallelizing Word2Vec in Multi-Core and Many-Core Architectures.

CoRR, 2016

Large Pages on Steroids: Small Ideas to Accelerate Big Memory Applications.

Daejin Jung

Sheng Li

Jung Ho Ahn

IEEE Comput. Archit. Lett., 2016

2015

Buri: Scaling Big-Memory Computing with Hardware-Based Memory Expansion.

ACM Trans. Archit. Code Optim., 2015

Architecting to achieve a billion requests per second throughput on a single key-value store server platform.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

History-Assisted Adaptive-Granularity Caches (HAAG$) for High Performance 3D DRAM Architectures.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

2013

The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing.

ACM Trans. Archit. Code Optim., 2013

Kiln: closing the performance gap between systems with and without persistence support.

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

McSimA+: A manycore simulator with application-level+ simulation and detailed microarchitecture modeling.

Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

2012

MAGE: adaptive granularity and ECC for resilient and power efficient memory systems.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory.

Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

2011

Lightweight Chip Multi-Threading (LCMT): Maximizing Fine-Grained Parallelism On-Chip.

IEEE Trans. Parallel Distributed Syst., 2011

System implications of memory reliability in exascale computing.

Proceedings of the Conference on High Performance Computing Networking, 2011

System-level integrated server architectures for scale-out datacenters.

Parthasarathy Ranganathan

Norman P. Jouppi

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques.

Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

2009

McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures.

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

2008

Memory model effects on application performance for a lightweight multithreaded architecture.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Design of a mask-programmable memory/multiplier array using G4-FET technology.

Mohammad M. Mojarradi

Proceedings of the 45th Design Automation Conference, 2008

2007

A Heterogeneous Lightweight Multithreaded Architecture.

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
