Yakun Sophia Shao

Orcid: 0000-0003-1811-5407

According to our database, Yakun Sophia Shao authored at least 55 papers between 2013 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Timeline

[Chart: publications per year, 2013 to 2024, by type (book, in proceedings, article, PhD thesis, dataset, other).]

Bibliography

2024
AuRORA: A Full-Stack Solution for Scalable and Virtualized Accelerator Integration.
IEEE Micro, 2024

Design Space Exploration of Embedded SoC Architectures for Real-Time Optimal Control.
CoRR, 2024

Virgo: Cluster-level Matrix Unit Integration in GPUs for Scalability and Energy Efficiency.
CoRR, 2024

LLM-Aided Compilation for Tensor Accelerators.
CoRR, 2024

DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets.
CoRR, 2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization.
CoRR, 2024

Stellar: An Automated Design Framework for Dense and Sparse Spatial Accelerators.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Design Approach for Die-to-Die Interfaces to Enable Energy-Efficient Chiplet Systems.
Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design, 2024

FireAxe: Partitioned FPGA-Accelerated Simulation of Large-Scale RTL Designs.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Next-Generation Domain-Specific Accelerators: From Hardware to System.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2024

2023
RoSÉ: A Hardware-Software Co-Simulation Infrastructure Enabling Pre-Silicon Full-Stack Robotics SoC Evaluation.
Dataset, June, 2023

Guest Editorial Introduction to the Special Issue on the 2022 IEEE International Solid-State Circuits Conference (ISSCC).
IEEE J. Solid State Circuits, 2023

SPEED: Speculative Pipelined Execution for Efficient Decoding.
CoRR, 2023

Code Transpilation for Hardware Accelerators.
CoRR, 2023

Full Stack Optimization of Transformer Inference: a Survey.
CoRR, 2023

AuRORA: Virtualized Accelerator Orchestration for Multi-Tenant Workloads.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

RoSÉ: A Hardware-Software Co-Simulation Infrastructure Enabling Pre-Silicon Full-Stack Robotics SoC Evaluation.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

CDPU: Co-designing Compression and Decompression Processing Units for Hyperscale Systems.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2022
Efficient emotion recognition using hyperdimensional computing with combinatorial channel encoding and cellular automata.
Brain Informatics, 2022

Learning A Continuous and Reconstructible Latent Space for Hardware Accelerator Design.
Proceedings of the International IEEE Symposium on Performance Analysis of Systems and Software, 2022

2021
SNAP: An Efficient Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference.
IEEE J. Solid State Circuits, 2021

Simba: scaling deep-learning inference with chiplet-based architecture.
Commun. ACM, 2021

Memory-Efficient Hardware Performance Counters with Approximate-Counting Algorithms.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Vertically Integrated Computing Labs Using Open-Source Hardware Generators and Cloud-Hosted FPGAs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2021

CoSA: Scheduling by Constrained Optimization for Spatial Accelerators.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

A 16mm² 106.1 GOPS/W Heterogeneous RISC-V Multi-Core Multi-Accelerator SoC in Low-Power 22nm FinFET.
Proceedings of the 47th ESSCIRC 2021, 2021

Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020
Commercial Products.
IEEE Micro, 2020

Chipyard: Integrated Design, Simulation, and Implementation Framework for Custom SoCs.
IEEE Micro, 2020

A 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm.
IEEE J. Solid State Circuits, 2020

Invited: Chipyard - An Integrated SoC Research and Implementation Environment.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

NeuroVectorizer: end-to-end vectorization with deep reinforcement learning.
Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

2019
Gemmini: An Agile Systolic Array Generator Enabling Systematic Evaluations of Deep-Learning Architectures.
CoRR, 2019

A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019

SNAP: A 1.67 - 21.55TOPS/W Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference in 16nm CMOS.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019

Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Timeloop: A Systematic Approach to DNN Accelerator Evaluation.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

MAGNet: A Modular Accelerator Generator for Neural Networks.
Proceedings of the International Conference on Computer-Aided Design, 2019

A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology.
Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019

Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
Assisting High-Level Synthesis Improve SpMV Benchmark Through Dynamic Dependence Analysis.
IEEE Trans. Circuits Syst. II Express Briefs, 2018

Hardware Acceleration.
IEEE Micro, 2018


2017
Methods and infrastructure in the era of accelerator-centric architectures.
Proceedings of the IEEE 60th International Midwest Symposium on Circuits and Systems, 2017

Using dynamic dependence analysis to improve the quality of high-level synthesis designs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

2016
Co-designing accelerators and SoC interfaces using gem5-Aladdin.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

2015
Research Infrastructures for Hardware Accelerators.
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01750-6, 2015

The Aladdin Approach to Accelerator Design and Modeling.
IEEE Micro, 2015

2014
Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

MachSuite: Benchmarks for accelerator design and customized architectures.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

2013
ISA-independent workload characterization and its implications for specialized architectures.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Energy characterization and instruction-level energy model of Intel's Xeon Phi processor.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Quantifying acceleration: Power/performance trade-offs of application kernels in hardware.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

