Yida Wang

ORCID: 0000-0001-8165-840X

Affiliations:
  • Amazon Web Services, Inc., East Palo Alto, CA, USA
  • Intel Corporation, Parallel Computing Lab, Santa Clara, CA, USA
  • Princeton University, Department of Computer Science, NJ, USA


According to our database, Yida Wang authored at least 38 papers between 2015 and 2024.

Bibliography

2024
Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs.
ACM Trans. Archit. Code Optim., March 2024

HLAT: High-quality Large Language Model Pre-trained on AWS Trainium.
CoRR, 2024

DISTMM: Accelerating Distributed Multimodal Model Training.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Inference Optimization of Foundation Models on AI Accelerators.
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

Distributed Training of Large Language Models on AWS Trainium.
Proceedings of the 2024 ACM Symposium on Cloud Computing, 2024

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Perception and memory retrieval states are reflected in distributed patterns of background functional connectivity.
NeuroImage, August 2023

RAF: Holistic Compilation for Deep Learning Model Training.
CoRR, 2023

Decoupled Model Schedule for Deep Learning Training.
CoRR, 2023

GEMINI: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Training Large-scale Foundation Models on Emerging AI Chips.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud.
Proc. VLDB Endow., 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.
CoRR, 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

DietCode: Automatic Optimization for Dynamic Tensor Programs.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

2021
Bring Your Own Codegen to Deep Learning Compiler.
CoRR, 2021

Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference.
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

LoWino: Towards Efficient Low-Precision Winograd Convolutions on Modern CPUs.
Proceedings of the 50th International Conference on Parallel Processing, 2021

Lorien: Efficient Deep Learning Workloads Delivery.
Proceedings of the ACM Symposium on Cloud Computing, 2021

UNIT: Unifying Tensorized Instruction Compilation.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020
Efficient Execution of Quantized Deep Learning Models: A Compiler Approach.
CoRR, 2020

Is Network the Bottleneck of Distributed Training?
Proceedings of the 2020 Workshop on Network Meets AI & ML, 2020

FeatGraph: a flexible and efficient backend for graph neural network systems.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2020

Ansor: Generating High-Performance Tensor Programs for Deep Learning.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

2019
Optimizing CNN Model Inference on CPUs.
Proceedings of the 2019 USENIX Annual Technical Conference, 2019

A Unified Optimization Approach for CNN Model Inference on Integrated GPUs.
Proceedings of the 48th International Conference on Parallel Processing, 2019

2018
Scheduling Computation Graphs of Deep Learning Models on Manycore CPUs.
CoRR, 2018

2017
BIDS apps: Improving ease of use, accessibility, and reproducibility of neuroimaging data analysis methods.
PLoS Comput. Biol., 2017

High-Performance Incremental SVM Learning on Intel® Xeon Phi™ Processors.
Proceedings of the High Performance Computing - 32nd International Conference, 2017

2016
Large-scale analyses of functional interactions in the human brain.
PhD thesis, 2016

Real-time full correlation matrix analysis of fMRI data.
Proceedings of the 2016 IEEE International Conference on Big Data, 2016

Enabling factor analysis on thousand-subject neuroimaging datasets.
Proceedings of the 2016 IEEE International Conference on Big Data, 2016

2015
Full correlation matrix analysis of fMRI data on Intel® Xeon Phi™ coprocessors.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2015

