Yida Wang

ORCID: 0000-0001-8165-840X

Affiliations:

Amazon Web Services, Inc., East Palo Alto, CA, USA
Intel Corporation, Parallel Computing Lab, Santa Clara, CA, USA
Princeton University, Department of Computer Science, NJ, USA

According to our database, Yida Wang authored at least 38 papers between 2015 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2024

Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs.

ACM Trans. Archit. Code Optim., March, 2024

DISTMM: Accelerating Distributed Multimodal Model Training.

Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines.

Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping.

Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Inference Optimization of Foundation Models on AI Accelerators.

Matthäus Kleindessner

Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines.

Proceedings of the Nineteenth European Conference on Computer Systems, 2024

Distributed Training of Large Language Models on AWS Trainium.

Proceedings of the 2024 ACM Symposium on Cloud Computing, 2024

HLAT: High-quality Large Language Model Pre-trained on AWS Trainium.

Proceedings of the IEEE International Conference on Big Data, 2024

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

Perception and memory retrieval states are reflected in distributed patterns of background functional connectivity.

Y. Peeta Li

Yida Wang

Nicholas B. Turk-Browne

Brice A. Kuhl

J. Benjamin Hutchinson

NeuroImage, August, 2023

RAF: Holistic Compilation for Deep Learning Model Training.

CoRR, 2023

Decoupled Model Schedule for Deep Learning Training.

CoRR, 2023

GEMINI: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints.

Proceedings of the 29th Symposium on Operating Systems Principles, 2023

Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Training Large-scale Foundation Models on Emerging AI Chips.

Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs.

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022

MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud.

Proc. VLDB Endow., 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.

CoRR, 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.

Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

DietCode: Automatic Optimization for Dynamic Tensor Programs.

Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

2021

Bring Your Own Codegen to Deep Learning Compiler.

CoRR, 2021

Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference.

Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

LoWino: Towards Efficient Low-Precision Winograd Convolutions on Modern CPUs.

Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

Lorien: Efficient Deep Learning Workloads Delivery.

Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021

UNIT: Unifying Tensorized Instruction Compilation.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020

Efficient Execution of Quantized Deep Learning Models: A Compiler Approach.

Animesh Jain

Shoubhik Bhattacharya

Masahiro Masuda

Vin Sharma

Yida Wang

CoRR, 2020

Is Network the Bottleneck of Distributed Training?

Proceedings of the 2020 Workshop on Network Meets AI & ML, 2020

FeatGraph: a flexible and efficient backend for graph neural network systems.

Proceedings of the International Conference for High Performance Computing, 2020

Ansor: Generating High-Performance Tensor Programs for Deep Learning.

Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

2019

Optimizing CNN Model Inference on CPUs.

Proceedings of the 2019 USENIX Annual Technical Conference, 2019

A Unified Optimization Approach for CNN Model Inference on Integrated GPUs.

Proceedings of the 48th International Conference on Parallel Processing, 2019

2018

Scheduling Computation Graphs of Deep Learning Models on Manycore CPUs.

CoRR, 2018

2017

BIDS apps: Improving ease of use, accessibility, and reproducibility of neuroimaging data analysis methods.

PLoS Comput. Biol., 2017

High-Performance Incremental SVM Learning on Intel® Xeon Phi™ Processors.

Proceedings of the High Performance Computing - 32nd International Conference, 2017

2016

Large-scale analyses of functional interactions in the human brain.

Yida Wang

PhD thesis, 2016

Real-time full correlation matrix analysis of fMRI data.

Nicholas B. Turk-Browne

Theodore L. Willke

Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

Enabling factor analysis on thousand-subject neuroimaging datasets.

Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

2015

Full correlation matrix analysis of fMRI data on Intel® Xeon Phi™ coprocessors.

Nicholas B. Turk-Browne

Theodore L. Willke

Proceedings of the International Conference for High Performance Computing, 2015
