Zhao Zhang

Orcid: 0000-0001-5921-0035

Affiliations:
  • University of Texas at Austin, Texas Advanced Computing Center, Austin, TX, USA
  • University of California, Berkeley, AMPLab, CA, USA (former)
  • University of Chicago, Department of Computer Science, Chicago, IL, USA (former)


According to our database, Zhao Zhang authored at least 43 papers between 2008 and 2023.

Bibliography

2023
Fine-grained Policy-driven I/O Sharing for Burst Buffers.
Proceedings of the International Conference for High Performance Computing, 2023

Mirage: Towards Low-interruption Services on Batch GPU Clusters with Reinforcement Learning.
Proceedings of the International Conference for High Performance Computing, 2023

2022
Deep Neural Network Training With Distributed K-FAC.
IEEE Trans. Parallel Distributed Syst., 2022

2021
Optimizing GPU-Enhanced HPC System and Cloud Procurements for Scientific Workloads.
Proceedings of the High Performance Computing - 36th International Conference, 2021

KAISA: an adaptive second-order optimizer framework for deep neural networks.
Proceedings of the International Conference for High Performance Computing, 2021

Characterizing Impacts of Storage Faults on HPC Applications: A Methodology and Insights.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020
Kira: Processing Astronomy Imagery Using Big Data Technology.
IEEE Trans. Big Data, 2020

The Limit of the Batch Size.
CoRR, 2020

Convolutional neural network training with distributed K-FAC.
Proceedings of the International Conference for High Performance Computing, 2020

Efficient I/O for Neural Network Training with Compressed Data.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

2019
Fast Deep Neural Network Training on Distributed Systems and Cloud TPUs.
IEEE Trans. Parallel Distributed Syst., 2019

Aggregating Local Storage for Scalable Deep Learning I/O.
Proceedings of the Third IEEE/ACM Workshop on Deep Learning on Supercomputers, 2019

Quantifying the Impact of Memory Errors in Deep Learning.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

2018
FanStore: Enabling Efficient and Scalable I/O for Distributed Deep Learning.
CoRR, 2018

ImageNet Training in Minutes.
Proceedings of the 47th International Conference on Parallel Processing, 2018

BeeFlow: A Workflow Management System for In Situ Processing across HPC and Cloud Systems.
Proceedings of the 38th IEEE International Conference on Distributed Computing Systems, 2018

2017
100-epoch ImageNet Training with AlexNet in 24 Minutes.
CoRR, 2017

Diagnosing Machine Learning Pipelines with Fine-grained Lineage.
Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, 2017

2016
Application Skeleton: Generating Synthetic Applications for Infrastructure Research.
J. Open Source Softw., 2016

Application skeletons: Construction and use in eScience.
Future Gener. Comput. Syst., 2016

A convergence of key-value storage systems from clouds to supercomputers.
Concurr. Comput. Pract. Exp., 2016

Integrating Abstractions to Enhance the Execution of Distributed Applications.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015
Rethinking Data-Intensive Science Using Scalable Analytics Systems.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31, 2015

The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox.
Proceedings of the Seventh Biennial Conference on Innovative Data Systems Research, 2015

Scientific computing meets big data technology: An astronomy use case.
Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015

2014
Special issue on eScience infrastructure and applications.
Future Gener. Comput. Syst., 2014

Using Application Skeletons to Improve eScience Infrastructure.
Proceedings of the 10th IEEE International Conference on e-Science, 2014

FusionFS: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems.
Proceedings of the 2014 IEEE International Conference on Big Data (IEEE BigData 2014), 2014

2013
Parallelizing the execution of sequential scripts.
Proceedings of the International Conference for High Performance Computing, 2013

ZHT: A Light-Weight Reliable Persistent Dynamic Scalable Zero-Hop Distributed Hash Table.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

MTC envelope: defining the capability of large scale computers in the context of parallel scripting applications.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

2012
Many-Task Computing and Blue Waters.
CoRR, 2012

Design and analysis of data management in scalable parallel scripting.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Job and data clustering for aggregate use of multiple production cyberinfrastructures.
Proceedings of the DIDC'12, 2012

A Workflow-Aware Storage System: An Opportunity Study.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

2011
AME: an anyscale many-task computing engine.
Proceedings of the WORKS'11, 2011

2010
Middleware support for many-task computing.
Clust. Comput., 2010

Scheduling many-task workloads on supercomputers: Dealing with trailing tasks.
Proceedings of the 3rd Workshop on Many-Task Computing on Grids and Supercomputers, 2010

2009
Parallel Scripting for Applications at the Petascale and Beyond.
Computer, 2009

2008
Towards Loosely-Coupled Programming on Petascale Systems.
CoRR, 2008

Enabling Loosely-Coupled Serial Job Execution on the IBM BlueGene/P Supercomputer and the SiCortex SC5832.
CoRR, 2008

Design and evaluation of a collective IO model for loosely coupled petascale programming.
Proceedings of the 2008 Workshop on Many-Task Computing on Grids and Supercomputers, 2008

Toward loosely coupled programming on petascale systems.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

