Zhen Jia

ORCID: 0000-0003-3543-2324

Affiliations:
  • Princeton University, Computer Science Department, NJ, USA
  • Chinese Academy of Sciences, Institute of Computing Technology, Beijing, China (PhD 2016)


According to our database, Zhen Jia authored at least 42 papers between 2011 and 2024.

Bibliography

2024
Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs.
ACM Trans. Archit. Code Optim., March, 2024

DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

2023
RAF: Holistic Compilation for Deep Learning Model Training.
CoRR, 2023

GEMINI: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

2021
LoWino: Towards Efficient Low-Precision Winograd Convolutions on Modern CPUs.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

2019
Understanding Processors Design Decisions for Data Analytics in Homogeneous Data Centers.
IEEE Trans. Big Data, 2019

RAGuard: An Efficient and User-Transparent Hardware Mechanism against ROP Attacks.
ACM Trans. Archit. Code Optim., 2019

The anatomy of efficient FFT and winograd convolutions on modern CPUs.
Proceedings of the ACM International Conference on Supercomputing, 2019

2018
FFT Convolutions are Faster than Winograd on Modern CPUs, Here is Why.
CoRR, 2018

Big Data Dwarfs: Towards Fully Understanding Big Data Analytics Workloads.
CoRR, 2018

Optimizing N-dimensional, winograd-based convolution for manycore CPUs.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Data Motif-based Proxy Benchmarks for Big Data and AI Workloads.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

CVR: efficient vectorization of SpMV on x86 processors.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

Benchmarking SpMV Methods on Many-Core Platforms.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2018

AIBench: Towards Scalable and Comprehensive Datacenter AI Benchmarking.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2018

2017
Understanding Big Data Analytics Workloads on Modern Processors.
IEEE Trans. Parallel Distributed Syst., 2017

A Dwarf-based Scalable Big Data Benchmarking Methodology.
CoRR, 2017

BigDataBench-S: An Open-Source Scientific Big Data Benchmark Suite.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

CloudMix: Generating Diverse and Reducible Workloads for Cloud Systems.
Proceedings of the 2017 IEEE 10th International Conference on Cloud Computing (CLOUD), 2017

2016
Characterization and architectural implications of big data workloads.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

Characterizing OS Behaviors of Datacenter and Big Data Workloads.
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

Understanding Data Analytics Workloads on Intel(R) Xeon Phi(R).
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

BDTUne: Hierarchical correlation-based performance analysis and rule-based diagnosis for big data systems.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

Auto-tuning Spark Big Data Workloads on POWER8: Prediction-Based Dynamic SMT Threading.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Characterization and Architectural Implications of Big Data Workloads.
CoRR, 2015

Understanding Big Data Analytic Workloads on Modern Processors.
CoRR, 2015

Benchmarking Big Data Systems: State-of-the-Art and Future Directions.
CoRR, 2015

Characterizing Data Analytics Workloads on Intel Xeon Phi.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

2014
Understanding the behavior of in-memory computing workloads.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Characterizing and subsetting big data workloads.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

BigDataBench: A big data benchmark suite from internet services.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

2013
BigDataBench: a Big Data Benchmark Suite from Web Search Engines.
CoRR, 2013

Characterizing data analysis workloads in data centers.
Proceedings of the IEEE International Symposium on Workload Characterization, 2013

CloudRank-V: A Desktop Cloud Benchmark with Complex Workloads.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

2012
Precise, Scalable, and Online Request Tracing for Multitier Services of Black Boxes.
IEEE Trans. Parallel Distributed Syst., 2012

CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications.
Frontiers Comput. Sci., 2012

The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems.
Proceedings of the Specifying Big Data Benchmarks, 2012

LogMaster: Mining Event Correlations in Logs of Large-Scale Cluster Systems.
Proceedings of the IEEE 31st Symposium on Reliable Distributed Systems, 2012

High Volume Throughput Computing: Identifying and Characterizing Throughput Oriented Workloads in Data Centers.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

2011
Characterization of real workloads of web search engines.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

