Jiarui Fang

Orcid: 0000-0002-6724-2763

According to our database1, Jiarui Fang authored at least 31 papers between 2013 and 2024.

Collaborative distances:
  • Dijkstra number2 of four.
  • Erdős number3 of four.

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

On csauthors.net:

Bibliography

2024
FDI-SIL.
Dataset, March, 2024

xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism.
CoRR, 2024

LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism.
CoRR, 2024

PipeFusion: Displaced Patch Pipeline Parallelism for Inference of Diffusion Transformer Models.
CoRR, 2024

USP: A Unified Sequence Parallelism Approach for Long Context Generative AI.
CoRR, 2024

AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference.
CoRR, 2024

FastFold: Optimizing AlphaFold Training and Inference on GPU Clusters.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

2023
Parallel Training of Pre-Trained Models via Chunk-Based Dynamic Memory Management.
IEEE Trans. Parallel Distributed Syst., 2023

Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models.
CoRR, 2023

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

2022
Elixir: Train a Large Language Model on a Small GPU Cluster.
CoRR, 2022

EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models.
CoRR, 2022

A Frequency-aware Software Cache for Large Recommendation System Embeddings.
CoRR, 2022

RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
PatrickStar: Parallel Training of Pre-trained Models via a Chunk-based Memory Management.
CoRR, 2021

TurboTransformers: an efficient GPU serving system for transformer models.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

2020
Efficient AES implementation on Sunway TaihuLight supercomputer: A systematic approach.
J. Parallel Distributed Comput., 2020

2019
Semantic Segmentation-Based Building Footprint Extraction Using Very High-Resolution Satellite Images and Multi-Source GIS Data.
Remote. Sens., 2019

RedSync: Reducing synchronization bandwidth for distributed deep learning training system.
J. Parallel Distributed Comput., 2019

swATOP: Automatically Optimizing Deep Learning Operators on SW26010 Many-Core Processor.
Proceedings of the 48th International Conference on Parallel Processing, 2019

2018
Optimizing Convolutional Neural Networks on the Sunway TaihuLight Supercomputer.
ACM Trans. Archit. Code Optim., 2018

A dynamic agricultural prediction system for large-scale drought assessment on the Sunway TaihuLight supercomputer.
Comput. Electron. Agric., 2018

Semantic Segmentation Based Building Extraction Method Using Multi-Source GIS Map Datasets and Satellite Imagery.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

swCaffe: A Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017
Parallel Multiclass Support Vector Machine for Remote Sensing Data Classification on Multicore and Many-Core Architectures.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2017

SW-AES: Accelerating AES Algorithm on the Sunway TaihuLight.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

2016
Refactoring and optimizing the community atmosphere model (CAM) on the sunway taihulight supercomputer.
Proceedings of the International Conference for High Performance Computing, 2016

Cache-Friendly Design for Complex Spatially-Variable Coefficient Stencils on Many-Core Architectures.
Proceedings of the 23rd IEEE International Conference on High Performance Computing, 2016

2015
Optimizing Complex Spatially-Variant Coefficient Stencils for Seismic Modeling on GPU.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

2013
Sourcing strategies in supply risk management: An approximate dynamic programming approach.
Comput. Oper. Res., 2013


  Loading...