Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022

Aware: Adaptive Distributed Training with Computation, Communication and Position Awareness for Deep Learning Model.

Yan Zeng

LBBGEMM: A Load-balanced Batch GEMM Framework on ARM CPUs.

EgpuIP: An Embedded GPU Accelerated Library for Image Processing.

2021

Why Dataset Properties Bound the Scalability of Parallel Machine Learning Training Algorithms.

IEEE Trans. Parallel Distributed Syst., 2021

Efficient parallel linear scaling method to get the response density matrix in all-electron real-space density-functional perturbation theory.

Comput. Phys. Commun., 2021

Many-core acceleration of the first-principles all-electron quantum perturbation calculations.

Comput. Phys. Commun., 2021

Enhanced AGCM3D: A Highly Scalable Dynamical Core of Atmospheric General Circulation Model Based on Leap-Format.

CoRR, 2021

Reducing Redundancy in Data Organization and Arithmetic Calculation for Stencil Computations.

CoRR, 2021

AutoFlow: Hotspot-Aware, Dynamic Load Balancing for Distributed Stream Processing.

CoRR, 2021

An Efficient Vectorization Scheme for Stencil Computation.

CoRR, 2021

AIPerf: Automated machine learning as an AI-HPC benchmark.

Big Data Min. Anal., 2021

Temporal vectorization for stencils.

Proceedings of the International Conference for High Performance Computing, 2021

Extreme-scale ab initio quantum raman spectra simulations on the leadership HPC system in China.

Proceedings of the International Conference for High Performance Computing, 2021

Accelerating all-electron ab initio simulation of raman spectra for biological systems.

Proceedings of the International Conference for High Performance Computing, 2021

TensorKMC: kinetic Monte Carlo simulation of 50 trillion atoms driven by deep learning on a new generation of Sunway supercomputer.

Proceedings of the International Conference for High Performance Computing, 2021

Reducing redundancy in data organization and arithmetic calculation for stencil computations.

Proceedings of the International Conference for High Performance Computing, 2021

AutoTSMM: An Auto-tuning Framework for Building High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on CPUs.

Proceedings of the 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), New York City, NY, USA, September 30, 2021

IAAT: A Input-Aware Adaptive Tuning framework for Small GEMM.

Proceedings of the 27th IEEE International Conference on Parallel and Distributed Systems, 2021

AutoFlow: Hotspot-Aware, Dynamic Load Balancing for Distributed Stream Processing.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2021

A Transpose-free Three-dimensional FFT Algorithm on ARM CPUs.

Proceedings of the 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, 2021

2020

Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs.

IEEE Trans. Parallel Distributed Syst., 2020

FastNBL: fast neighbor lists establishment for molecular dynamics simulation based on bitwise operations.

J. Supercomput., 2020

并行程序设计语言中局部性机制的研究 (Research on Locality-aware Design Mechanism of State-of-the-art Parallel Programming Languages).

Computer Science (计算机科学), 2020

WP-SGD: Weighted parallel SGD for distributed unbalanced-workload training system.

Daning Cheng

Shigang Li

Yunquan Zhang

J. Parallel Distributed Comput., 2020

The static parallel distribution algorithms for hybrid density-functional calculations in HONPAS package.

Int. J. High Perform. Comput. Appl., 2020

HPC software capability landscape in China.

Int. J. High Perform. Comput. Appl., 2020

Accelerated LiDAR data processing algorithm for self-driving cars on the heterogeneous computing platform.

IET Comput. Digit. Tech., 2020

The dynamic parallel distribution algorithm for hybrid density-functional calculations in HONPAS package.

Comput. Phys. Commun., 2020

A Highly Efficient Dynamical Core of Atmospheric General Circulation Model based on Leap-Format.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Performance Optimization for Feature Extraction Section of DeepChem.

Ke Zhan

Zhonghua Lu

Yunquan Zhang

Proceedings of the Algorithms and Architectures for Parallel Processing, 2020

2019

Correction to: FastNBL: fast neighbor lists establishment for molecular dynamics simulation based on bitwise operations.

J. Supercomput., 2019

A Relational Theory of Locality.

ACM Trans. Archit. Code Optim., 2019

2018年中国高性能计算机发展现状分析与展望 (State-of-the-art Analysis and Perspectives of 2018 China HPC Development).

Yunquan Zhang

Computer Science (计算机科学), 2019

Efficient parallel optimizations of a high-performance SIFT on GPUs.

J. Parallel Distributed Comput., 2019

Mining concise patterns on graph-connected itemsets.

Neurocomputing, 2019

The Scalability for Parallel Machine Learning Training Algorithm: Dataset Matters.

CoRR, 2019

HPC AI500: A Benchmark Suite for HPC AI Systems.

CoRR, 2019

OpenKMC: a KMC design for hundred-billion-atom simulation using millions of cores on Sunway Taihulight.

Proceedings of the International Conference for High Performance Computing, 2019

AutoFFT: a template-based FFT codes auto-generation framework for ARM and X86 CPUs.

Proceedings of the International Conference for High Performance Computing, 2019

swMD: Performance Optimizations for Molecular Dynamics Simulation on Sunway Taihulight.

Proceedings of the 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2019

Using Gradient Based Multikernel Gaussian Process and Meta-Acquisition Function to Accelerate SMBO.

Proceedings of the 31st IEEE International Conference on Tools with Artificial Intelligence, 2019

Tessellating Star Stencils.

Proceedings of the 48th International Conference on Parallel Processing, 2019

2018

Cache-Oblivious MPI All-to-All Communications Based on Morton Order.

Shigang Li

Yunquan Zhang

Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., 2018

Using Known Information to Accelerate HyperParameters Optimization Based on SMBO.

CoRR, 2018

Asynchronous Parallel Sampling Gradient Boosting Decision Tree.

CoRR, 2018

A Measurement Theory of Locality.

CoRR, 2018

Rolling Forecasting Forward by Boosting Heterogeneous Kernels.

Proceedings of the Advances in Knowledge Discovery and Data Mining, 2018

Footmark: A New Formulation for Working Set Statistics.

Proceedings of the Languages and Compilers for Parallel Computing, 2018

Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model.

Proceedings of the 47th International Conference on Parallel Processing, 2018

Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer.

Proceedings of the 47th International Conference on Parallel Processing, 2018

AGCM3D: A Highly Scalable Finite-Difference Dynamical Core of Atmospheric General Circulation Model Based on 3D Decomposition.

Proceedings of the 24th IEEE International Conference on Parallel and Distributed Systems, 2018

Implementation and Optimization of Multi-dimensional Real FFT on ARMv8 Platform.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2018

HPC AI500: A Benchmark Suite for HPC AI Systems.

Proceedings of the Benchmarking, Measuring, and Optimizing, 2018

2017

Special Issue on Network and Parallel Computing.

Vijayalakshmi Srinivasan

Yunquan Zhang

Int. J. Parallel Program., 2017

Hybrid-optimization strategy for the communication of large-scale Kinetic Monte Carlo simulation.

Comput. Phys. Commun., 2017

Asynchronous COMID: the theoretic basis for transmitted data sparsification tricks on Parameter Server.

Daning Cheng

Shigang Li

Yunquan Zhang

CoRR, 2017

Weighted parallel SGD for distributed unbalanced-workload training system.

Daning Cheng

Shigang Li

Yunquan Zhang

CoRR, 2017

Tessellating stencils.

Proceedings of the International Conference for High Performance Computing, 2017

POSTER: Cache-Oblivious MPI All-to-All Communications on Many-Core Architectures.

Shigang Li

Yunquan Zhang

Torsten Hoefler

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

HartSift: A High-Accuracy and Real-Time SIFT Based on GPU.

Zhihao Li

Haipeng Jia

Yunquan Zhang

Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

2016

A Cross-Platform SpMV Framework on Many-Core Architectures.

ACM Trans. Archit. Code Optim., 2016

Parallel Processing Systems for Big Data: A Survey.

Athanasios V. Vasilakos

Proc. IEEE, 2016

P-DOT: a model of computation for big data.

Int. J. Parallel Emergent Distributed Syst., 2016

边缘海静力数值预报模式并行算法研究 (Parallelization of Hydrostatic Numerical Forecasting Model of Marginal Sea).

Computer Science (计算机科学), 2016

Workshop on high performance data intensive computing.

Yunquan Zhang

Ji-Lin Zhang

Concurr. Comput. Pract. Exp., 2016

Efficient Management for Hybrid Memory in Managed Language Runtime.

Proceedings of the Network and Parallel Computing, 2016

2015

基于Pthreads的并行DSRC压缩算法设计与实现 (Design and Implementation of Parallel DSRC Compression Algorithm Based on Pthreads).

Computer Science (计算机科学), 2015

基于Julia语言的并行计算方法初探 (Primary Investigation into Parallel Computing in Julia Language).

Computer Science (计算机科学), 2015

基于OpenCL的直方图生成算法优化方法研究 (Research on Histogram Generation Algorithm Optimization Based on OpenCL).

Xiaojing An

Yunquan Zhang

Haipeng Jia

Computer Science (计算机科学), 2015

Automatic tuning of sparse matrix-vector multiplication on multicore clusters.

Sci. China Inf. Sci., 2015

AsHES Introduction and Committees.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Optimizing Image Sharpening Algorithm on GPU.

Proceedings of the 44th International Conference on Parallel Processing, 2015

Fast Convolution Operations on Many-Core Architectures.

Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

Optimized Password Recovery for Encrypted RAR on GPUs.

Xiaojing An

Haipeng Jia

Yunquan Zhang

Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

Analyzing MPI-3.0 Process-Level Shared Memory: A Case Study with Stencil Computations.

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Parallel Solving Method of SOR Based on the Numerical Marine Forecasting Model.

Renbo Pang

Jianliang Xu

Yunquan Zhang

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014

Function Prediction of Proteins in Yeast Networks Based on the MCL Algorithm.

Ke Zhan

Yunquan Zhang

J. Softw., 2014

Memory Efficient Two-Pass 3D FFT Algorithm for Intel® Xeon Phi™ Coprocessor.

J. Comput. Sci. Technol., 2014

yaSpMV: yet another SpMV framework on GPUs.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

AsHES Introduction and Committees.

Yunquan Zhang

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Physically based parallel ray tracer for the Metropolis light transport algorithm on the Tianhe-2 supercomputer.

Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Research on Mahalanobis Distance Algorithm Optimization Based on OpenCL.

Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

2013

MPFFT: An Auto-Tuning FFT Library for OpenCL GPUs.

J. Comput. Sci. Technol., 2013

AUGEM: automatically generate high performance dense linear algebra kernels on x86 CPUs.

Proceedings of the International Conference for High Performance Computing, 2013

StreamScan: fast scan algorithms for GPUs without global barrier synchronization.

Shengen Yan

Guoping Long

Yunquan Zhang

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

pVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments.

Proceedings of the IEEE 33rd International Conference on Distributed Computing Systems, 2013

H-DB: Yet Another Big Data Hybrid System of Hadoop and DBMS.

Tao Luo

Guoliang Chen

Yunquan Zhang

Proceedings of the Algorithms and Architectures for Parallel Processing, 2013

Large Scale Satellite Imagery Simulations with Physically Based Ray Tracing on Tianhe-1A Supercomputer.

Changmao Wu

Yunquan Zhang

Congli Yang

Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

CLSIFT: An Optimization Study of the Scale Invariance Feature Transform on GPUs.

Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

2012

Implementing High-performance Intensity Model with Blur Effect on GPUs for Large-scale Star Image Simulation.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Modeling the Locality in Graph Traversals.

Proceedings of the 41st International Conference on Parallel Processing, 2012

Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor.

Xianyi Zhang

Qian Wang

Yunquan Zhang

Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

An Insightful Program Performance Tuning Chain for GPU Computing.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2012

Accelerating Viola-Jones Face Detection Algorithm on GPUs.

Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

GPURoofline: A Model for Guiding Performance Optimizations on GPUs.

Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

A Locality-based Performance Model for Load-and-Compute Style Computation.

Liang Yuan

Yunquan Zhang

Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

2011

Optimizing SpMV for Diagonal Sparse Matrices on GPU.

Proceedings of the International Conference on Parallel Processing, 2011

Automatic FFT Performance Tuning on OpenCL GPUs.

Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

CRSD: Application Specific Auto-tuning of SpMV for Diagonal Sparse Matrices.

Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

2010

Perspectives of China's HPC system development: a view from the 2009 China HPC TOP100 list.

Frontiers Comput. Sci. China, 2010

Heterogeneous Multi-core Parallel SGEMM Performance Testing and Analysis on Cell/B.E Processor.

Proceedings of the Fifth International Conference on Networking, Architecture, and Storage, 2010

Optimizing Sparse Matrix Vector Multiplication Using Diagonal Storage Matrix Format.

Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

Numerical Simulation of the Thermal Convection in the Earth's Outer Core.

Chao Yang

Yunquan Zhang

Ligang Li

Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

LogGPH: A Parallel Computational Model with Hierarchical Communication Awareness.

Proceedings of the 13th IEEE International Conference on Computational Science and Engineering, 2010

QuantWiz: A scalable parallel software package for label-free protein quantification.

Proceedings of the Fifth International Conference on Bio-Inspired Computing: Theories and Applications, 2010

Accelerating Linpack Performance with Mixed Precision Algorithm on CPU+GPGPU Heterogeneous Cluster.

Proceedings of the 10th IEEE International Conference on Computer and Information Technology, 2010

2009

A parallel shortest path algorithm based on graph-partitioning and iterative correcting.

Yuxin Tang

Yunquan Zhang

Hu Chen

Comput. Syst. Sci. Eng., 2009

Early Performance Evaluation of Dawning 5000A and DeepComp 7000.

Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

QuantWiz: A Parallel Software Package for LC-MS-based Label-Free Protein Quantification.

Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

Performance Evaluation of Multithreaded Sparse Matrix-Vector Multiplication Using OpenMP.

Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

Development of a Scalable Solver for the Earth's Core Convection.

Chao Yang

Ligang Li

Yunquan Zhang

Proceedings of the High Performance Computing and Applications, 2009

2008

Basic research in computer science and software engineering at SKLCS.

Frontiers Comput. Sci. China, 2008

Parallelization of FM-Index.

Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications, 2008

Memory Access Complexity Analysis of SpMV in RAM (h) Model.

E. Yuan

Yunquan Zhang

Xiangzheng Sun

Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications, 2008

Utilizing the Multi-threading Techniques to Improve the Two-Level Checkpoint/Rollback System for MPI Applications.

Yuan Tang

Yunquan Zhang

Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications, 2008

2007

Models of parallel computation: a survey and classification.

Frontiers Comput. Sci. China, 2007

A brief introduction to China HPC TOP100: from 2002 to 2006.

Proceedings of the CHINA HPC 2007, 2007

Block size selection of parallel LU and QR on PVP-based and RISC-based supercomputers.

Yunquan Zhang

Ying Chen

Yuan Tang

Proceedings of the CHINA HPC 2007, 2007

Efficient Construction of FM-index Using Overlapping Block Processing for Large Scale Texts.

Di Zhang

Yunquan Zhang

Jing Chen

Proceedings of the Advances in Information Retrieval, 2007

2006

Study on Parallel Computing.

J. Comput. Sci. Technol., 2006

2003

Hardware Impact on Communication Performance of Beowulf LINUX Cluster.

Proceedings of the 21st IASTED International Multi-Conference on Applied Informatics (AI 2003), 2003

Yunquan Zhang
