Chao Yang

ORCID: 0000-0001-7426-6248

Affiliations:
  • Peking University, Beijing, China
  • Chinese Academy of Sciences, Institute of Software, Beijing, China (former)


According to our database, Chao Yang authored at least 94 papers between 2009 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Deep Adaptive Sampling for Surrogate Modeling Without Labeled Data.
J. Sci. Comput., December, 2024

AONN: An Adjoint-Oriented Neural Network Method for All-At-Once Solutions of Parametric Optimal Control Problems.
SIAM J. Sci. Comput., February, 2024

Nonlinearly Constrained Pressure Residual (NCPR) Algorithms for Fractured Reservoir Simulation.
SIAM J. Sci. Comput., February, 2024

Adaptive Space-Time Domain Decomposition for Multiphase Flow in Porous Media with Bound Constraints.
SIAM J. Sci. Comput., 2024

AONN-2: An adjoint-oriented neural network method for PDE-constrained shape optimization.
J. Comput. Phys., 2024

HaTT: Hadamard avoiding TT recompression.
CoRR, 2024

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention.
CoRR, 2024

APTT: An accuracy-preserved tensor-train method for the Boltzmann-BGK equation.
CoRR, 2024

HOSCF: Efficient decoupling algorithms for finding the best rank-one approximation of higher-order tensors.
CoRR, 2024

Uncovering Nested Data Parallelism and Data Reuse in DNN Computation with FractalTensor.
Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 2024

Adversarial Adaptive Sampling: Unify PINN and Optimal Transport for the Approximation of PDEs.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Nonlinear parallel-in-time simulations of multiphase flow in porous media.
J. Comput. Phys., December, 2023

DAS-PINNs: A deep adaptive sampling method for solving high-dimensional partial differential equations.
J. Comput. Phys., March, 2023

Publisher Correction: xMath2.0: a high-performance extended math library for SW26010-Pro many-core processor.
CCF Trans. High Perform. Comput., March, 2023

xMath2.0: a high-performance extended math library for SW26010-Pro many-core processor.
CCF Trans. High Perform. Comput., March, 2023

a-Tucker: fast input-adaptive and matricization-free Tucker decomposition of higher-order tensors on GPUs.
CCF Trans. High Perform. Comput., March, 2023

Tensor-Based Sketching Method for the Low-Rank Approximation of Data Streams.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
Parallel finite volume simulation of the spherical shell dynamo with pseudo-vacuum magnetic boundary conditions.
J. Comput. Phys., 2022

Scalable semismooth Newton methods with multilevel domain decomposition for subsurface flow and reactive transport in porous media.
J. Comput. Phys., 2022

Multilevel field-split preconditioners with domain decomposition for steady and unsteady flow problems.
Comput. Phys. Commun., 2022

Parallel energy stable phase field simulations of Ni-based alloys system.
CoRR, 2022

EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers.
Proceedings of the 51st International Conference on Parallel Processing, 2022

2021
Adaptive SpMV/SpMSpV on GPUs for Input Vectors of Varied Sparsity.
IEEE Trans. Parallel Distributed Syst., 2021

Efficient Alternating Least Squares Algorithms for Low Multilinear Rank Approximation of Tensors.
J. Sci. Comput., 2021

Variational inequality transport model on the sphere by the active-set reduced-space algorithm.
Comput. Phys. Commun., 2021

DAS: A deep adaptive sampling method for solving partial differential equations.
CoRR, 2021

A rank-adaptive higher-order orthogonal iteration algorithm for truncated Tucker decomposition.
CoRR, 2021

AutoWM: a novel domain-specific tool for universal multi-/many-core accelerations of the WRF cloud microphysics.
Clust. Comput., 2021

2020
Enabling Highly Efficient Batched Matrix Multiplications on SW26010 Many-core Processor.
ACM Trans. Archit. Code Optim., 2020

Parallel Energy-Stable Solver for a Coupled Allen-Cahn and Cahn-Hilliard System.
SIAM J. Sci. Comput., 2020

A Spatiotemporal Causality Based Governance Framework for Noisy Urban Sensory Data.
J. Comput. Sci. Technol., 2020

Parallel multilevel restricted Schwarz preconditioners for implicit simulation of subsurface flows with Peng-Robinson equation of state.
J. Comput. Phys., 2020

a-Tucker: Input-Adaptive and Matricization-Free Tucker Decomposition for Dense Tensors on CPUs and GPUs.
CoRR, 2020

Efficient Alternating Least Squares Algorithms for Truncated HOSVD of Higher-Order Tensors.
CoRR, 2020

Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight.
Clust. Comput., 2020

2019
Optimizing Finite Volume Method Solvers on Nvidia GPUs.
IEEE Trans. Parallel Distributed Syst., 2019

Enabling Highly Efficient k-Means Computations on the SW26010 Many-Core Processor of Sunway TaihuLight.
J. Comput. Sci. Technol., 2019

A fully implicit constraint-preserving simulator for the black oil model of petroleum reservoirs.
J. Comput. Phys., 2019

Parallel reservoir simulators for fully implicit complementarity formulation of multicomponent compressible flows.
Comput. Phys. Commun., 2019

Parallel energy-stable phase field crystal simulations based on domain decomposition methods.
Comput. Phys. Commun., 2019

2018
PEPS++: Towards Extreme-Scale Simulations of Strongly Correlated Quantum Many-Particle Models on Sunway TaihuLight.
IEEE Trans. Parallel Distributed Syst., 2018

Extreme-Scale High-Order WENO Simulations of 3-D Detonation Wave with 10 Million Cores.
ACM Trans. Archit. Code Optim., 2018

Performance Optimization of the HPCG Benchmark on the Sunway TaihuLight Supercomputer.
ACM Trans. Archit. Code Optim., 2018

A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Extreme-Scale Realistic Stencil Computations on Sunway TaihuLight with Ten Million Cores.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018

2017
Development of a hybrid parallel MCV-based high-order global shallow-water model.
J. Supercomput., 2017

Solving Mesoscale Atmospheric Dynamics Using a Reconfigurable Dataflow Architecture.
IEEE Micro, 2017

Nonlinearly preconditioned semismooth Newton methods for variational inequality solution of two-phase flow in porous media.
J. Comput. Phys., 2017

A Multi-Perspective Method for Analysis of Cooperative Behaviors Among Industrial Devices of Smart Factory.
IEEE Access, 2017

A 3-Layer Method for Analysis of Cooperative Behaviors of Physical Devices in Cyber-Physical Systems.
Proceedings of the Wireless Algorithms, Systems, and Applications, 2017

26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Towards Highly Efficient DGEMM on the Emerging SW26010 Many-Core Processor.
Proceedings of the 46th International Conference on Parallel Processing, 2017

FP-MRBP: Fine-grained Parallel MapReduce Back Propagation Algorithm.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2017, 2017

2016
Active-Set Reduced-Space Methods with Nonlinear Elimination for Two-Phase Flow Problems in Porous Media.
SIAM J. Sci. Comput., 2016

A Nonlinearly Preconditioned Inexact Newton Algorithm for Steady State Lattice Boltzmann Equations.
SIAM J. Sci. Comput., 2016

623 Tflop/s HPCG run on Tianhe-2: Leveraging millions of hybrid cores.
Int. J. High Perform. Comput. Appl., 2016

The Sunway TaihuLight supercomputer: system and applications.
Sci. China Inf. Sci., 2016

10M-core scalable fully-implicit solver for nonhydrostatic atmospheric dynamics.
Proceedings of the International Conference for High Performance Computing, 2016

Accelerating the Simulation of Thermal Convection in the Earth's Outer Core on Tianhe-2.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

Accelerating the 3D Euler atmospheric solver through heterogeneous CPU-GPU platforms.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Generalized GPU Acceleration for Applications Employing Finite-Volume Methods.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

Fast Parallel Stream Compaction for IA-Based Multi/many-core Processors.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

Unleashing the performance potential of CPU-GPU platforms for the 3D atmospheric Euler solver.
Proceedings of the 27th IEEE International Conference on Application-specific Systems, 2016

2015
Solving the Global Atmospheric Equations through Heterogeneous Reconfigurable Platforms.
ACM Trans. Reconfigurable Technol. Syst., 2015

Ultra-Scalable CPU-MIC Acceleration of Mesoscale Atmospheric Modeling on Tianhe-2.
IEEE Trans. Computers, 2015

A Fully Implicit Method for Lattice Boltzmann Equations.
SIAM J. Sci. Comput., 2015

A parallel domain decomposition-based implicit method for the Cahn-Hilliard-Cook phase-field equation in 3D.
J. Comput. Phys., 2015

A multiscale algorithm for radiative heat transfer equation with rapidly oscillating coefficients.
Appl. Math. Comput., 2015

A Smart Work Performance Measurement System for Police Officers.
IEEE Access, 2015

Pattern-Driven Hybrid Multi- and Many-Core Acceleration in the MPAS Shallow-Water Model.
Proceedings of the 44th International Conference on Parallel Processing, 2015

Performance Evaluation of HPGMG on Tianhe-2: Early Experience.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

A Formal Approach for Modeling and Verification of Distributed Systems.
Proceedings of the Cloud Computing - 6th International Conference, 2015

2014
A Scalable Fully Implicit Compressible Euler Solver for Mesoscale Nonhydrostatic Simulation of Atmospheric Flows.
SIAM J. Sci. Comput., 2014

Parallel Domain Decomposition Methods with Mixed Order Discretization for Fully Implicit Solution of Tracer Transport Problems on the Cubed-Sphere.
J. Sci. Comput., 2014

Enabling and Scaling a Global Shallow-Water Atmospheric Model on Tianhe-2.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Accelerating HPCG on Tianhe-2: A hybrid CPU-MIC algorithm.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Scaling and analyzing the stencil performance on multi-core and many-core architectures.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Optimizing and Scaling HPCG on Tianhe-2: Early Experience.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2014

Optimization of scan algorithms on multi- and many-core processors.
Proceedings of the 21st International Conference on High Performance Computing, 2014

A highly-efficient and green data flow engine for solving Euler atmospheric equations.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

2013
A Fully Implicit Compressible Euler Solver for Atmospheric Flows.
Proceedings of the Domain Decomposition Methods in Science and Engineering XX, 2013

A peta-scalable CPU-GPU algorithm for global atmospheric simulations.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

A Scalable Implicit Solver for Phase Field Crystal Simulations.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Accelerating solvers for global atmospheric equations through mixed-precision data flow engine.
Proceedings of the 23rd International Conference on Field programmable Logic and Applications, 2013

Global Atmospheric Simulation on a Reconfigurable Platform.
Proceedings of the 21st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2013

2012
New six-node and seven-node hexagonal finite elements.
Appl. Math. Comput., 2012

2011
Parallel multilevel methods for implicit solution of shallow water equations with nonsmooth topography on the cubed-sphere.
J. Comput. Phys., 2011

A parallel well-balanced finite volume method for shallow water equations with topography on the cubed-sphere.
J. Comput. Appl. Math., 2011

2010
A Fully Implicit Domain Decomposition Algorithm for Shallow Water Equations on the Cubed-Sphere.
SIAM J. Sci. Comput., 2010

Scalability Studies of an Implicit Shallow Water Solver for the Rossby-Haurwitz Problem.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010

Numerical Simulation of the Thermal Convection in the Earth's Outer Core.
Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

2009
Development of a Scalable Solver for the Earth's Core Convection.
Proceedings of the High Performance Computing and Applications, 2009
