Hong An

ORCID: 0000-0002-3900-3722

According to our database, Hong An authored at least 119 papers between 1999 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
SWattention: designing fast and memory-efficient attention for a new Sunway Supercomputer.
J. Supercomput., July, 2024

Uncovering the performance bottleneck of modern HPC processor with static code analyzer: a case study on Kunpeng 920.
CCF Trans. High Perform. Comput., June, 2024

An N-Shaped Lightweight Network with a Feature Pyramid and Hybrid Attention for Brain Tumor Segmentation.
Entropy, February, 2024

Gene expression bias between the subgenomes of allopolyploid hybrids is an emergent property of the kinetics of expression.
PLoS Comput. Biol., January, 2024

Extending the limit of LR-TDDFT on two different approaches: Numerical algorithms and new Sunway heterogeneous supercomputer.
Parallel Comput., 2024

PWDFT-SW: Extending the Limit of Plane-Wave DFT Calculations to 16K Atoms on the New Sunway Supercomputer.
CoRR, 2024

Pruner: An Efficient Cross-Platform Tensor Compiler with Dual Awareness.
CoRR, 2024

Rethinking automatic segmentation of gross target volume from a decoupling perspective.
Comput. Medical Imaging Graph., 2024

Predictive Accuracy-Based Active Learning for Medical Image Segmentation.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

DB-SpGEMM: A Massively Distributed Block-Sparse Matrix-Matrix Multiplication for Linear-Scaling DFT Calculations.
Proceedings of the 53rd International Conference on Parallel Processing, 2024

Multi-level Load Balancing Strategies for Massively Parallel Smoothed Particle Hydrodynamics Simulation.
Proceedings of the 53rd International Conference on Parallel Processing, 2024

A³PIM: An Automated, Analytic and Accurate Processing-in-Memory Offloader.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

2023
Deep learning representations for quantum many-body systems on heterogeneous hardware.
Mach. Learn. Sci. Technol., March, 2023

High performance computing for first-principles Kohn-Sham density functional theory towards exascale supercomputers.
CCF Trans. High Perform. Comput., March, 2023

swMPAS-A: Scaling MPAS-A to 39 Million Heterogeneous Cores on the New Generation Sunway Supercomputer.
IEEE Trans. Parallel Distributed Syst., 2023

Establishing a Modeling System in 3-km Horizontal Resolution for Global Atmospheric Circulation triggered by Submarine Volcanic Eruptions with 400 Billion Smoothed Particle Hydrodynamics.
Proceedings of the International Conference for High Performance Computing, 2023

Contrast Learning Based Robust Framework for Weakly Supervised Medical Image Segmentation with Coarse Bounding Box Annotations.
Proceedings of the Computational Mathematics Modeling in Cancer Analysis, 2023

H-DenseFormer: An Efficient Hybrid Densely Connected Transformer for Multimodal Tumor Segmentation.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023, 2023

FRNET: An Effective Hybrid Structure for Automatic Segmentation of Head and Neck Primary Tumors from Multimodal Images.
Proceedings of the 20th IEEE International Symposium on Biomedical Imaging, 2023

SWSPH: A Massively Parallel SPH Implementation for Hundred-Billion-Particle Simulation on New Sunway Supercomputer.
Proceedings of the Euro-Par 2023: Parallel Processing - 29th International Conference on Parallel and Distributed Computing, Limassol, Cyprus, August 28, 2023

2022
Bridging the Gap between Deep Learning and Frustrated Quantum Spin System for Extreme-Scale Simulations on New Generation of Sunway Supercomputer.
IEEE Trans. Parallel Distributed Syst., 2022

AI for Quantum Mechanics: High Performance Quantum Many-Body Simulations via Deep Learning.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

2.5 Million-Atom Ab Initio Electronic-Structure Simulation of Complex Metallic Heterostructures with DGDFT.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

A Systematic Methodology for performance characterizing of Heterogeneous Systems with a dataflow runtime simulator.
Proceedings of the 4th International Conference on Robotics, 2022

SCAR U-Net: A 3D Spatial-Channel Attention ResU-Net for Brain Tumor Segmentation.
Proceedings of the 3rd International Symposium on Artificial Intelligence for Medicine Sciences, 2022

Accelerating Parallel First-Principles Excited-State Calculation by Low-Rank Approximation with K-Means Clustering.
Proceedings of the 51st International Conference on Parallel Processing, 2022

High-Performance Matrix Multiplication on the New Generation Shenwei Processor.
Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022

Machine Learning-enabled Performance Model for DNN Applications and AI Accelerator.
Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022

Quantifying Throughput of Basic Blocks on ARM Microarchitectures by Static Code Analyzers: A Case Study on Kunpeng 920.
Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022

Whole Slide Image Multi-Classification of Cervical Epithelial Lesions Based on Unsupervised Pre-training.
Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2022

Automatic Segmentation of Target Structures for Total Marrow and Lymphoid Irradiation in Bone Marrow Transplantation.
Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2022

FcTC-UNet: Fine-grained Combination of Transformer and CNN for Thoracic Organs Segmentation.
Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2022

ITUnet: Integration Of Transformers And Unet For Organs-At-Risk Segmentation.
Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2022

An Accelerated First Principle Method Implemented on IntelGPU.
Proceedings of the 6th International Conference on Computer Science and Application Engineering, 2022

2021
Towards Efficient Short-Range Pair Interaction on Sunway Many-Core Architecture.
J. Comput. Sci. Technol., 2021

swFLOW: A large-scale distributed framework for deep learning on Sunway TaihuLight supercomputer.
Inf. Sci., 2021

RDMA-Based Apache Storm for High-Performance Stream Data Processing.
Int. J. Parallel Program., 2021

Dual-Attention Residual Network for Automatic Diagnosis of COVID-19.
CoRR, 2021

Symplectic structure-preserving particle-in-cell whole-volume simulation of tokamak plasmas to 111.3 trillion particles and 25.7 billion grids.
Proceedings of the International Conference for High Performance Computing, 2021

Global Multi-Level Attention Network for the Segmentation of Clinical Target Volume In The Planning CT For Cervical Cancer.
Proceedings of the 18th IEEE International Symposium on Biomedical Imaging, 2021

Fast Whole Slide Image Analysis Of Cervical Cancer Using Weak Annotation.
Proceedings of the 18th IEEE International Symposium on Biomedical Imaging, 2021

Reducing the Annotation Cost of Whole Slide Histology Images using Active Learning.
Proceedings of the IPMV 2021: 3rd International Conference on Image Processing and Machine Vision, Hong Kong, SAR, China, May 22, 2021

Rethinking Logits-Level Knowledge Distillation.
Proceedings of the ICCPR '21: 10th International Conference on Computing and Pattern Recognition, Shanghai, China, October 15, 2021

Simultaneous Right Ventricle End-diastolic and End-systolic Frame Identification and Landmark Detection on Echocardiography.
Proceedings of the 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2021

DARNet: Dual-Attention Residual Network for Automatic Diagnosis of COVID-19 via CT Images.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2021

2020
Distributed deep learning system for cancerous region detection on Sunway TaihuLight.
CCF Trans. High Perform. Comput., 2020

Runtime Adaptive Matrix Multiplication for the SW26010 Many-Core Processor.
IEEE Access, 2020

RDMA-Based Apache Storm for High-Performance Stream Data Processing.
Proceedings of the Network and Parallel Computing, 2020

A Novel U-Like Network For The Segmentation Of Thoracic Organs.
Proceedings of the 2020 IEEE 17th International Symposium on Biomedical Imaging Workshops (ISBI Workshops), 2020

An Efficient Multi-GPU Implementation for Linear-Response Time-Dependent Density Functional Theory.
Proceedings of the 22nd IEEE International Conference on High Performance Computing and Communications; 18th IEEE International Conference on Smart City; 6th IEEE International Conference on Data Science and Systems, 2020

Optimizing Astrophysical Simulation Software on Sunway Heterogeneous Manycore Architecture.
Proceedings of the 22nd IEEE International Conference on High Performance Computing and Communications; 18th IEEE International Conference on Smart City; 6th IEEE International Conference on Data Science and Systems, 2020

2019
CARS: A contention-aware scheduler for efficient resource management of HPC storage systems.
Parallel Comput., 2019

Optimization of Breadth-first Search Algorithm Based on Many-core Platform.
Jisuanji Kexue (Computer Science), 2019

Degree-of-Node Task Scheduling of Fine-Grained Parallel Programs on Heterogeneous Systems.
J. Comput. Sci. Technol., 2019

Improving the Performance of Distributed MXNet with RDMA.
Int. J. Parallel Program., 2019

DDP-B: A Distributed Dynamic Parallel Framework for Meta-genomics Binary Similarity.
Proceedings of the Network and Parallel Computing, 2019

Gdarts: A GPU-Based Runtime System for Dataflow Task Programming on Dependency Applications.
Proceedings of the 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2019

Improving the Performance of MongoDB with RDMA.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

swFLOW: A Dataflow Deep Learning Framework on Sunway TaihuLight Supercomputer.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

TripletRun: A Dataflow Runtime Simulator and Its Performance Model.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

Interference-Aware I/O Scheduling for Data-Intensive Applications on Hierarchical HPC Storage Systems.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

Redesign NAMD Molecular Dynamics Non-Bonded Force-Field on Sunway Manycore Processor.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

An effective method for operations placement in Tensor Flow.
Proceedings of the 3rd International Conference on High Performance Compilation, 2019

2018
PEPS++: Towards Extreme-Scale Simulations of Strongly Correlated Quantum Many-Particle Models on Sunway TaihuLight.
IEEE Trans. Parallel Distributed Syst., 2018

Combining Hadoop with MPI to Solve Metagenomics Problems that are both Data- and Compute-intensive.
Int. J. Parallel Program., 2018

Improving the Performance of Distributed TensorFlow with RDMA.
Int. J. Parallel Program., 2018

Contention-Aware Resource Scheduling for Burst Buffer Systems.
Proceedings of the 47th International Conference on Parallel Processing, 2018

2017
A Dataflow-Based Runtime Support on a 100P Actual System.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Refactoring the Molecular Docking Simulation for Heterogeneous, Manycore Processors Systems.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

A hierarchical grid algorithm for accelerating high-performance conjugate gradient benchmark on sunway many-core processor.
Proceedings of the 3rd International Conference on Communication and Information Processing, 2017

Pipelining Computation and Optimization Strategies for Scaling GROMACS on the Sunway Many-Core Processor.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2017

2016
A Flexible Chip Multiprocessor Simulator Dedicated for Thread Level Speculation.
Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, 2016

Parallelizing Back Propagation Neural Network on Speculative Multicores.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

2015
Program Phase Analysis and Phase Detection Techniques.
Jisuanji Kexue (Computer Science), 2015

Speculative Parallelism Characterization Profiling in General Purpose Computing Applications.
J. Comput. Sci. Eng., 2015

Optimization and Analysis of Parallel Back Propagation Neural Network on GPU Using CUDA.
Proceedings of the Neural Information Processing - 22nd International Conference, 2015

Local State Reusing for Efficient Model Checking of Multithreaded Programs.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

Parallelizing Block Cryptography Algorithms on Speculative Multicores.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

Optimization of Binomial Option Pricing on Intel MIC Heterogeneous System.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

2014
Exploring speculative procedure and loop level parallelism in SPLASH2.
Int. J. High Perform. Syst. Archit., 2014

Efficient execution of speculative threads and transactions with hardware transactional memory.
Future Gener. Comput. Syst., 2014

A Criticality-Aware DVFS Runtime Utility for Optimizing Power Efficiency of Multithreaded Applications.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Understanding the SIMD Efficiency of Graph Traversal on GPU.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2014

A Compiler Translate Directive-Based Language to Optimized CUDA.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

2013
Phase-Priority based Directory Coherence for Multicore Processor.
CoRR, 2013

Quantitative Analysis of Inter-block Dependence in Speculative Execution.
Proceedings of the 12th IEEE International Conference on Trust, 2013

2012
Priority-based squash reducing methods in thread level speculation.
Int. J. Inf. Technol. Commun. Convergence, 2012

FlexBFS: a parallelism-aware implementation of breadth-first search on GPU.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

A Speculative HMMER Search Implementation on GPU.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

CRQ-based fair scheduling on composable multicore architectures.
Proceedings of the International Conference on Supercomputing, 2012

Distributed replay protocol for distributed uniprocessors.
Proceedings of the International Conference on Supercomputing, 2012

SeTM: Efficient Execution of Speculative Threads with Hardware Transactional Memory.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

VSCP: A Cache Controlling Method for Improving Single Thread Performance in Multicore System.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

Distributed Control Independence for Composable Multi-processors.
Proceedings of the 2012 IEEE/ACIS 11th International Conference on Computer and Information Science, Shanghai, China, May 30, 2012

Value Predicted LogSPoTM: Improve the Parallesim of Thread Level System by Using a Value Predictor.
Proceedings of the 2012 IEEE/ACIS 11th International Conference on Computer and Information Science, Shanghai, China, May 30, 2012

2011
CHMasters: A Scalable and Speed-Efficient Metadata Service in Distributed File System.
Proceedings of the 12th International Conference on Parallel and Distributed Computing, 2011

A Non-blocking Programming Framework for Pipeline Application on Multi-core Platform.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2011

A Priority-Aware NoC to Reduce Squashes in Thread Level Speculation for Chip Multiprocessors.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2011

Exploiting Speculative Thread-Level Parallelism Based on Transactional Memory.
Proceedings of the Third International Conference on Communications and Mobile Computing, 2011

Accelerating Block Cryptography Algorithms in Procedure Level Speculation.
Proceedings of the Seventh International Conference on Computational Intelligence and Security, 2011

2010
FACRA: Flexible-Core Architecture Chip Resource Abstractor.
Proceedings of the 2010 International Conference on Parallel and Distributed Computing, 2010

CuHMMer: A load-balanced CPU-GPU cooperative bioinformatics application.
Proceedings of the 2010 International Conference on High Performance Computing & Simulation, 2010

Pattern-Unit Based Regular Expression Matching with Reconfigurable Function Unit.
Proceedings of the Computational Science and Its Applications, 2010

Dynamic Resource Tuning for Flexible Core Chip Multiprocessors.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2010

The optimization of parallel Smith-Waterman sequence alignment using on-chip memory of GPGPU.
Proceedings of the Fifth International Conference on Bio-Inspired Computing: Theories and Applications, 2010

2009
The Mapping Framework and Optimizing Strategy for Block Cryptography Algorithms on Cell Broadband Engine.
Proceedings of the 2009 International Conference on Parallel and Distributed Computing, 2009

Performance and Power Efficiency Analysis of the Symmetric Cryptograph on Two Stream Processor Architectures.
Proceedings of the Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2009), 2009

Investigation of Factors Impacting Thread-Level Parallelism from Desktop, Multimedia and HPC Applications.
Proceedings of the Fourth International Conference on Frontier of Computer Science and Technology, 2009

A Program Behavior Study of Block Cryptography Algorithms on GPGPU.
Proceedings of the Fourth International Conference on Frontier of Computer Science and Technology, 2009

Scaling the Performance of Tiled Processor Architectures with On-Chip-Network Topology.
Proceedings of the Second International Joint Conference on Computational Sciences and Optimization, 2009

2008
A wire delay scalable stream processor architecture.
Proceedings of the 13th Asia-Pacific Computer Systems Architecture Conference, 2008

Profile guided optimization for dataflow predication.
Proceedings of the 13th Asia-Pacific Computer Systems Architecture Conference, 2008

LogSPoTM: a scalable thread level speculation model based on transactional memory.
Proceedings of the 13th Asia-Pacific Computer Systems Architecture Conference, 2008

2007
Balancing Thread Partition for Efficiently Exploiting Speculative Thread-Level Parallelism.
Proceedings of the Advanced Parallel Processing Technologies, 7th International Symposium, 2007

An Online Profile Guided Optimization Approach for Speculative Parallel Threading.
Proceedings of the Advances in Computer Systems Architecture, 2007

2005
Improving Latency Tolerance of Network Processors Through Simultaneous Multithreading.
Proceedings of the Advanced Parallel Processing Technologies, 6th International Workshop, 2005

2000
Broadcasting Under Network Ignorance Scenario.
Proceedings of the Applied Computing 2000, 2000

1999
A Parallel and Distributed Debugger Implemented with Java.
Proceedings of the TOOLS 1999: 31st International Conference on Technology of Object-Oriented Languages and Systems, 1999

A Java/CORBA Based Universal Framework for Super Server User-End Integrated Environments.
Proceedings of the TOOLS 1999: 31st International Conference on Technology of Object-Oriented Languages and Systems, 1999

