Guangming Tan

Wei Zhou

IEEE Trans. Parallel Distributed Syst., 2018

An Autotuning Protocol to Rapidly Build Autotuners.

ACM Trans. Parallel Comput., 2018

Design and Implementation of Adaptive SpMV Library for Multicore and Many-Core Architecture.

Junhong Liu

Jiajia Li

ACM Trans. Math. Softw., 2018

Automated and precise event detection method for big data in biomedical imaging with support vector machine.

Lufeng Yuan

Erlin Yao

Comput. Syst. Sci. Eng., 2018

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

High-performance genomic analysis framework with in-memory computing.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Routing and Spectrum Allocation for Time Varying Traffic by Artificial Bee Colony Algorithm in Elastic Optical Networks.

Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2018

Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model.

Proceedings of the 47th International Conference on Parallel Processing, 2018

Accelerating FM-index Search for Genomic Data Processing.

Proceedings of the 47th International Conference on Parallel Processing, 2018

2017

Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning.

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

A performance analysis framework for exploiting GPU microarchitectural capability.

Proceedings of the International Conference on Supercomputing, 2017

RING: NUMA-Aware Message-Batching Runtime for Data-Intensive Applications.

Ke Meng

Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

Quantifying and Mitigating Computational Inefficiency of Genomics Data Analysis.

Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

2016

Graphine: Programming Graph-Parallel Computation of Large Natural Graphs for Multicore Clusters.

IEEE Trans. Parallel Distributed Syst., 2016

Accelerating Irregular Computation in Massive Short Reads Mapping on FPGA Co-Processor.

IEEE Trans. Parallel Distributed Syst., 2016

边缘海静力数值预报模式并行算法研究 (Parallelization of Hydrostatic Numerical Forecasting Model of Marginal Sea).

计算机科学 (Computer Science), 2016

Locality of Computation for Stencil Optimization.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

Accelerating large-scale genomic analysis with Spark.

Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2016

2015

SuperDragon: A Heterogeneous Parallel System for Accelerating 3D Reconstruction of Cryo-Electron Microscopy Images.

ACM Trans. Reconfigurable Technol. Syst., 2015

Detection of soft errors in LU decomposition with partial pivoting using algorithm-based fault tolerance.

Int. J. High Perform. Comput. Appl., 2015

FAST: A Fast Stencil Autotuning Framework Based On An Optimal-solution Space Model.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Bit Flipping Errors in High Performance Linpack at Exascale and Beyond.

Erlin Yao

Proceedings of the 44th International Conference on Parallel Processing, 2015

Study on Partitioning Real-World Directed Graphs of Skewed Degree Distribution.

Proceedings of the 44th International Conference on Parallel Processing, 2015

Implementation of Short Read Alignment Algorithm in OpenCL on Xeon Phi Coprocessor.

Xiquan Zhao

Chuang Liu

Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

Application Taxonomy via Algorithmic Commonality for Domain-Specific Architecture Design.

Yuanrong Wang

Qiangqiang Li

Proceedings of the 22nd IEEE International Conference on High Performance Computing, 2015

A Reliable Distributed Convolutional Neural Network for Biology Image Segmentation.

Xiuxia Zhang

Mingyu Chen

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014

Exploiting fine-grained parallelism in graph traversal algorithms via lock virtualization on multi-core architecture.

J. Supercomput., 2014

Accelerating massive short reads mapping for next generation sequencing (abstract only).

Chunming Zhang

Wen Tang

Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

Reducing Communication in Parallel Breadth-First Search on Distributed Memory Systems.

Proceedings of the 17th IEEE International Conference on Computational Science and Engineering, 2014

Optimizing stencil code via locality of computation.

Yulong Luo

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

Scalability study of molecular dynamics simulation on Godson-T many-core architecture.

J. Parallel Distributed Comput., 2013

Optimizing Parallel Sn Sweeps on Unstructured Grids for Multi-Core Clusters.

J. Comput. Sci. Technol., 2013

Understanding parallelism in graph traversal on multi-core clusters.

Comput. Sci. Res. Dev., 2013

GRE: A Graph Runtime Engine for Large-Scale Distributed Graph-Parallel Applications.

CoRR, 2013

A Study of Leveraging Memory Level Parallelism for DRAM System on Multi-core/Many-Core Architecture.

Proceedings of the 12th IEEE International Conference on Trust, 2013

SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication.

Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013

ParaInsight: An Assistant for Quantitatively Analyzing Multi-granularity Parallel Region.

Ran Ao

Mingyu Chen

Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

Vlock: Lock virtualization mechanism for exploiting fine-grained parallelism in graph traversal algorithms.

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

2012

SMAT: An Input Adaptive Sparse Matrix-Vector Multiplication Auto-Tuner

CoRR, 2012

Compression and Sieve: Reducing Communication in Parallel Breadth First Search on Distributed Memory Systems

CoRR, 2012

A lightweight hybrid hardware/software approach for object-relative memory profiling.

Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

A Case Study of Designing Efficient Algorithm-based Fault Tolerant Application for Exascale Parallelism.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Investigating Memory Optimization of Hash-index for Next Generation Sequencing on Multi-core Architecture.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

PDSEC Introduction.

Thomas Rauber

Gudula Rünger

Peter Strazdins

Laurence Tianruo Yang

Yi Pan

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs.

Proceedings of the International Conference on Supercomputing, 2012

A coarse-grained stream architecture for cryo-electron microscopy images 3D reconstruction.

Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012

Accelerating Millions of Short Reads Mapping on a Heterogeneous Architecture with FPGA Accelerator.

Proceedings of the 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, 2012

2011

Analysis and performance results of computing betweenness centrality on IBM Cyclops64.

Vugranam C. Sreedhar

J. Supercomput., 2011

Revisiting Multiple Pattern Matching Algorithms for Multi-Core Architecture.

J. Comput. Sci. Technol., 2011

Dawning Nebulae: A PetaFLOPS Supercomputer with a Heterogeneous Structure.

J. Comput. Sci. Technol., 2011

Numerical assessment of flood hazard risk to people and vehicles in flash floods.

Environ. Model. Softw., 2011

A New and Efficient Algorithm-Based Fault Tolerance Scheme for A Million Way Parallelism

CoRR, 2011

Fast implementation of DGEMM on Fermi GPU.

Proceedings of the Conference on High Performance Computing Networking, 2011

Poster: revisiting virtual channel memory for performance and fairness on multi-core architecture.

Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Experience of parallelizing cryo-EM 3D reconstruction on a CPU-GPU heterogeneous system.

Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

Building algorithmically nonstop fault tolerant MPI programs.

Proceedings of the 18th International Conference on High Performance Computing, 2011

Performance analysis and optimization of molecular dynamics simulation on Godson-T many-core processor.

Proceedings of the 8th Conference on Computing Frontiers, 2011

2010

Automatically Tuned Dynamic Programming with an Algorithm-by-Blocks.

Jiajia Li

Mingyu Chen

Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems, 2010

Preliminary Investigation of Accelerating Molecular Dynamics Simulation on Godson-T Many-Core Processor.

Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010

2009

Improving Performance of Dynamic Programming via Parallelism and Locality on Multicore Architectures.

IEEE Trans. Parallel Distributed Syst., 2009

Extending Amdahl's law in the multicore era.

SIGMETRICS Perform. Evaluation Rev., 2009

Characterizing Betweenness Centrality Algorithm on Multi-core Architectures.

Dengbiao Tu

Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009

Single-particle 3d reconstruction from cryo-electron microscopy images on GPU.

Proceedings of the 23rd international conference on Supercomputing, 2009

A Parallel Algorithm for Computing Betweenness Centrality.

Dengbiao Tu

Proceedings of the ICPP 2009, 2009

High Performance Matrix Multiplication on Many Cores.

Proceedings of the Euro-Par 2009 Parallel Processing, 2009

2008

Experience on optimizing irregular computation for memory hierarchy in manycore architecture.

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Just-In-Time Locality and Percolation for Optimizing Irregular Applications on a Manycore Architecture.

Vugranam C. Sreedhar

Proceedings of the Languages and Compilers for Parallel Computing, 2008

2007

Cache oblivious algorithms for nonserial polyadic programming.

J. Supercomput., 2007

Regular Paper: A Study of Architectural Optimization Methods in Bioinformatics Applications.

Int. J. High Perform. Comput. Appl., 2007

A parallel dynamic programming algorithm on a multi-core architecture.

Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007), 2007

Implementation of the Smith-Waterman algorithm on a reconfigurable supercomputing platform.

Peiheng Zhang

Proceedings of the 1st international workshop on High-performance reconfigurable computing technology and applications, 2007

2006

Improvement of Performance of MegaBlast Algorithm for DNA Sequence Alignment.

J. Comput. Sci. Technol., 2006

Biology - Locality and parallelism optimization for dynamic programming algorithm in bioinformatics.

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

An experimental study of optimizing bioinformatics applications.

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Improving locality of nonserial polyadic dynamic programming.

Dongbo Bu

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Load Balancing and Parallel Multiple Sequence Alignment with Tree Accumulation.

Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

2005

Load Balancing Algorithm in Cluster-based RNA secondary structure Prediction.

Proceedings of the 4th International Symposium on Parallel and Distributed Computing (ISPDC 2005), 2005

An Optimized Algorithm of High Spatial-temporal Efficiency for MegaBlast.

Proceedings of the 11th International Conference on Parallel and Distributed Systems, 2005

An Efficient Dynamic Programming Algorithm and Implementation for RNA Secondary Structure Prediction.

Xinchun Liu

Proceedings of the Computational Science, 2005

Exploiting Parallelization for RNA Secondary Structure Prediction in Cluster.
