Ninghui Sun
Orcid: 0000-0002-1953-1392
According to our database, Ninghui Sun authored at least 181 papers between 1997 and 2024.
Bibliography
2024
SLAM-CIM: A Visual SLAM Backend Processor With Dynamic-Range-Driven-Skipping Linear-Solving FP-CIM Macros.
IEEE J. Solid State Circuits, November, 2024
FPIA: Communication-Aware Multi-Chiplet Integration With Field-Programmable Interconnect Fabric on Reusable Silicon Interposer.
IEEE Trans. Circuits Syst. I Regul. Pap., September, 2024
ACM Trans. Archit. Code Optim., March, 2024
J. Comput. Sci. Technol., March, 2024
CoRR, 2024
Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads.
CoRR, 2024
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024
MPC-PAT: A Pipeline Architecture for Beaver Triple Generation in Secure Multi-party Computation.
Proceedings of the IEEE International Test Conference in Asia, 2024
Proceedings of the 53rd International Conference on Parallel Processing, 2024
XiangShan: An Open-Source Project for High-Performance RISC-V Processors Meeting Industrial-Grade Standards.
Proceedings of the 36th IEEE Hot Chips Symposium, 2024
Proceedings of the Euro-Par 2024: Parallel Processing, 2024
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024
Chaos: Function Granularity Runtime Address Layout Space Randomization for Kernel Module.
Proceedings of the 15th ACM SIGOPS Asia-Pacific Workshop on Systems, 2024
2023
Accelerating Convolutional Neural Networks by Exploiting the Sparsity of Output Activation.
IEEE Trans. Parallel Distributed Syst., December, 2023
ArkGPU: enabling applications' high-goodput co-location execution on multitasking GPUs.
CCF Trans. High Perform. Comput., September, 2023
Functional Verification for Agile Processor Development: A Case for Workflow Integration.
J. Comput. Sci. Technol., July, 2023
IEEE Micro, 2023
CoRR, 2023
Proceedings of the 43rd IEEE International Conference on Distributed Computing Systems, 2023
Proceedings of the Algorithms and Architectures for Parallel Processing, 2023
Skadi: Building a Distributed Runtime for Data Systems in Disaggregated Data Centers.
Proceedings of the 19th Workshop on Hot Topics in Operating Systems, 2023
2022
IEEE Trans. Parallel Distributed Syst., 2022
IEEE Trans. Circuits Syst. II Express Briefs, 2022
IEEE Trans. Computers, 2022
Extending the limit of molecular dynamics with ab initio accuracy to 10 billion atoms.
CoRR, 2022
Fast and accurate variable batch size convolution neural network training on large scale distributed systems.
Concurr. Comput. Pract. Exp., 2022
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Extending the limit of molecular dynamics with ab initio accuracy to 10 billion atoms.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022
2021
A New Optoelectronic Hybrid Network Based on Scheduling Optimization of Optical Links.
IEEE Trans. Computers, 2021
J. Comput. Sci. Technol., 2021
CCF Trans. High Perform. Comput., 2021
Proceedings of the 2021 IEEE International Conference on Engineering, 2021
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021
2020
Addressing Irregularity in Sparse Neural Networks Through a Cooperative Software/Hardware Approach.
IEEE Trans. Computers, 2020
Proceedings of the 38th IEEE International Conference on Computer Design, 2020
Design Automation Methodology from RTL to Gate-level Netlist and Schematic for RSFQ Logic Circuits.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020
2019
PIM-WEAVER: A High Energy-efficient, General-purpose Acceleration Architecture for String Operations in Big Data Processing.
Sustain. Comput. Informatics Syst., 2019
Wormhole optical network: a new architecture to solve long diameter problem in exascale computer.
CCF Trans. High Perform. Comput., 2019
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019
Proceedings of the 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2019
IA-SpGEMM: an input-aware auto-tuning framework for parallel sparse matrix-matrix multiplication.
Proceedings of the ACM International Conference on Supercomputing, 2019
QoSMT: supporting precise performance control for simultaneous multithreading architecture.
Proceedings of the ACM International Conference on Supercomputing, 2019
A New Traffic Offloading Method with Slow Switching Optical Device in Exascale Computer.
Proceedings of the 37th IEEE International Conference on Computer Design, 2019
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019
2018
ACM Trans. Parallel Comput., 2018
IEEE Robotics Autom. Lett., 2018
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018
Proceedings of the 47th International Conference on Parallel Processing, 2018
SmarCo: An Efficient Many-Core Processor for High-Throughput Applications in Datacenters.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
Proceedings of the Advanced Computer Architecture - 12th Conference, 2018
2017
J. Comput. Sci. Technol., 2017
Int. J. Parallel Program., 2017
Proceedings of the 2017 USENIX Annual Technical Conference, 2017
Proceedings of the Network and Parallel Computing, 2017
Proceedings of the International Conference on Supercomputing, 2017
Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017
Proceedings of the Workshop on Smart Internet of Things, SmartIoT@SEC 2017, 2017
Proceedings of the 54th Annual Design Automation Conference, 2017
2016
Graphine: Programming Graph-Parallel Computation of Large Natural Graphs for Multicore Clusters.
IEEE Trans. Parallel Distributed Syst., 2016
Accelerating Irregular Computation in Massive Short Reads Mapping on FPGA Co-Processor.
IEEE Trans. Parallel Distributed Syst., 2016
Nucleic Acids Res., 2016
Commun. ACM, 2016
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016
ACCC: An Acceleration Mechanism for Character Operation Based on Cache Computing in Big Data Applications.
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2016
2015
ACM Trans. Comput. Syst., 2015
J. Comput. Sci. Technol., 2015
Detection of soft errors in LU decomposition with partial pivoting using algorithm-based fault tolerance.
Int. J. High Perform. Comput. Appl., 2015
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015
Proceedings of the 44th International Conference on Parallel Processing, 2015
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015
Supporting Differentiated Services in Computers via Programmable Architecture for Resourcing-on-Demand (PARD).
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015
2014
Exploiting fine-grained parallelism in graph traversal algorithms via lock virtualization on multi-core architecture.
J. Supercomput., 2014
HMTT: A hybrid hardware/software tracing system for bridging the DRAM access trace's semantic gap.
ACM Trans. Archit. Code Optim., 2014
IEEE Micro, 2014
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014
Proceedings of the 2014 International Conference on Supercomputing, 2014
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014
Reducing Communication in Parallel Breadth-First Search on Distributed Memory Systems.
Proceedings of the 17th IEEE International Conference on Computational Science and Engineering, 2014
Digging deeper into cluster system logs for failure prediction and root cause diagnosis.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014
DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014
2013
J. Comput. Sci. Technol., 2013
Comput. Sci. Res. Dev., 2013
CoRR, 2013
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013
Proceedings of the International Conference on Parallel and Distributed Computing, 2013
SimICT: A fast and flexible framework for performance and power evaluation of large-scale architecture.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013
Proceedings of the 22nd International Conference on Computer Communication and Networks, 2013
Vlock: Lock virtualization mechanism for exploiting fine-grained parallelism in graph traversal algorithms.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013
2012
IEEE Micro, 2012
CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications.
Frontiers Comput. Sci., 2012
Compression and Sieve: Reducing Communication in Parallel Breadth First Search on Distributed Memory Systems
CoRR, 2012
High Volume Throughput Computing: Identifying and Characterizing Throughput Oriented Workloads in Data Centers.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
A Case Study of Designing Efficient Algorithm-based Fault Tolerant Application for Exascale Parallelism.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
Investigating Memory Optimization of Hash-index for Next Generation Sequencing on Multi-core Architecture.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012
Proceedings of the International Conference on Supercomputing, 2012
ALWP: A Workload Partition Method for the Efficient Parallel Simulation of Manycores.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012
A coarse-grained stream architecture for cryo-electron microscopy images 3D reconstruction.
Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012
Accelerating Millions of Short Reads Mapping on a Heterogeneous Architecture with FPGA Accelerator.
Proceedings of the 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, 2012
CRAW/P: A Workload Partition Method for the Efficient Parallel Simulation of Manycores.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012
2011
J. Comput. Sci. Technol., 2011
Proceedings of the Conference on High Performance Computing Networking, 2011
Proceedings of the 12th International Conference on Parallel and Distributed Computing, 2011
EthSpeeder: A High-performance Scalable Fault-Tolerant Ethernet Network Architecture for Data Center.
Proceedings of the Sixth International Conference on Networking, Architecture, and Storage, 2011
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011
Floating-point mixed-radix FFT core generation for FPGA and comparison with GPU and CPU.
Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011
Proceedings of the 2011 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), 2011
2010
Int. J. High Perform. Comput. Appl., 2010
Frontiers Comput. Sci. China, 2010
Frontiers Comput. Sci. China, 2010
Proceedings of the ReConFig'10: 2010 International Conference on Reconfigurable Computing and FPGAs, 2010
Proceedings of the 2010 International Conference on Parallel and Distributed Computing, 2010
Proceedings of the 2010 International Conference on Parallel and Distributed Computing, 2010
P-GAS: Parallelizing a Cycle-Accurate Event-Driven Many-Core Processor Simulator Using Parallel Discrete Event Simulation.
Proceedings of the 24th ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation, 2010
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010
Proceedings of the GCC 2010, 2010
Preliminary Investigation of Accelerating Molecular Dynamics Simulation on Godson-T Many-Core Processor.
Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010
2009
Improving Performance of Dynamic Programming via Parallelism and Locality on Multicore Architectures.
IEEE Trans. Parallel Distributed Syst., 2009
Proceedings of the 2009 Spring Simulation Multiconference, SpringSim 2009, 2009
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009
A Scalability Analysis of the Symmetric Multiprocessing Architecture in Multi-Core System.
Proceedings of the International Conference on Networking, Architecture, and Storage, 2009
Proceedings of the International Conference on Networking, Architecture, and Storage, 2009
Proceedings of the International Conference on Networking, Architecture, and Storage, 2009
A Virtualized Self-Adaptive Parallel Programming Framework for Heterogeneous High Productivity Computers.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009
Proceedings of the ICPP 2009, 2009
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009
2008
Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008
Proceedings of the Seventh International Conference on Grid and Cooperative Computing, 2008
Proceedings of the Seventh International Conference on Grid and Cooperative Computing, 2008
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008
2007
IEEE Trans. Circuits Syst. II Express Briefs, 2007
Regular Paper: A Study of Architectural Optimization Methods in Bioinformatics Applications.
Int. J. High Perform. Comput. Appl., 2007
Proceedings of the SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007
Proceedings of the CHINA HPC 2007, 2007
Proceedings of the CHINA HPC 2007, 2007
Proceedings of the International Conference on Networking, 2007
United-FS: A Logical File System Providing a Single Image of Multiple Physical File Systems on NFS Server.
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007
2006
J. Comput. Sci. Technol., 2006
Biology - Locality and parallelism optimization for dynamic programming algorithm in bioinformatics.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
Research on Key Technologies of Load Balancing for NFS Server with Multiple Network Paths.
Proceedings of the Grid and Cooperative Computing Workshops, 2006
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006
2005
Proceedings of the 4th International Symposium on Parallel and Distributed Computing (ISPDC 2005), 2005
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005
Proceedings of the 11th International Conference on Parallel and Distributed Systems, 2005
An Efficient Dynamic Programming Algorithm and Implementation for RNA Secondary Structure Prediction.
Proceedings of the Computational Science, 2005
Proceedings of the Computational Science, 2005
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005
Proceedings of the Computational Intelligence and Security, International Conference, 2005
2004
Proceedings of the Embedded Software and Systems, First International Conference, 2004
2003
Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003
2002
NCPN: A Simulation Tool for Coloured Petri Nets.
Proceedings of the International Conference on Parallel and Distributed Computing Systems, 2002
2001
Proceedings of the 2001 IEEE International Conference on Cluster Computing (CLUSTER 2001), 2001
1999
J. Comput. Sci. Technol., 1999
1997