Dezun Dong

Orcid: 0000-0001-6243-8479

According to our database, Dezun Dong authored at least 153 papers between 2006 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
A lightweight RDMA connection protocol based on post-hoc confirmation.
J. Parallel Distributed Comput., 2025

2024
COER: A Network Interface Offloading Architecture for RDMA and Congestion Control Protocol Codesign.
ACM Trans. Archit. Code Optim., September, 2024

Optimizing Multi-Grid Preconditioned Conjugate Gradient Method on Multi-Cores.
IEEE Trans. Parallel Distributed Syst., May, 2024

A survey of machine learning for Network-on-Chips.
J. Parallel Distributed Comput., April, 2024

Optimizing Full-Spectrum Matrix Multiplications on ARMv8 Multi-Core CPUs.
IEEE Trans. Parallel Distributed Syst., March, 2024

Full-Stack Allreduce on Multi-Rail Networks.
CoRR, 2024

DRLAR: A deep reinforcement learning-based adaptive routing framework for network-on-chips.
Comput. Networks, 2024

DBSR: An Efficient Storage Format for Vectorizing Sparse Triangular Solvers on Structured Grids.
Proceedings of the International Conference for High Performance Computing, 2024

UNR: Unified Notifiable RMA Library for HPC.
Proceedings of the International Conference for High Performance Computing, 2024

GraphCube: Interconnection Hierarchy-aware Graph Processing.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

Optimizing General Matrix Multiplications on Modern Multi-core DSPs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Understanding Different Transport Coexistence in Datacenter Networks.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning.
Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, 2024

Optimizing Attention by Exploiting Data Reuse on ARM Multi-core CPUs.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

Optimizing SpMV on Heterogeneous Multi-Core DSPs through Improved Locality and Vectorization.
Proceedings of the 53rd International Conference on Parallel Processing, 2024

Power of Insensitivity: Fixing Threshold Truncation of Switch Buffer Management Policies.
Proceedings of the 24th IEEE International Symposium on Cluster, 2024

ACU: Aggregator-based Congestion control and link Utilization optimization strategy for multi-tenant in-network aggregation.
Proceedings of the 8th Asia-Pacific Workshop on Networking, 2024

Enhancing Gradient Compression for Distributed Deep Learning.
Proceedings of the 8th Asia-Pacific Workshop on Networking, 2024

DDT: Dynamical Selective Dropping Threshold for Reactive Congestion Control.
Proceedings of the ACM Turing Award Celebration Conference 2024, 2024

Enhancing Multi-Agent Communication Collaboration through GPT-Based Semantic Information Extraction and Prediction.
Proceedings of the ACM Turing Award Celebration Conference 2024, 2024

Chimera: Leveraging Hybrid Offsets for Efficient Data Prefetching.
Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, 2024

2023
Communication Optimization Algorithms for Distributed Deep Learning Systems: A Survey.
IEEE Trans. Parallel Distributed Syst., December, 2023

EagerCC: An ultra-low latency congestion control mechanism in datacenter networks.
Comput. Networks, November, 2023

Exploring job running path to predict runtime on multiple production supercomputers.
J. Parallel Distributed Comput., May, 2023

SSD-SGD: Communication Sparsification for Distributed Deep Learning Training.
ACM Trans. Archit. Code Optim., March, 2023

An Empirical Study on Using Large Language Models for Multi-Intent Comment Generation.
CoRR, 2023

In-network aggregation for data center networks: A survey.
Comput. Commun., 2023

Optimizing Direct Convolutions on ARM Multi-Cores.
Proceedings of the International Conference for High Performance Computing, 2023

Input Transformation for Pre-Trained-Model-Based Cross-Language Code Search.
Proceedings of the 23rd IEEE International Conference on Software Quality, 2023

Hierarchical Semantic Graph Construction and Pooling Approach for Cross-language Code Retrieval.
Proceedings of the 23rd IEEE International Conference on Software Quality, 2023

Interpretation-based Code Summarization.
Proceedings of the 31st IEEE/ACM International Conference on Program Comprehension, 2023

LARE: A Linear Approximate Reinforcement Learning Based Adaptive Routing for Network-on-Chips.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2023

Memory-aware Optimization for Sequences of Sparse Matrix-Vector Multiplications.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Optimizing Multi-grid Computation and Parallelization on Multi-cores.
Proceedings of the 37th International Conference on Supercomputing, 2023

Roar: A Router Microarchitecture for In-network Allreduce.
Proceedings of the 37th International Conference on Supercomputing, 2023

GRAP: Group-level Resource Allocation Policy for Reconfigurable Dragonfly Network in HPC.
Proceedings of the 37th International Conference on Supercomputing, 2023

Rately: Accurate Data Center CC based on One-Way Delay.
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

DFAR: Dynamic-threshold Fault-tolerant Adaptive Routing for Fat Tree Networks.
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

Characterize and Optimize Dense Linear Solver on Multi-core CPUs.
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

Efficiently Running SpMV on Multi-Core DSPs for Block Sparse Matrix.
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

Efficiently Running SpMV on Multi-core DSPs for Banded Matrix.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2023

DeTAR: A Decision Tree-Based Adaptive Routing in Networks-on-Chip.
Proceedings of the Euro-Par 2023: Parallel Processing - 29th International Conference on Parallel and Distributed Computing, Limassol, Cyprus, August 28, 2023

A Multi-level Parallel Integer/Floating-Point Arithmetic Architecture for Deep Learning Instructions.
Proceedings of the Euro-Par 2023: Parallel Processing - 29th International Conference on Parallel and Distributed Computing, Limassol, Cyprus, August 28, 2023

DFR: Dynamic-thresold Fault-tolerant Routing for Fat Tree.
Proceedings of the 7th Asia-Pacific Workshop on Networking, 2023

2022
Efficient Data Redistribution Algorithms From Irregular to Block Cyclic Data Distribution.
IEEE Trans. Parallel Distributed Syst., 2022

Exploring the Galaxyfly Family to Build Flexible-Scale Interconnection Networks.
IEEE Trans. Parallel Distributed Syst., 2022

Hybrid Memory Buffer Microarchitecture for High-Radix Routers.
IEEE Trans. Computers, 2022

MUA-Router: Maximizing the Utility-of-Allocation for On-chip Pipelining Routers.
ACM Trans. Archit. Code Optim., 2022

CP-SGD: Distributed stochastic gradient descent with compression and periodic compensation.
J. Parallel Distributed Comput., 2022

Understanding node connection modes in Multi-Rail Fat-tree.
J. Parallel Distributed Comput., 2022

Revisiting network congestion avoidance through adaptive packet-chaining reservation.
Comput. Networks, 2022

FastCredit: Expediting credit-based congestion control in datacenters.
Comput. Networks, 2022

Alleviating Performance Interference Through Intra-Queue I/O Isolation for NVMe-over-Fabrics.
Proceedings of the Network and Parallel Computing, 2022

Fine-grained code-comment semantic interaction analysis.
Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, 2022

Fast-Converging Congestion Control in Datacenter Networks.
Proceedings of the IEEE Symposium on Computers and Communications, 2022

A Quantitative Study of the Spatiotemporal I/O Burstiness of HPC Application.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Optimized MPI collective algorithms for dragonfly topology.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

DC4: Reconstructing Data-Credit-Coupled Congestion Control for Data Centers.
Proceedings of the 51st International Conference on Parallel Processing, 2022

STEGNN: Spatial-Temporal Embedding Graph Neural Networks for Road Network Forecasting.
Proceedings of the 28th IEEE International Conference on Parallel and Distributed Systems, 2022

A Transformable NVMeoF Queue Design for Better Differentiating Read and Write Request Processing.
Proceedings of the 28th IEEE International Conference on Parallel and Distributed Systems, 2022

DNNEmu: A Lightweight Performance Emulator for Distributed DNN Training.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2022

LTNoT: Realizing the Trade-Offs Between Latency and Throughput in NVMe over TCP.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2022

THperf: Enabling Accurate Network Latency Measurement for Tianhe-2 System.
Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022

ERA: ECN-Ratio-Based Congestion Control in Datacenter Networks.
Proceedings of the 22nd IEEE International Symposium on Cluster, 2022

Reservoir: Enhance the Burst-flow Tolerance in Datacenter Networks.
Proceedings of the Tenth International Conference on Advanced Cloud and Big Data, 2022

2021
CIB-HIER: Centralized Input Buffer Design in Hierarchical High-radix Routers.
ACM Trans. Archit. Code Optim., 2021

Communication optimization strategies for distributed deep neural network training: A survey.
J. Parallel Distributed Comput., 2021

Harmonia: Explicit Congestion Notification and Credit-Reservation Transport Converged Congestion Control in Datacenters.
J. Comput. Sci. Technol., 2021

Performance Evaluation of Memory-Centric ARMv8 Many-Core Architectures: A Case Study with Phytium 2000+.
J. Comput. Sci. Technol., 2021

CCRP: Converging Credit-Based and Reactive Protocols in Datacenters.
Int. J. Parallel Program., 2021

MP-CREDIT: Multi-path credit for high-speed data center transports.
Comput. Networks, 2021

LIBSHALOM: optimizing small and irregular-shaped matrix multiplications on ARMv8 multi-cores.
Proceedings of the International Conference for High Performance Computing, 2021

Taming Congestion and Latency in Low-Diameter High-Performance Datacenters.
Proceedings of the Network and Parallel Computing, 2021

MPICC: Multi-Path INT-Based Congestion Control in Datacenter Networks.
Proceedings of the Network and Parallel Computing, 2021

vSketchDLC: A Sketch on Distributed Deep Learning Communication via Fine-grained Tracing Visualization.
Proceedings of the Network and Parallel Computing, 2021

Evaluation of Topology-Aware All-Reduce Algorithm for Dragonfly Networks.
Proceedings of the Network and Parallel Computing, 2021

FastTune: Timely and Precise Congestion Control in Data Center Network.
Proceedings of the 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), New York City, NY, USA, September 30, 2021

PAARD: Proximity-Aware All-Reduce Communication for Dragonfly Networks.
Proceedings of the 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), New York City, NY, USA, September 30, 2021

FastHorovod: Expediting Parallel Message-Passing Schedule for Distributed DNN Training.
Proceedings of the IEEE Symposium on Computers and Communications, 2021

Characterizing Small-Scale Matrix Multiplications on ARMv8-based Many-Core Architectures.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

MR-tree: A Parametric Family of Multi-Rail Fat-tree.
Proceedings of the IEEE International Performance, 2021

PFT: A Congestion Avoidance Method based on Proactive Flow Throttling at Endpoints.
Proceedings of the 17th IFIP/IEEE International Symposium on Integrated Network Management, 2021

CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

Breaking One-RTT Barrier: Ultra-Precise and Efficient Congestion Control in Datacenter Networks.
Proceedings of the 30th International Conference on Computer Communications and Networks, 2021

NEPG: Partitioning Large-Scale Power-Law Graphs.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2021

A Novel Reinforcement Learning Framework for Adaptive Routing in Network-on-Chips.
Proceedings of the 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, 2021

Exploring Node Connection Modes in Multi-Rail Fat-tree.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

RELAR: A Reinforcement Learning Framework for Adaptive Routing in Network-on-Chips.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020
Spatially Bursty I/O on Supercomputers: Causes, Impacts and Solutions.
IEEE Trans. Parallel Distributed Syst., 2020

OD-SGD: One-Step Delay Stochastic Gradient Descent for Distributed Training.
ACM Trans. Archit. Code Optim., 2020

DancerFly: An Order-Aware Network-on-Chip Router On-the-Fly Mitigating Multi-path Packet Reordering.
Int. J. Parallel Program., 2020

ssd-sgd: communication sparsification for distributed deep learning training.
CoRR, 2020

OD-SGD: One-step Delay Stochastic Gradient Descent for Distributed Training.
CoRR, 2020

Communication Optimization Strategies for Distributed Deep Learning: A Survey.
CoRR, 2020

APCC: Agile and Precise Congestion Control in Datacenters.
Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2020

Bundlefly: a low-diameter topology for multicore fiber.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

FastCredit: Expediting Credit-based Proactive Transports in Datacenters.
Proceedings of the 26th IEEE International Conference on Parallel and Distributed Systems, 2020

Converging Credit-based and Reactive Datacenter Transport using ECN and RTT.
Proceedings of the 22nd IEEE International Conference on High Performance Computing and Communications; 18th IEEE International Conference on Smart City; 6th IEEE International Conference on Data Science and Systems, 2020

Reducing Tail Latency in Proactive Congestion Control via Moderate Speculation.
Proceedings of the 22nd IEEE International Conference on High Performance Computing and Communications; 18th IEEE International Conference on Smart City; 6th IEEE International Conference on Data Science and Systems, 2020

SSP: Speeding up Small Flows for Proactive Transport in Datacenters.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019
SketchDLC: A Sketch on Distributed Deep Learning Communication via Trace Capturing.
ACM Trans. Archit. Code Optim., 2019

HARE: History-Aware Adaptive Routing Algorithm for Endpoint Congestion in Networks-on-Chip.
Int. J. Parallel Program., 2019

ExpressPass+: ECN-friendly Credit Reservation Congestion Control for Datacenters.
Proceedings of the ACM SIGCOMM 2019 Conference Posters and Demos, 2019

HyFabric: Minimizing FCT in Optical and Electrical Hybrid Data Center Networks.
Proceedings of the ACM SIGCOMM 2019 Conference Posters and Demos, 2019

ExpressPass++: Credit-Effecient Congestion Control for Data Centers.
Proceedings of the 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2019

Measuring the Coexistence Competitiveness of ECN- or RTT-Based ExpressPass and TCP in Data Centers.
Proceedings of the 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2019

DeepHiR: improving high-radix router throughput with deep hybrid memory buffer microarchitecture.
Proceedings of the ACM International Conference on Supercomputing, 2019

Network Congestion Avoidance through Packet-chaining Reservation.
Proceedings of the 48th International Conference on Parallel Processing, 2019

EC4: ECN and Credit-Reservation Converged Congestion Control.
Proceedings of the 25th IEEE International Conference on Parallel and Distributed Systems, 2019

PPS: A Low-Latency and Low-Complexity Switching Architecture Based on Packet Prefetch and Arbitration Prediction.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2019

2018
RoB-Router: A Reorder Buffer Enabled Low Latency Network-on-Chip Router.
IEEE Trans. Parallel Distributed Syst., 2018

Congestion control in high-speed lossless data center networks: A survey.
Future Gener. Comput. Syst., 2018

DETOUR: A Large-Scale Non-blocking Optical Data Center Fabric.
Proceedings of the Supercomputing Frontiers - 4th Asian Conference, 2018

CRSP: Network Congestion Control through Credit Reservation.
Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2018

BFRP: Endpoint Congestion Avoidance Through Bilateral Flow Reservation.
Proceedings of the 37th IEEE International Performance Computing and Communications Conference, 2018

Eca-Router: On Achieving Endpoint Congestion Aware Switch Allocation in the On-Chip Network.
Proceedings of the 36th IEEE International Conference on Computer Design, 2018

2017
Energy-efficient NoC with multi-granularity power optimization.
J. Supercomput., 2017

HERO: A Hybrid Electrical and Optical Multicast for Accelerating High-Performance Data Center Applications.
Proceedings of the Posters and Demos Proceedings of the Conference of the ACM Special Interest Group on Data Communication, 2017

A Scalable and Resilient Microarchitecture Based on Multiport Binding for High-Radix Router Design.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

An Efficient Label Routing on High-Radix Interconnection Networks.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

iCAST: Accelerating High-Performance Data Center Applications by Hybrid Electrical and Optical Multicast.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

Exploiting contention and congestion aware switch allocation in network-on-chips.
Proceedings of the ACM Turing 50th Celebration Conference, 2017

NoC power optimization using combined routing algorithms.
Proceedings of the 16th IEEE/ACIS International Conference on Computer and Information Science, 2017

2016
Detailed and clock-driven simulation for HPC interconnection network.
Frontiers Comput. Sci., 2016

Galaxyfly: A Novel Family of Flexible-Radix Low-Diameter Topologies for Large-Scales Interconnection Networks.
Proceedings of the 2016 International Conference on Supercomputing, 2016

CCAS: Contention and congestion aware switch allocation for network-on-chips.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

RoB-Router: Low Latency Network-on-Chip Router Microarchitecture Using Reorder Buffer.
Proceedings of the 24th IEEE Annual Symposium on High-Performance Interconnects, 2016

MBL: A Multi-stage Bufferless High-radix Router.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

2015
High Performance Interconnect Network for Tianhe System.
J. Comput. Sci. Technol., 2015

FlyCast: Free-Space Optics Accelerating Multicast Communications in Physical Layer.
Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, 2015

Chameleon: Adaptive energy-efficient heterogeneous network-on-chip.
Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

HVCRouter: Energy Efficient Network-on-Chip Router with Heterogeneous Virtual Channels.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

2014
The TH Express high performance interconnect networks.
Frontiers Comput. Sci., 2014

PathZip: A lightweight scheme for tracing packet path in wireless sensor networks.
Comput. Networks, 2014

FLYER: Fine-grained landmark based greedy geographic routing under uncertain locations.
Proceedings of the IEEE International Conference on Communications, 2014

2013
Fine-Grained Location-Free Planarization in Wireless Sensor Networks.
IEEE Trans. Mob. Comput., 2013

Fine-Grained Landmark Based Greedy Geographic Routing with Guaranteed Delivery Under Uncertain Locations.
Proceedings of the IEEE 10th International Conference on Mobile Ad-Hoc and Sensor Systems, 2013

WormPlanar: Topological Planarization Based Wormhole Detection in Wireless Networks.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

2012
Distributed Coverage in Wireless Ad Hoc and Sensor Networks by Topological Graph Approaches.
IEEE Trans. Computers, 2012

MDS-Based Wormhole Detection Using Local Topology in Wireless Sensor Networks.
Int. J. Distributed Sens. Networks, 2012

PathZip: Packet path tracing in wireless sensor networks.
Proceedings of the 9th IEEE International Conference on Mobile Ad-Hoc and Sensor Systems, 2012

2011
Edge Self-Monitoring for Wireless Sensor Networks.
IEEE Trans. Parallel Distributed Syst., 2011

Component-based localization in sparse wireless networks.
IEEE/ACM Trans. Netw., 2011

Topological Detection on Wormholes in Wireless Ad Hoc and Sensor Networks.
IEEE/ACM Trans. Netw., 2011

Connectivity-Based Wormhole Detection in Ubiquitous Sensor Networks.
J. Inf. Sci. Eng., 2011

Fine-grained location-free planarization in wireless sensor networks.
Proceedings of the INFOCOM 2011. 30th IEEE International Conference on Computer Communications, 2011

2010
Distributed Coverage in Wireless Ad Hoc and Sensor Networks by Topological Graph Approaches.
Proceedings of the 2010 International Conference on Distributed Computing Systems, 2010

2009
Fine-grained boundary recognition in wireless ad hoc and sensor networks by topological methods.
Proceedings of the 10th ACM International Symposium on Mobile Ad Hoc Networking and Computing, 2009

WormCircle: Connectivity-Based Wormhole Detection in Wireless Ad Hoc and Sensor Networks.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

2008
Self-monitoring for sensor networks.
Proceedings of the 9th ACM International Symposium on Mobile Ad Hoc Networking and Computing, 2008

Component based localization in sparse wireless ad hoc and sensor networks.
Proceedings of the 16th annual IEEE International Conference on Network Protocols, 2008

2007
EETO: An Energy-Efficient Target-Oriented Clustering Protocol in Wireless Sensor Networks.
Proceedings of the Distributed Computing and Internet Technology, 2007

2006
Path Selection of Reliable Data Delivery in Wireless Sensor Networks.
Proceedings of the Wireless Algorithms, 2006

