2024
Enhanced UGAL Routing Schemes for Dragonfly Networks.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024
Evaluation of an LLM-Powered Student Agent for Teacher Training.
Proceedings of the Technology Enhanced Learning for Inclusive and Equitable Quality Education, 2024
Designing Conversational Agents to Support Student Teacher Learning in Virtual Reality Simulation: A Case Study.
Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 2024
2022
Faster Yet Safer: Logging System Via Fixed-Key Blockcipher.
IACR Cryptol. ePrint Arch., 2022
Experience with Integrating Computer Science in Middle School Mathematics.
Proceedings of the ITiCSE 2022: Innovation and Technology in Computer Science Education, Dublin, Ireland, July 8, 2022
2021
The domestic computer science graduate students are there, we just need to recruit them.
Commun. ACM, 2021
Efficient Algorithms for Encrypted All-gather Operation.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021
Multi-Path Routing in the Jellyfish Network.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021
Encrypted All-reduce on Multi-core Clusters.
Proceedings of the IEEE International Performance, 2021
A Simulation Study of Hardware Parameters for Future GPU-based HPC Platforms.
Proceedings of the IEEE International Performance, 2021
2020
Multi-Path Routing on the Jellyfish Networks.
CoRR, 2020
CryptMPI: A Fast Encrypted MPI Library.
CoRR, 2020
Performance Evaluation and Modeling of Cryptographic Libraries for MPI Communications.
CoRR, 2020
Global link arrangement for practical Dragonfly.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020
2019
Modeling Universal Globally Adaptive Load-Balanced Routing.
ACM Trans. Parallel Comput., 2019
Topology-custom UGAL routing on dragonfly.
Proceedings of the International Conference for High Performance Computing, 2019
An Empirical Study of Cryptographic Libraries for MPI Communications.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019
2018
Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks.
IEEE Trans. Parallel Distributed Syst., 2018
Random Regular Graph and Generalized De Bruijn Graph with k-Shortest Path Routing.
IEEE Trans. Parallel Distributed Syst., 2018
TPR: Traffic Pattern-Based Adaptive Routing for Dragonfly Networks.
IEEE Trans. Multi Scale Comput. Syst., 2018
Fast classification of MPI applications using Lamport's logical clocks.
J. Parallel Distributed Comput., 2018
Performance and Accuracy Trade-offs of HPC Application Modeling and Simulation.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018
Load-Balanced Slim Fly Networks.
Proceedings of the 47th International Conference on Parallel Processing, 2018
A Comparative Study of Topology Design Approaches for HPC Interconnects.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018
2017
Modeling UGAL on the Dragonfly Topology.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2017
A comparative study of SDN and adaptive routing on dragonfly networks.
Proceedings of the International Conference for High Performance Computing, 2017
Throughput Models of Interconnection Networks: The Good, the Bad, and the Ugly.
Proceedings of the 25th IEEE Annual Symposium on High-Performance Interconnects, 2017
2016
On Folded-Clos Networks with Deterministic Single-Path Routing.
ACM Trans. Parallel Comput., 2016
Enhancing InfiniBand with OpenFlow-style SDN capability.
Proceedings of the International Conference for High Performance Computing, 2016
Random Regular Graph and Generalized De Bruijn Graph with k-Shortest Path Routing.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
Traffic Pattern-Based Adaptive Routing for Intra-Group Communication in Dragonfly Networks.
Proceedings of the 24th IEEE Annual Symposium on High-Performance Interconnects, 2016
2015
Fast Calculation of Max-Min Fair Rates for Multi-commodity Flows in Fat-Tree Networks.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015
2014
Static load-balanced routing for slimmed fat-trees.
J. Parallel Distributed Comput., 2014
LFTI: A New Performance Metric for Assessing Interconnect Designs for Extreme-Scale HPC Systems.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
2013
Protocol Customization for Improving MPI Performance on RDMA-Enabled Clusters.
Int. J. Parallel Program., 2013
A new routing scheme for Jellyfish and its performance with HPC workloads.
Proceedings of the International Conference for High Performance Computing, 2013
Trusted Group Key Management for Real-Time Critical Infrastructure Protection.
Proceedings of the 32nd IEEE Military Communications Conference, 2013
Towards a secure electricity grid.
Proceedings of the 2013 IEEE Eighth International Conference on Intelligent Sensors, 2013
RRR: A Load Balanced Routing Scheme for Slimmed Fat-Trees.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
A comparative study of high-performance computing on the cloud.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013
A new design of RDMA-based small message channels for InfiniBand clusters.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013
2012
Guest Editor's Note - Interaction between Compilers and Computer Architectures.
J. Circuits Syst. Comput., 2012
A cyber-physical approach to a wide-area actionable system for the power grid.
Proceedings of the 31st IEEE Military Communications Conference, 2012
Limited Multi-path Routing on Extended Generalized Fat-trees.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
A Trusted Computing Architecture for Secure Substation Automation.
Proceedings of the Critical Information Infrastructures Security, 2012
2011
An empirical study of behavioral characteristics of spammers: Findings and implications.
Comput. Commun., 2011
On Nonblocking Folded-Clos Networks in Computer Communication Environments.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Improving Performance of Deterministic Single-Path Routing on 2-Level Generalized Fat-Trees.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Profile Guided MPI Protocol Selection for Point-to-Point Communication Calls.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
2010
Processor affinity and MPI performance on SMP-CMP clusters.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Near-Optimal Rendezvous Protocols for RDMA-Enabled Clusters.
Proceedings of the 39th International Conference on Parallel Processing, 2010
2009
LID Assignment in InfiniBand Networks.
IEEE Trans. Parallel Distributed Syst., 2009
Oblivious routing in fat-tree based system area networks with uncertain traffic demands.
IEEE/ACM Trans. Netw., 2009
Fair Round-Robin: A Low Complexity Packet Scheduler with Proportional and Worst-Case Fairness.
IEEE Trans. Computers, 2009
Bandwidth optimal all-reduce algorithms for clusters of workstations.
J. Parallel Distributed Comput., 2009
Maximizing MPI point-to-point communication performance on RDMA-enabled clusters with customized protocols.
Proceedings of the 23rd international conference on Supercomputing, 2009
2008
Controlling IP Spoofing through Interdomain Packet Filters.
IEEE Trans. Dependable Secur. Comput., 2008
Techniques for pipelined broadcast on ethernet switched clusters.
J. Parallel Distributed Comput., 2008
A Study of Process Arrival Patterns for MPI Collective Operations.
Int. J. Parallel Program., 2008
Bandwidth Efficient All-to-All Broadcast on Switched Clusters.
Int. J. Parallel Program., 2008
Efficient MPI Bcast across different process arrival patterns.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
An MPI tool for automatically discovering the switch level topologies of Ethernet clusters.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
Traffic-Aware Inter-Domain Routing for Improved Internet Routing Stability.
Proceedings of the Global Communications Conference, 2008. GLOBECOM 2008, New Orleans, LA, USA, 30 November, 2008
2007
A Message Scheduling Scheme for All-to-All Personalized Communication on Ethernet Switched Clusters.
IEEE Trans. Parallel Distributed Syst., 2007
An empirical study of reliable multicast protocols over Ethernet-connected networks.
Perform. Evaluation, 2007
On QoS routing and path establishment in the presence of imprecise state information.
J. Commun. Networks, 2007
Bandwidth Efficient All-reduce Operation on Tree Topologies.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Behavioral Characteristics of Spammers and Their Network Reachability Properties.
Proceedings of IEEE International Conference on Communications, 2007
2006
VISTA: VPO interactive system for tuning applications.
ACM Trans. Embed. Comput. Syst., 2006
Poster reception - A study of process arrival patterns for MPI collective operations.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
Pipelined broadcast on Ethernet switched clusters.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
STAR-MPI: self tuned adaptive routines for MPI collective operations.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006
2005
Branch elimination by condition merging.
Softw. Pract. Exp., 2005
An MPI prototype for compiled communication on Ethernet switched clusters.
J. Parallel Distributed Comput., 2005
Message Scheduling for All-to-All Personalized Communication on Ethernet Switched Clusters.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005
Automatic generation and tuning of MPI collective communication routines.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005
An Empirical Approach for Efficient All-to-All Personalized Communication on Ethernet Switched Clusters.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005
2004
Automatic validation of code-improving transformations on low-level program representations.
Sci. Comput. Program., 2004
Message from the Chairs: International Workshop on Network Design and Architecture.
Proceedings of the 33rd International Conference on Parallel Processing Workshops (ICPP 2004 Workshops), 2004
2003
Algorithms for Supporting Compiled Communication.
IEEE Trans. Parallel Distributed Syst., 2003
Wavelength Assignment to Minimize the Number of SONET ADMs in WDM Rings.
Photonic Netw. Commun., 2003
Validation of Code-Improving Transformations for Embedded Systems.
Proceedings of the 2003 ACM Symposium on Applied Computing (SAC), 2003
CC-MPI: a compiled communication capable MPI prototype for ethernet switched clusters.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2003
Empirical probability based QoS routing.
Proceedings of IEEE International Conference on Communications, 2003
Branch Elimination via Multi-variable Condition Merging.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003
2002
Heuristic algorithms for multiconstrained quality-of-service routing.
IEEE/ACM Trans. Netw., 2002
VISTA: a system for interactive code improvement.
Proceedings of the 2002 Joint Conference on Languages, 2002
Group Management Schemes for Implementing MPI collective Communication over IP-Multicast.
Proceedings of the 6th Joint Conference on Information Science, 2002
A Study of Dynamic Routing and Wavelength Assignment with Imprecise Network State Information.
Proceedings of the 31st International Conference on Parallel Processing Workshops (ICPP 2002 Workshops), 2002
Message from the Co-Chairs.
Proceedings of the 31st International Conference on Parallel Processing Workshops (ICPP 2002 Workshops), 2002
A comparative study of QoS routing schemes that tolerate imprecise state information.
Proceedings of the 11th International Conference on Computer Communications and Networks, 2002
2001
Performance of Multi-hop Communications Using Logical Topologies on Optical Torus Networks.
J. Parallel Distributed Comput., 2001
Using a Swap Instruction to Coalesce Loads and Stores.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001
2000
Automatic Validation of Code-Improving Transformations.
Proceedings of the Languages, 2000
1999
Distributed Path Reservation Algorithms for Multiplexed All-Optical Interconnection Networks.
IEEE Trans. Computers, 1999
Distributed Control Protocols for Wavelength Reservation and their Performance Evaluation.
Photonic Netw. Commun., 1999
Compiler Analysis to Support Compiled Communication for HPF-Like Programs.
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999
1998
Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks.
Proceedings of the International Conference On Computer Communications and Networks (ICCCN 1998), 1998
1997
Demand-Driven Data Flow Analysis for Communication Optimization.
Parallel Process. Lett., 1997
A Load Balancing Package on Distributed Memory Systems and its Application to Particle-Particle Particle-Mesh (P3M) Methods.
Parallel Comput., 1997
Does Time-Division Multiplexing Close the Gap between Memory and Optical Communication Speeds?
Proceedings of the Parallel Computer Routing and Communication, 1997
An Array Data Flow Analysis Based Communication Optimizer.
Proceedings of the Languages and Compilers for Parallel Computing, 1997
1996
Compiled Communication for All-Optical TDM Networks.
Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, 1996
A Timestamp-based Selective Invalidation Scheme for Multiprocessor Cache Coherence.
Proceedings of the 1996 International Conference on Parallel Processing, 1996
A Load Balancing Package for Domain Decomposition on Distributed Memory Systems.
Proceedings of the High-Performance Computing and Networking, 1996