Weikuan Yu

ORCID: 0000-0002-8754-0311

According to our database, Weikuan Yu authored at least 111 papers between 2003 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
DFTracer: An Analysis-Friendly Data Flow Tracer for AI-Driven Workflows.
Proceedings of the International Conference for High Performance Computing, 2024

ESSA 2024 Message and Committees.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Understanding Highly Configurable Storage for Diverse Workloads.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023
I/O characterization and performance evaluation of large-scale storage architectures for heterogeneous workloads.
Proceedings of the IEEE International Conference on Cluster Computing, 2023

2022
PhaST: Hierarchical Concurrent Log-Free Skip List for Persistent Memory.
IEEE Trans. Parallel Distributed Syst., 2022

Accurate classification of depression through optimized machine learning models on high-dimensional noisy data.
Biomed. Signal Process. Control., 2022

DFMan: A Graph-based Optimization of Dataflow Scheduling on High-Performance Computing Systems.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

SVAGC: Garbage Collection with a Scalable Virtual Address Swapping Technique.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
ROBOTune: High-Dimensional Configuration Tuning for Cluster-Based Data Analytics.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

O(1) Communication for Distributed SGD through Two-Level Gradient Averaging.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

Compression of Time Evolutionary Image Data through Predictive Deep Neural Networks.
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021

2020
Ad Hoc File Systems for High-Performance Computing.
J. Comput. Sci. Technol., 2020

O(1) Communication for Distributed SGD through Two-Level Gradient Averaging.
CoRR, 2020

Emulating I/O Behavior in Scientific Workflows on High Performance Computing Systems.
Proceedings of the Fifth IEEE/ACM International Parallel Data Systems Workshop, 2020

Optimizing Asynchronous Multi-Level Checkpoint/Restart Configurations with Machine Learning.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

2019
Multivariate modeling and two-level scheduling of analytic queries.
Parallel Comput., 2019

Enhancing MapReduce Fault Recovery Through Binocular Speculation.
CoRR, 2019

Exploration of memory hybridization for RDD caching in Spark.
Proceedings of the 2019 ACM SIGPLAN International Symposium on Memory Management, 2019

I/O Characterization and Performance Evaluation of BeeGFS for Deep Learning.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Efficient User-Level Storage Disaggregation for Deep Learning.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

2018
HALO: a fast and durable disk write cache using phase change memory.
Clust. Comput., 2018

An Initial Implementation of Libfabric Conduit for OpenSHMEM-X.
Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

Entropy-Aware I/O Pipelining for Large-Scale Deep Learning on HPC Systems.
Proceedings of the 26th IEEE International Symposium on Modeling, 2018

Semantics-Aware Prediction for Analytic Queries in MapReduce Environment.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Direct-FUSE: Removing the Middleman for High-Performance FUSE File System Support.
Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers, 2018

SHMEMGraph: Efficient and Balanced Graph Processing Using One-Sided Communication.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018

2017
A case study of tuning MapReduce for efficient Bioinformatics in the cloud.
Parallel Comput., 2017

FARMS: Efficient mapreduce speculation for failure recovery in short jobs.
Parallel Comput., 2017

Challenges and Opportunities of User-Level File Systems for HPC (Dagstuhl Seminar 17202).
Dagstuhl Reports, 2017

Portable SHMEMCache: A High-Performance Key-Value Store on OpenSHMEM and MPI.
Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

MetaKV: A Key-Value Store for Metadata Management of Distributed Burst Buffers.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

High-Performance Key-Value Store On OpenSHMEM.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016
Exploiting Analytics Shipping with Virtualized MapReduce on HPC Backend Storage Servers.
IEEE Trans. Parallel Distributed Syst., 2016

Enhance parallel input/output with cross-bundle aggregation.
Int. J. High Perform. Comput. Appl., 2016

An ephemeral burst-buffer file system for scientific applications.
Proceedings of the International Conference for High Performance Computing, 2016

DISP: Optimizations towards Scalable MPI Startup.
Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016

SHMemCache: Enabling Memcached on the OpenSHMEM Global Address Model.
Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

OAWS: Memory Occlusion Aware Warp Scheduling.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Virtual Shuffling for Efficient Data Movement in MapReduce.
IEEE Trans. Computers, 2015

Development of a Burst Buffer System for Data-Intensive Applications.
CoRR, 2015

Performance evaluation and tuning of BioPig for genomic analysis.
Proceedings of the 2015 International Workshop on Data-Intensive Scalable Computing Systems, 2015

A case study of MapReduce speculation for failure recovery.
Proceedings of the 2015 International Workshop on Data-Intensive Scalable Computing Systems, 2015

SFMapReduce: An optimized MapReduce framework for Small Files.
Proceedings of the 10th IEEE International Conference on Networking, 2015

Preserving Row Buffer Locality for PCM Wear-Leveling under Massive Parallelism.
Proceedings of the 23rd IEEE International Symposium on Modeling, 2015

Cracking Down MapReduce Failure Amplification through Analytics Logging and Migration.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

DaCache: Memory Divergence-Aware GPU Cache Management.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Eliminating intra-warp conflict misses in GPU.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

TRIO: Burst Buffer Based I/O Orchestration.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014
Design and Evaluation of Network-Levitated Merge for Hadoop Acceleration.
IEEE Trans. Parallel Distributed Syst., 2014

Hello ADIOS: the challenges and lessons of developing leadership class I/O frameworks.
Concurr. Comput. Pract. Exp., 2014

neCODEC: nearline data compression for scientific applications.
Clust. Comput., 2014

Non-work-conserving effects in MapReduce: diffusion limit and criticality.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2014

BPAR: a bundle-based parallel aggregation framework for decoupled I/O execution.
Proceedings of the 2014 International Workshop on Data Intensive Scalable Computing Systems, 2014

Characterization and Optimization of Memory-Resident MapReduce on HPC Systems.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

BurstMem: A high-performance burst buffer system for scientific applications.
Proceedings of the 2014 IEEE International Conference on Big Data (IEEE BigData 2014), 2014

2013
CooMR: cross-task coordination for efficient data management in MapReduce programs.
Proceedings of the International Conference for High Performance Computing, 2013

DynaM: Dynamic Multiresolution Data Representation for Large-Scale Scientific Analysis.
Proceedings of the IEEE Eighth International Conference on Networking, 2013

A lightweight I/O scheme to facilitate spatial and temporal queries of scientific data analytics.
Proceedings of the IEEE 29th Symposium on Mass Storage Systems and Technologies, 2013

A Versatile Performance and Energy Simulation Tool for Composite GPU Global Memory.
Proceedings of the 2013 IEEE 21st International Symposium on Modelling, 2013

Performance and Power Simulation for Versatile GPGPU Global Memory.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

JVM-Bypass for Efficient Hadoop Shuffling.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Profiling and Improving I/O Performance of a Large-Scale Climate Scientific Application.
Proceedings of the 22nd International Conference on Computer Communication and Networks, 2013

Preemptive ReduceTask Scheduling for Fair and Fast Job Completion.
Proceedings of the 10th International Conference on Autonomic Computing, 2013

A case of system-wide power management for scientific applications.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

SLOAVx: Scalable LOgarithmic AlltoallV Algorithm for Hierarchical Multicore Systems.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

Exploring hybrid memory for GPU energy efficiency through software-hardware co-design.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
HiCOO: Hierarchical cooperation for scalable communication in Global Address Space programming models on Cray XT systems.
J. Parallel Distributed Comput., 2012

Runtime Techniques to Enable a Highly-Scalable Global Address Space Model for Petascale Computing.
Int. J. Parallel Program., 2012

RXIO: Design and implementation of high performance RDMA-capable GridFTP.
Comput. Electr. Eng., 2012

Assessing the Performance Impact of High-Speed Interconnects on MapReduce.
Proceedings of the Specifying Big Data Benchmarks, 2012

Classifying soft error vulnerabilities in extreme-scale scientific applications using a binary instrumentation tool.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

SMART-IO: SysteM-AwaRe Two-Level Data Organization for Efficient Scientific Analytics.
Proceedings of the 20th IEEE International Symposium on Modeling, 2012

PCM-Based Durable Write Cache for Fast Disk I/O.
Proceedings of the 20th IEEE International Symposium on Modeling, 2012

Identifying Opportunities for Byte-Addressable Non-Volatile Memory in Extreme-Scale Scientific Applications.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

A system-aware optimized data organization for efficient scientific analytics.
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, 2012

2011
Hadoop acceleration through network levitated merge.
Proceedings of the Conference on High Performance Computing Networking, 2011

Virtual Topologies for Scalable Resource Management and Contention Attenuation in a Global Address Space Model on the Cray XT5.
Proceedings of the International Conference on Parallel Processing, 2011

MIND: A black-box energy consumption model for disk arrays.
Proceedings of the 2011 International Green Computing Conference and Workshops, 2011

BMF: Bitmapped Mass Fingerprinting for Fast Protein Identification.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

EDO: Improving Read Performance for Scientific Applications through Elastic Data Organization.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Network-Friendly One-Sided Communication through Multinode Cooperation on Petascale Cray XT5 Systems.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

2010
Cooperative server clustering for a scalable GAS model on petascale cray XT5 systems.
Comput. Sci. Res. Dev., 2010

Initial characterization of parallel NFS implementations.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Efficient Zero-Copy Noncontiguous I/O for Globus on InfiniBand.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Enabling a highly-scalable global address space model for petascale computing.
Proceedings of the 7th Conference on Computing Frontiers, 2010

2009
Design, implementation, and evaluation of transparent pNFS on Lustre.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

2008
Wide-area performance profiling of 10GigE and InfiniBand technologies.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Early evaluation of IBM BlueGene/P.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Experimental Analysis of InfiniBand Transport Services on WAN.
Proceedings of the 2008 IEEE International Conference on Networking, 2008

Performance characterization and optimization of parallel I/O on the Cray XT.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

ParColl: Partitioned Collective I/O on the Cray XT.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

HPC Interconnection Networks: The Key to Exascale Computing.
Proceedings of the High Speed and Large Scale Scientific Computing - Selected Papers from the High Performance Computing Workshop, Cetraro, Italy, June 30, 2008

Empirical Analysis of a Large-Scale Hierarchical Storage System.
Proceedings of the Euro-Par 2008, 2008

Xen-Based HPC: A Parallel I/O Perspective.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

2007
FlexFetch: A History-Aware Scheme for I/O Energy Saving in Mobile Computing.
Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

Exploiting Lustre File Joining for Effective Collective IO.
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

2006
Benefits of high speed interconnects to cluster file systems: a case study with Lustre.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Adaptive connection management for scalable MPI over InfiniBand.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

High Performance Block I/O for Global File System (GFS) with InfiniBand RDMA.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

2005
High Performance Broadcast Support in La-Mpi Over Quadrics.
Int. J. High Perform. Comput. Appl., 2005

Design and Implementation of Open MPI over Quadrics/Elan4.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

High performance support of parallel virtual file system (PVFS2) over Quadrics.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Head-to-TOE Evaluation of High-Performance Sockets over Protocol Offload Engines.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

2004
Microbenchmark Performance Comparison of High-Speed Cluster Interconnects.
IEEE Micro, 2004

Efficient and Scalable Barrier over Quadrics and Myrinet with a New NIC-Based Collective Message Passing Protocol.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Fast and Scalable Startup of MPI Programs in InfiniBand Clusters.
Proceedings of the High Performance Computing, 2004

Scalable, high-performance NIC-based all-to-all broadcast over Myrinet/GM.
Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004

2003
Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

High Performance and Reliable NIC-Based Multicast over Myrinet/GM-2.
Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

Micro-benchmark level performance comparison of high-speed cluster interconnects.
Proceedings of the 11th Annual IEEE Symposium on High Performance Interconnects, 2003
