Vela: A Virtualized LLM Training System with GPU Direct RoCE.
Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025
Low Latency and High Throughput Write-Ahead Logging Using CAPI-Flash.
IEEE Trans. Cloud Comput., 2021
CAPI-Flash Accelerated Persistent Read Cache for Apache Cassandra.
Proceedings of the 11th IEEE International Conference on Cloud Computing, 2018
Optimized Durable Commitlog for Apache Cassandra Using CAPI-Flash.
Proceedings of the 9th IEEE International Conference on Cloud Computing, 2016
Transactional memory support in the IBM POWER8 processor.
IBM J. Res. Dev., 2015
Quantitative comparison of hardware transactional memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
Eliminating global interpreter locks in Ruby through hardware transactional memory.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014
Thread-level speculation on off-the-shelf hardware transactional memory.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014
Do C and Java programs scale differently on Hardware Transactional Memory?
Proceedings of the IEEE International Symposium on Workload Characterization, 2013
Continuous object access profiling and optimizations to overcome the memory wall and bloat.
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012
Efficient runtime tracking of allocation sites in Java.
Proceedings of the 6th International Conference on Virtual Execution Environments, 2010
Real Java applications in software transactional memory.
Proceedings of the 2010 IEEE International Symposium on Workload Characterization, 2010
Coloring-based coalescing for graph coloring register allocation.
Proceedings of the IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2010), 2010
Sentinel PRE: Hoisting beyond Exception Dependency with Dynamic Deoptimization.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005
Partial Value Number Redundancy Elimination.
Proceedings of the Languages and Compilers for High Performance Computing, 2004