2025
Semaphores Augmented with a Waiting Array.
CoRR, January, 2025
Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025
2024
Exploring Time-Space trade-offs for synchronized in Lilliput.
CoRR, 2024
2022
FedPerm: Private and Robust Federated Learning by Parameter Permutation.
CoRR, 2022
2021
Intra-process Caching and Reuse of Threads.
CoRR, 2021
Ready When You Are: Efficient Condition Variables via Delegated Condition Evaluation.
CoRR, 2021
Optimizing Inference Performance of Transformers on CPUs.
CoRR, 2021
Hemlock: Compact and Scalable Mutual Exclusion.
Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021
2020
Proceedings of the Networked Systems - 8th International Conference, 2020
Scalable range locks for scalable address spaces and beyond.
Proceedings of the EuroSys '20: Fifteenth EuroSys Conference 2020, 2020
2019
BRAVO - Biased Locking for Reader-Writer Locks.
Proceedings of the 2019 USENIX Annual Technical Conference, 2019
Compact NUMA-aware Locks.
Proceedings of the Fourteenth EuroSys Conference 2019, Dresden, Germany, March 25-28, 2019, 2019
Avoiding Scalability Collapse by Restricting Concurrency.
Proceedings of the Euro-Par 2019: Parallel Processing, 2019
TWA - Ticket Locks Augmented with a Waiting Array.
Proceedings of the Euro-Par 2019: Parallel Processing, 2019
2018
Improving Parallelism in Hardware Transactional Memory.
ACM Trans. Archit. Code Optim., 2018
Persistent Memory Transactions.
CoRR, 2018
Fast mutual exclusion by the Triangle algorithm.
Concurr. Comput. Pract. Exp., 2018
High-contention mutual exclusion by elevator algorithms.
Concurr. Comput. Pract. Exp., 2018
2017
Towards an Efficient Pauseless Java GC with Selective HTM-Based Access Barriers.
Proceedings of the 14th International Conference on Managed Languages and Runtimes, 2017
Proceedings of the Twelfth European Conference on Computer Systems, 2017
2016
Dekker's mutual exclusion algorithm made RW-safe.
Concurr. Comput. Pract. Exp., 2016
Refined transactional lock elision.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016
Transactional Pointers: Experiences with HTM-Based Reference Counting in C++.
Proceedings of the Networked Systems - 4th International Conference, 2016
Fast non-intrusive memory reclamation for highly-concurrent data structures.
Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management, Santa Barbara, CA, USA, June 14, 2016
2015
Lock Cohorting: A General Technique for Designing NUMA Locks.
ACM Trans. Parallel Comput., 2015
The Influence of Malloc Placement on TSX Hardware Transactional Memory.
CoRR, 2015
High-performance N-thread software solutions for mutual exclusion.
Concurr. Comput. Pract. Exp., 2015
Evaluating HTM for Pauseless Garbage Collectors in Java.
Proceedings of the 2015 IEEE TrustCom/BigDataSE/ISPA, 2015
The TURBO Diaries: Application-controlled Frequency Scaling Explained.
Proceedings of the Software Engineering & Management 2015, Multikonferenz der GI-Fachbereiche Softwaretechnik (SWT) und Wirtschaftsinformatik (WI), FA WI-MAW, 17. März, 2015
2014
Hardware extensions to make lazy subscription safe.
CoRR, 2014
Software-based contention management for efficient compare-and-swap operations.
Concurr. Comput. Pract. Exp., 2014
Brief announcement: persistent unfairness arising from cache residency imbalance.
Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures, 2014
Adaptive integration of hardware and software lock elision techniques.
Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures, 2014
2013
Scalable statistics counters.
Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures, 2013
Using hardware transactional memory to correct and simplify a readers-writer lock algorithm.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013
NUMA-aware reader-writer locks.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013
Message Passing or Shared Memory: Evaluating the Delegation Abstraction for Multicores.
Proceedings of the Principles of Distributed Systems - 17th International Conference, 2013
Lightweight Contention Management for Efficient Compare-and-Swap Operations.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013
2011
Brief announcement: multilane - a concurrent blocking multiset.
Proceedings of the SPAA 2011: Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2011
Flat-combining NUMA locks.
Proceedings of the SPAA 2011: Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2011
Brief announcement: a partitioned ticket lock.
Proceedings of the SPAA 2011: Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2011
Cache index-aware memory allocation.
Proceedings of the 10th International Symposium on Memory Management, 2011
2010
TLRW: return of the read-write lock.
Proceedings of the SPAA 2010: Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2010
Simplifying concurrent algorithms by exploiting hardware transactional memory.
Proceedings of the SPAA 2010: Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2010
Efficient Lock Free Privatization.
Proceedings of the Principles of Distributed Systems - 14th International Conference, 2010
Transactional Mutex Locks.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010
2009
Early experience with a commercial hardware transactional memory implementation.
Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009
2007
Potential show-stoppers for transactional synchronization.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007
Understanding Tradeoffs in Software Transactional Memory.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007
2006
Transactional Locking II.
Proceedings of the Distributed Computing, 20th International Symposium, 2006
2005
Supporting per-processor local-allocation buffers using lightweight user-level preemption notification.
Proceedings of the 1st International Conference on Virtual Execution Environments, 2005
2002
Proceedings of The Workshop on Memory Systems Performance (MSP 2002), 2002
2001
Implementing Fast Java Monitors with Relaxed-Locks.
Proceedings of the 1st Java Virtual Machine Research and Technology Symposium, 2001