2025
No Rush in Executing Atomic Instructions.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025
2024
SYNPA: SMT Performance Analysis and Allocation of Threads to Cores in ARM Processors.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
2023
Speculative inter-thread store-to-load forwarding in SMT architectures.
J. Parallel Distributed Comput., March, 2023
Cloud White: Detecting and Estimating QoS Degradation of Latency-Critical Workloads in the Public Cloud.
Future Gener. Comput. Syst., 2023
Rebasing Microarchitectural Research with Industry Traces.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023
CELLO: Compiler-Assisted Efficient Load-Load Ordering in Data-Race-Free Regions.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023
Thread-to-Core Allocation in ARM Processors Building Synergistic Pairs.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023
2022
DeepP: Deep Learning Multi-Program Prefetch Configuration for the IBM POWER 8.
IEEE Trans. Computers, 2022
VMT: Virtualized Multi-Threading for Accelerating Graph Workloads on Commodity Processors.
IEEE Trans. Computers, 2022
The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture.
ACM Trans. Archit. Code Optim., 2022
Effect of Hyper-Threading in Latency-Critical Multithreaded Cloud Applications and Utilization Analysis of the Major System Resources.
Future Gener. Comput. Syst., 2022
A Neural Network to Estimate Isolated Performance from Multi-Program Execution.
Proceedings of the 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, 2022
2021
ITSLF: Inter-Thread Store-to-Load Forwarding in Simultaneous Multithreading.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
2020
Bandwidth-Aware Dynamic Prefetch Configuration for IBM POWER8.
IEEE Trans. Parallel Distributed Syst., 2020
Thread Isolation to Improve Symbiotic Scheduling on SMT Multicore Processors.
IEEE Trans. Parallel Distributed Syst., 2020
Understanding Cloud Workloads Performance in a Production like Environment.
CoRR, 2020
The Forward Slice Core Microarchitecture.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020
2019
Precise Runahead Execution.
IEEE Comput. Archit. Lett., 2019
2018
Designing lab sessions focusing on real processors for computer architecture courses: A practical perspective.
J. Parallel Distributed Comput., 2018
A Workload Generator for Evaluating SMT Real-Time Systems.
Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018
2017
Improving IBM POWER8 Performance Through Symbiotic Job Scheduling.
IEEE Trans. Parallel Distributed Syst., 2017
Perf&Fair: A Progress-Aware Scheduler to Enhance Performance and Fairness in SMT Multicores.
IEEE Trans. Computers, 2017
2016
Bandwidth-Aware On-Line Scheduling in SMT Multicores.
IEEE Trans. Computers, 2016
Symbiotic job scheduling on the IBM POWER8.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016
2015
Addressing Fairness in SMT Multicores with a Progress-Aware Scheduler.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
2014
Cache-Hierarchy Contention-Aware Scheduling in CMPs.
IEEE Trans. Parallel Distributed Syst., 2014
Addressing bandwidth contention in SMT multicores through scheduling.
Proceedings of the 2014 International Conference on Supercomputing, 2014
2013
Using Huge Pages and Performance Counters to Determine the LLC Architecture.
Proceedings of the International Conference on Computational Science, 2013
L1-bandwidth aware thread allocation in multicore SMT processors.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013
2012
Understanding Cache Hierarchy Contention in CMPs to Improve Job Scheduling.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012