2025
Beyond VABlock: Improving Transformer workloads through aggressive prefetching.
J. Syst. Archit., 2025
Marching Page Walks: Batching and Concurrent Page Table Walks for Enhancing GPU Throughput.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2025
2024
Adaptive Kernel Merge and Fusion for Multi-Tenant Inference in Embedded GPUs.
IEEE Embed. Syst. Lett., December, 2024
Conflict-aware compiler for hierarchical register file on GPUs.
J. Syst. Archit., 2024
Effective Interplay between Sparsity and Quantization: From Theory to Practice.
CoRR, 2024
SAVector: Vectored Systolic Arrays.
IEEE Access, 2024
VitBit: Enhancing Embedded GPU Performance for AI Workloads through Register Operand Packing.
Proceedings of the 53rd International Conference on Parallel Processing, 2024
2023
Scale-out Systolic Arrays.
ACM Trans. Archit. Code Optim., June, 2023
FLIXR: Embedding Index Into Flash Translation Layer in SSDs.
IEEE Trans. Computers, 2023
MAD MAcce: Supporting Multiply-Add Operations for Democratizing Matrix-Multiplication Accelerators.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
R2D2: Removing ReDunDancy Utilizing Linearity of Address Generation in GPUs.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
Imprecise Store Exceptions.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
Warped-MC: An Efficient Memory Controller Scheme for Massively Parallel Processors.
Proceedings of the 52nd International Conference on Parallel Processing, 2023
SnakeByte: A TLB Design with Adaptive and Recursive Page Merging in GPUs.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023
AstriFlash: A Flash-Based System for Online Services.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023
2022
CASH-RF: A Compiler-Assisted Hierarchical Register File in GPUs.
IEEE Embed. Syst. Lett., 2022
Accuracy Boosters: Epoch-Driven Mixed-Mantissa Block Floating-Point for DNN Training.
CoRR, 2022
GhostLeg: Selective Memory Coalescing for Secure GPU Architecture.
IEEE Access, 2022
Analyzing GCN Aggregation on GPU.
IEEE Access, 2022
TEA-RC: Thread Context-Aware Register Cache for GPUs.
IEEE Access, 2022
2021
Rebooting Virtual Memory with Midgard.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
2020
Duplo: Lifting Redundant Memory Accesses of Deep Neural Networks for GPU Tensor Cores.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
2019
Adaptive Cooperation of Prefetching and Warp Scheduling on GPUs.
IEEE Trans. Computers, 2019
Linebacker: preserving victim cache lines in idle register files of GPUs.
Proceedings of the 46th International Symposium on Computer Architecture, 2019
2018
WASP: Selective Data Prefetching with Monitoring Runtime Warp Progress on GPUs.
IEEE Trans. Computers, 2018
FineReg: Fine-Grained Register File Management for Augmenting GPU Throughput.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
2017
Dynamic Resizing on Active Warps Scheduler to Hide Operation Stalls on GPUs.
IEEE Trans. Parallel Distributed Syst., 2017
Access Pattern-Aware Cache Management for Improving Data Utilization in GPU.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017
2016
APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016
2015
DRAW: investigating benefits of adaptive fetch group size on GPU.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015
2013
GPU-Friendly Parallel Genome Matching with Tiled Access and Reduced State Transition Table.
Int. J. Parallel Program., 2013
2010
Hardware implementation of a tessellation accelerator for the OpenVG standard.
IEICE Electron. Express, 2010