2025
TLP Balancer: Predictive Thread Allocation for Multitenant Inference in Embedded GPUs.
IEEE Embed. Syst. Lett., June, 2025
Beyond VABlock: Improving Transformer workloads through aggressive prefetching.
J. Syst. Archit., 2025
SSFFT: Energy-Efficient Selective Scaling for Fast Fourier Transform in Embedded GPUs.
Proceedings of the 26th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, 2025
HyMM: A Hybrid Sparse-Dense Matrix Multiplication Accelerator for GCNs.
Proceedings of the Design, Automation & Test in Europe Conference, 2025
2024
Adaptive Kernel Merge and Fusion for Multi-Tenant Inference in Embedded GPUs.
IEEE Embed. Syst. Lett., December, 2024
Conflict-aware compiler for hierarchical register file on GPUs.
J. Syst. Archit., 2024
SAVector: Vectored Systolic Arrays.
IEEE Access, 2024
Coldmap: Extending SSD Lifetime Exploiting Multi-Page Mapping Information.
Proceedings of the 13th Non-Volatile Memory Systems and Applications Symposium, 2024
VitBit: Enhancing Embedded GPU Performance for AI Workloads through Register Operand Packing.
Proceedings of the 53rd International Conference on Parallel Processing, 2024
2023
FLIXR: Embedding Index Into Flash Translation Layer in SSDs.
IEEE Trans. Computers, 2023
Vizard: Passing Over Profiling-Based Detection by Manipulating Performance Counters.
IEEE Access, 2023
Warped-MC: An Efficient Memory Controller Scheme for Massively Parallel Processors.
Proceedings of the 52nd International Conference on Parallel Processing, 2023
2022
GhostLeg: Selective Memory Coalescing for Secure GPU Architecture.
IEEE Access, 2022
Analyzing GCN Aggregation on GPU.
IEEE Access, 2022
Restore Buffer Overflow Attacks: Breaking Undo-Based Defense Schemes.
Proceedings of the International Conference on Information Networking, 2022
CacheRewinder: Revoking Speculative Cache Updates Exploiting Write-Back Buffer.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022
Stealth ECC: A Data-Width Aware Adaptive ECC Scheme for DRAM Error Resilience.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022
2020
Hi-End: Hierarchical, Endurance-Aware STT-MRAM-Based Register File for Energy-Efficient GPUs.
IEEE Access, 2020
2019
Linebacker: preserving victim cache lines in idle register files of GPUs.
Proceedings of the 46th International Symposium on Computer Architecture, 2019
GraphSSD: graph semantics aware SSD.
Proceedings of the 46th International Symposium on Computer Architecture, 2019
2018
CTA-Aware Prefetching and Scheduling for GPU.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018
2017
Improving Energy Efficiency of GPUs through Data Compression and Compressed Execution.
IEEE Trans. Computers, 2017
Summarizer: trading communication with computing near storage.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
Access Pattern-Aware Cache Management for Improving Data Utilization in GPU.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017
2016
Warped-preexecution: A GPU pre-execution approach for improving latency hiding.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016
2015
Warped-compression: enabling power efficient GPUs through register compression.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
Revealing Critical Loads and Hidden Data Locality in GPGPU Applications.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015
2006
A robust PRML read channel with digital timing recovery for multi-format optical disc.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006