Xianwei Zhang

Orcid: 0000-0003-3507-4299

Affiliations:
  • Sun Yat-sen University, School of Computer Science and Engineering, Guangzhou, China
  • AMD Inc., Sunnyvale, CA, USA
  • University of Pittsburgh, Computer Science Department, Pittsburgh, PA, USA


According to our database, Xianwei Zhang authored at least 27 papers between 2013 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Timeline

[Chart: publications per year, 2014 to 2024. Legend: Book, In proceedings, Article, PhD thesis, Dataset, Other.]

Bibliography

2025
Mpache: Interaction Aware Multi-level Cache Bypassing on GPUs.
Proceedings of the 30th Asia and South Pacific Design Automation Conference, 2025

2024
APTMoE: Affinity-Aware Pipeline Tuning for MoE Models on Bandwidth-Constrained GPU Nodes.
Proceedings of the International Conference for High Performance Computing, 2024

mLOOP: Optimize Loop Unrolling in Compilation with a ML-based Approach.
Proceedings of the International Conference on Networking, Architecture and Storage, 2024

MixPert: Optimizing Mixed-Precision Floating-Point Emulation on GPU Integer Tensor Cores.
Proceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, 2024

openLG: A Tunable and Efficient Open-source LSTM on GPUs.
Proceedings of the International Joint Conference on Neural Networks, 2024

SMILE: LLC-based Shared Memory Expansion to Improve GPU Thread Level Parallelism.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

2023
Hybrid MPI and CUDA paralleled finite volume unstructured CFD simulations on a multi-GPU system.
Future Gener. Comput. Syst., 2023

Hay: Enhancing GPU Sharing Performance With Two-Level Scheduling for Ray.
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications.
Proceedings of the 41st IEEE International Conference on Computer Design, 2023

2022
RollBin: reducing code-size via loop rerolling at binary level.
Proceedings of the LCTES '22: 23rd ACM SIGPLAN/SIGBED International Conference on Languages, 2022

moTuner: a compiler-based auto-tuning approach for mixed-precision operators.
Proceedings of the CF '22: 19th ACM International Conference on Computing Frontiers, Turin, Italy, May 17, 2022

RAISE: Efficient GPU Resource Management via Hybrid Scheduling.
Proceedings of the 22nd IEEE International Symposium on Cluster, 2022

2020
DELTA: Validate GPU Memory Profiling with Microbenchmarks.
Proceedings of the MEMSYS 2020: The International Symposium on Memory Systems, 2020

2019
Optimizing GPU Cache Policies for MI Workloads.
CoRR, 2019

Autonomous Data-Race-Free GPU Testing.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019

Optimizing GPU Cache Policies for MI Workloads.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019

Boosting chipkill capability under retention-error induced reliability emergency.
Proceedings of the 24th Asia and South Pacific Design Automation Conference, 2019

2018
Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017
On the Restore Time Variations of Future DRAM Memory.
ACM Trans. Design Autom. Electr. Syst., 2017

DrMP: Mixed Precision-Aware DRAM for High Performance Approximate and Precise Computing.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
AWARD: Approximation-aWAre Restore in Further Scaling DRAM.
Proceedings of the Second International Symposium on Memory Systems, 2016

Restore truncation for performance improvement in future DRAM systems.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

2015
Exploit common source-line to construct energy efficient domain wall memory based caches.
Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

TriState-SET: Proactive SET for improved performance of MLC phase change memories.
Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

DLB: Dynamic lane borrowing for improving bandwidth and performance in Hybrid Memory Cube.
Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

Exploiting DRAM restore time variations in deep sub-micron scaling.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

2013
WoM-SET: Low power proactive-SET-based PCM write using WoM code.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

