Shuai Che
According to our database, Shuai Che authored at least 33 papers between 2008 and 2023.
Timeline (chart of publications per year by type: Book, In proceedings, Article, PhD thesis, Dataset, Other)
Bibliography
2023
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales.
CoRR, 2023
2022
IEEE Trans. Emerg. Top. Comput., 2022
2021
Software-Defined Design Space Exploration for an Efficient DNN Accelerator Architecture.
IEEE Trans. Computers, 2021
2020
Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
2019
Software-Defined Design Space Exploration for an Efficient AI Accelerator Architecture.
CoRR, 2019
Hardware-Guided Symbiotic Training for Compact, Accurate, yet Execution-Efficient LSTM.
CoRR, 2019
Northup: Divide-and-Conquer Programming in Systems with Heterogeneous Memories and Processors.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019
2017
Int. J. Parallel Program., 2017
Proceedings of the International Conference for High Performance Computing, 2017
Auto-Tuning Strategies for Parallelizing Sparse Matrix-Vector (SpMV) Multiplication on Multi- and Many-Core Processors.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Work Stealing in a Shared Virtual-Memory Heterogeneous Environment: A Case Study with Betweenness Centrality.
Proceedings of the Computing Frontiers Conference, 2017
Proceedings of the 24th IEEE Symposium on Computer Arithmetic, 2017
2016
Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, 2016
Challenges of Programming a System with Heterogeneous Memories and Heterogeneous Processors: A Programmer's View.
Proceedings of the Second International Symposium on Memory Systems, 2016
Proceedings of the Second International Symposium on Memory Systems, 2016
Proceedings of the ACM Workshop on High Performance Graph Processing, 2016
2015
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015
2014
Int. J. High Perform. Comput. Appl., 2014
SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014
Dymaxion++: A Directive-Based API to Optimize Data Layout and Memory Mapping for Heterogeneous Systems.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Proceedings of the IEEE High Performance Extreme Computing Conference, 2014
Proceedings of the IEEE High Performance Extreme Computing Conference, 2014
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014
2013
Proceedings of the IEEE International Symposium on Workload Characterization, 2013
Load balancing in a changing world: dealing with heterogeneity and performance variability.
Proceedings of the Computing Frontiers Conference, 2013
2011
Proceedings of the Conference on High Performance Computing Networking, 2011
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011
2010
A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads.
Proceedings of the 2010 IEEE International Symposium on Workload Characterization, 2010
2009
Proceedings of the 2009 IEEE International Symposium on Workload Characterization, 2009
2008
A performance study of general-purpose applications on graphics processors using CUDA.
J. Parallel Distributed Comput., 2008
Proceedings of the IEEE Symposium on Application Specific Processors, 2008