Shuaiwen Song
ORCID: 0000-0002-8402-1436
Affiliations:
- University of Sydney, Australia
According to our database, Shuaiwen Song authored at least 117 papers between 2009 and 2024.
Bibliography
2024
IEEE Trans. Parallel Distributed Syst., August, 2024
TEA+: A Novel Temporal Graph Random Walk Engine with Hybrid Storage Architecture.
ACM Trans. Archit. Code Optim., June, 2024
Efficient Radius Search for Adaptive Foveal Sizing Mechanism in Collaborative Foveated Rendering Framework.
IEEE Trans. Mob. Comput., May, 2024
MalFox: Camouflaged Adversarial Malware Example Generation Based on Conv-GANs Against Black-Box Detectors.
IEEE Trans. Computers, April, 2024
CoRR, 2024
CoRR, 2024
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design.
CoRR, 2024
Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024
System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
Proceedings of the 43rd ACM Symposium on Principles of Distributed Computing, 2024
MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference Tasks on Modern GPU-Centric Architectures.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024
System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
2023
Enabling High-Efficient ReRAM-Based CNN Training Via Exploiting Crossbar-Level Insignificant Writing Elimination.
IEEE Trans. Computers, November, 2023
Data Fusion in Infrastructure-Augmented Autonomous Driving System: Why? Where? and How?
IEEE Internet Things J., September, 2023
Flash-LLM: Enabling Low-Cost and Highly-Efficient Large Generative Model Inference With Unstructured Sparsity.
Proc. VLDB Endow., 2023
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies.
CoRR, 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
CoRR, 2023
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity.
CoRR, 2023
CoRR, 2023
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales.
CoRR, 2023
Chiplet Cloud: Building AI Supercomputers for Serving Large Generative Language Models.
CoRR, 2023
Mitigating Coupling Map Constrained Correlated Measurement Errors on Quantum Devices.
Proceedings of the International Conference for High Performance Computing, 2023
NAS-SE: Designing A Highly-Efficient In-Situ Neural Architecture Search Engine for Large-Scale Deployment.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs.
Proceedings of the 37th International Conference on Supercomputing, 2023
Post0-VR: Enabling Universal Realistic Rendering for Modern VR via Exploiting Architectural Similarity and Data Sharing.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023
Proceedings of the Eighteenth European Conference on Computer Systems, 2023
G-Sparse: Compiler-Driven Acceleration for Generalized Sparse Computation for Graph Neural Networks on Modern GPUs.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023
2022
IEEE Trans. Parallel Distributed Syst., 2022
DynamAP: Architectural Support for Dynamic Graph Traversal on the Automata Processor.
ACM Trans. Archit. Code Optim., 2022
Brief Industry Paper: The Necessity of Adaptive Data Fusion in Infrastructure-Augmented Autonomous Driving System.
Proceedings of the 28th IEEE Real-Time and Embedded Technology and Applications Symposium, 2022
Vapro: performance variance detection and diagnosis for production-run parallel applications.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022
Bring orders into uncertainty: enabling efficient uncertain graph processing via novel path sampling on multi-accelerator systems.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022
AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022
T-GCN: A Sampling Based Streaming Graph Neural Network System with Hybrid Architecture.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022
2021
Enabling Highly Efficient Capsule Networks Processing Through Software-Hardware Co-Design.
IEEE Trans. Computers, 2021
COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression.
Proc. VLDB Endow., 2021
J. Parallel Distributed Comput., 2021
Proceedings of the ESEC/FSE '21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021
Proceedings of the International Conference for High Performance Computing, 2021
Proceedings of the International Conference for High Performance Computing, 2021
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021
A novel memory-efficient deep learning training framework via error-bounded lossy compression.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021
Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
η-LSTM: Co-Designing Highly-Efficient Large LSTM Training via Exploiting Memory-Saving and Architectural Design Opportunities.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021
2020
IEEE Trans. Parallel Distributed Syst., 2020
Energy-Efficient GPU L2 Cache Design Using Instruction-Level Data Locality Similarity.
ACM Trans. Design Autom. Electr. Syst., 2020
An Efficient End-to-End Deep Learning Training Framework via Fine-Grained Pattern-Based Pruning.
CoRR, 2020
MalFox: Camouflaged Adversarial Malware Example Generation Based on C-GANs Against Black-Box Detectors.
CoRR, 2020
Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020
2019
IEEE Comput. Archit. Lett., 2019
BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets.
Proceedings of the International Conference for High Performance Computing, 2019
OO-VR: NUMA friendly object-oriented VR rendering framework for future NUMA-based multi-GPU systems.
Proceedings of the 46th International Symposium on Computer Architecture, 2019
PIM-VR: Erasing Motion Anomalies In Highly-Interactive Virtual Reality World with Customized Memory Cube.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019
Proceedings of the 30th IEEE International Conference on Application-specific Systems, 2019
2018
ACM Trans. Archit. Code Optim., 2018
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018
Proceedings of the 32nd International Conference on Supercomputing, 2018
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018
2017
Proceedings of the High Performance Computing - 32nd International Conference, 2017
Proceedings of the Workshop on Memory Centric Programming for HPC, 2017
Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels.
Proceedings of the International Conference for High Performance Computing, 2017
BVF: enabling significant on-chip power savings via bit-value-favor for throughput processors.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Proceedings of the International Conference on Supercomputing, 2017
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017
2016
Scalable Energy Efficiency with Resilience for High Performance Computing Systems: A Quantitative Methodology.
ACM Trans. Archit. Code Optim., 2016
Proceedings of the 17th International Middleware Conference, Trento, Italy, December 12, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
Proceedings of the 2016 International Conference on Supercomputing, 2016
Proceedings of the 2016 International Conference on Supercomputing, 2016
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016
2015
J. Parallel Distributed Comput., 2015
Proceedings of the International Conference for High Performance Computing, 2015
Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015
2014
Extending PowerPack for Profiling and Analysis of High-Performance Accelerator-Based Systems.
Parallel Process. Lett., 2014
Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil.
Int. J. High Perform. Comput. Appl., 2014
MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Proceedings of the 2014 International Conference on Supercomputing, 2014
ACDT: Architected Composite Data Types trading-in unfettered data access for improved execution.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014
2013
J. Supercomput., 2013
Proceedings of the 1st International Workshop on Energy Efficient Supercomputing, 2013
A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
EDR: An energy-aware runtime load distribution system for data-intensive applications in the cloud.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013
2012
Abstract: Three Steps to Model Power-Performance Efficiency for Emergent GPU-Based Parallel Systems.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Poster: Three Steps to Model Power-Performance Efficiency for Emergent GPU-Based Parallel Systems.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Proceedings of the 20th IEEE International Symposium on Modeling, 2012
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012
2011
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011
2010
PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications.
IEEE Trans. Parallel Distributed Syst., 2010
Proceedings of the 2010 International Conference on High Performance Computing, 2010
Designing Energy Efficient Communication Runtime Systems for Data Centric Programming Models.
Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications, 2010
2009
Int. J. High Perform. Comput. Appl., 2009