Gordon Euhyun Moon

Orcid: 0000-0003-4992-6181

According to our database, Gordon Euhyun Moon authored at least 16 papers between 2016 and 2024.

Bibliography

2024
Exploring Attention Sparsity to Accelerate Transformer Training on GPUs.
IEEE Access, 2024

Layer-Wise Sparse Training of Transformer via Convolutional Flood Filling.
Proceedings of the Advances in Knowledge Discovery and Data Mining, 2024

Exploiting Tensor Cores in Sparse Matrix-Multivector Multiplication via Block-Sparsity-Aware Clustering.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Accelerated Block-Sparsity-Aware Matrix Reordering for Leveraging Tensor Cores in Sparse Matrix-Multivector Multiplication.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024

ML-Based Dynamic Operator-Level Query Mapping for Stream Processing Systems in Heterogeneous Computing Environments.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023
SPION: Layer-Wise Sparse Training of Transformer via Convolutional Flood Filling.
CoRR, 2023

Chronica: A Data-Imbalance-Aware Scheduler for Distributed Deep Learning.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

2022
Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication.
IEEE Trans. Parallel Distributed Syst., 2022

Parallel Training of GRU Networks with a Multi-Grid Solver for Long Sequences.
CoRR, 2022

2021
Extending Sparse Tensor Accelerators to Support Multiple Compression Formats.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

2020
ALO-NMF: Accelerated Locality-Optimized Non-negative Matrix Factorization.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

2019
PL-NMF: Parallel Locality-Optimized Non-negative Matrix Factorization.
CoRR, 2019

Parallel Data-Local Training for Optimizing Word2Vec Embeddings for Word and Graph Embeddings.
Proceedings of the 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2019

2018
Parallel Latent Dirichlet Allocation on GPUs.
Proceedings of the Computational Science - ICCS 2018, 2018

2017
Parallel LDA with Over-Decomposition.
Proceedings of the 24th IEEE International Conference on High Performance Computing Workshops, 2017

2016
A Large-Scale Study in Predictability of Daily Activities and Places.
Proceedings of the 8th EAI International Conference on Mobile Computing, 2016

