Gordon Euhyun Moon

Proceedings of the Advances in Knowledge Discovery and Data Mining, 2024

Exploiting Tensor Cores in Sparse Matrix-Multivector Multiplication via Block-Sparsity-Aware Clustering.

Eunji Lee

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Accelerated Block-Sparsity-Aware Matrix Reordering for Leveraging Tensor Cores in Sparse Matrix-Multivector Multiplication.

Eunji Lee

Proceedings of the Euro-Par 2024: Parallel Processing, 2024

ML-Based Dynamic Operator-Level Query Mapping for Stream Processing Systems in Heterogeneous Computing Environments.

Sejeong Oh

Sungyong Park

Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023

SPION: Layer-Wise Sparse Training of Transformer via Convolutional Flood Filling.

Bokyeong Yoon

CoRR, 2023

Chronica: A Data-Imbalance-Aware Scheduler for Distributed Deep Learning.

Sanha Maeng

Sivasankaran Rajamanickam

Sungyong Park

Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

2022

Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication.

Tushar Krishna

IEEE Trans. Parallel Distributed Syst., 2022

Parallel Training of GRU Networks with a Multi-Grid Solver for Long Sequences.

Sivasankaran Rajamanickam

Eric C. Cyr

CoRR, 2022

2021

Extending Sparse Tensor Accelerators to Support Multiple Compression Formats.

Tushar Krishna

Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

2020

ALO-NMF: Accelerated Locality-Optimized Non-negative Matrix Factorization.

J. Austin Ellis

Srinivasan Parthasarathy

Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

2019

PL-NMF: Parallel Locality-Optimized Non-negative Matrix Factorization.

Srinivasan Parthasarathy

CoRR, 2019

Parallel Data-Local Training for Optimizing Word2Vec Embeddings for Word and Graph Embeddings.

Denis Newman-Griffis

Jinsung Kim

Eric Fosler-Lussier

Proceedings of the 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2019

2018

Parallel Latent Dirichlet Allocation on GPUs.

Israt Nisa

Bortik Bandyopadhyay

Srinivasan Parthasarathy

Proceedings of the Computational Science - ICCS 2018, 2018

2017

Parallel LDA with Over-Decomposition.

Proceedings of the 24th IEEE International Conference on High Performance Computing Workshops, 2017

2016

A Large-Scale Study in Predictability of Daily Activities and Places.
