Changkyu Kim

ORCID: 0000-0002-0283-8371

According to our database, Changkyu Kim authored at least 45 papers between 2002 and 2022.

Bibliography

2022
Supporting Massive DLRM Inference through Software Defined Memory.
Proceedings of the 42nd IEEE International Conference on Distributed Computing Systems, 2022

Building a Performance Model for Deep Learning Recommendation Model Training on GPUs.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

2021
Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale.
IEEE Micro, 2021

Supporting Massive DLRM Inference Through Software Defined Memory.
CoRR, 2021

First-Generation Inference Accelerator Deployment at Facebook.
CoRR, 2021

2020
Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems.
CoRR, 2020

2018
WSMeter: A Performance Evaluation Methodology for Google's Production Warehouse-Scale Computers.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2016
Using SSDs to scale up Google Fusion Tables, a database-in-the-cloud.
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

2015
Can traditional programming bridge the ninja performance gap for parallel computing applications?
Commun. ACM, 2015

2014
Author retrospective for a NUCA substrate for flexible CMP cache sharing.
Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

Joint interference and user association optimization in cellular wireless networks.
Proceedings of the 48th Asilomar Conference on Signals, Systems and Computers, 2014

2013
Joint Interference and User Association Optimization in Cellular Wireless Networks.
CoRR, 2013

Locality-aware task management for unstructured parallelism: a quantitative limit study.
Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures, 2013

Opportunistic third-party backhaul for cellular wireless networks.
Proceedings of the 2013 Asilomar Conference on Signals, Systems and Computers, 2013

2012
DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing.
IEEE Micro, 2012

CloudRAMSort: fast and efficient large-scale distributed RAM sort on shared-nothing cluster.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

Large-scale energy-efficient graph traversal: a path to efficient data-intensive supercomputing.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Billion-particle SIMD-friendly two-point correlation on large-scale HPC cluster systems.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

GPP-Grep: High-Speed Regular Expression Processing Engine on General Purpose Processors.
Proceedings of the Research in Attacks, Intrusions, and Defenses, 2012

Fast and Efficient Graph Traversal Algorithm for CPUs: Maximizing Single-Node Efficiency.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2011
Designing fast architecture-sensitive tree search on modern multicore/many-core processors.
ACM Trans. Database Syst., 2011

PALM: Parallel Architecture-Friendly Latch-Free Modifications to B+ Trees on Many-Core Processors.
Proc. VLDB Endow., 2011

Fast Updates on Read-Optimized Databases Using Multi-Core CPUs.
Proc. VLDB Endow., 2011

Moguls: a model to explore the memory hierarchy for bandwidth improvements.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

2010
Performance and Energy Implications of Many-Core Caches for Throughput Computing.
IEEE Micro, 2010

Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

FAST: fast architecture sensitive tree search on modern CPUs and GPUs.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs.
Proceedings of the Conference on High Performance Computing Networking, 2010

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

2009
Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs.
Proc. VLDB Endow., 2009

ClearPath: highly parallel collision avoidance for multi-agent simulation.
Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2009

Interactive Modeling, Simulation and Control of Large-Scale Crowds and Traffic.
Proceedings of the Motion in Games, Second International Workshop, 2009

Efficient shared cache management through sharing-aware replacement and streaming-aware insertion policy.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

2008
Multitasking workload scheduling on flexible core chip multiprocessors.
SIGARCH Comput. Archit. News, 2008

Second Life and the New Generation of Virtual Worlds.
Computer, 2008

Atomic Vector Operations on Chip Multiprocessors.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

2007
A NUCA Substrate for Flexible CMP Cache Sharing.
IEEE Trans. Parallel Distributed Syst., 2007

On-Chip Interconnection Networks of the TRIPS Chip.
IEEE Micro, 2007

Composable Lightweight Processors.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

2006
Distributed Microarchitectural Protocols in the TRIPS Prototype Processor.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Implementation and Evaluation of On-Chip Network Architectures.
Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

2004
TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP.
ACM Trans. Archit. Code Optim., 2004

2003
Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture.
IEEE Micro, 2003

Nonuniform Cache Architectures for Wire-Delay Dominated On-Chip Caches.
IEEE Micro, 2003

2002
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches.
Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

