Karthikeyan Vaidyanathan

Orcid: 0000-0001-8125-1183

According to our database, Karthikeyan Vaidyanathan authored at least 51 papers between 2004 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.







Random-Access Neural Compression of Material Textures.
ACM Trans. Graph., August, 2023

Temporally Stable Real-Time Joint Neural Denoising and Supersampling.
Proc. ACM Comput. Graph. Interact. Tech., 2022

Ray Tracing Lossy Compressed Grid Primitives.
Proceedings of the 42nd Annual Conference of the European Association for Computer Graphics, 2021

A reduced-precision network for image reconstruction.
ACM Trans. Graph., 2020

Flexible Ray Traversal with an Extended Programming Model.
Proceedings of the SIGGRAPH Asia 2019 Technical Briefs, 2019

Wide BVH Traversal with a Short Stack.
Proceedings of the High-Performance Graphics 2019, 2019

The History of Software Architecture - In the Eye of the Practitioner.
CoRR, 2018

On Scale-out Deep Learning Training for Cloud and HPC.
CoRR, 2018

Coarse pixel shading with temporal supersampling.
Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, 2018

Mixed Precision Training of Convolutional Neural Networks using Integer Operations.
Proceedings of the 6th International Conference on Learning Representations, 2018

Optimizations in a high-performance conjugate gradient benchmark for IA-based multi- and many-core processors.
Int. J. High Perform. Comput. Appl., 2016

Distributed Deep Learning Using Synchronous Stochastic Gradient Descent.
CoRR, 2016

Optimizing Wilson-Dirac Operator and Linear Solvers for Intel® KNL.
Proceedings of the High Performance Computing, 2016

Watertight ray traversal with reduced precision.
Proceedings of High Performance Graphics, 2016

Bandwidth-efficient BVH layout for incremental hardware traversal.
Proceedings of High Performance Graphics, 2016

Layered Light Field Reconstruction for Defocus Blur.
ACM Trans. Graph., 2015

Improving concurrency and asynchrony in multithreaded MPI applications using software offloading.
Proceedings of the International Conference for High Performance Computing, 2015

Layered Reconstruction for Defocus and Motion Blur.
Comput. Graph. Forum, 2014

Multi-layer alpha blending.
Proceedings of the Symposium on Interactive 3D Graphics and Games, 2014

Efficient Shared-Memory Implementation of High-Performance Conjugate Gradient Benchmark and its Application to Unstructured Matrices.
Proceedings of the International Conference for High Performance Computing, 2014

Lattice QCD with Domain Decomposition on Intel® Xeon Phi Co-Processors.
Proceedings of the International Conference for High Performance Computing, 2014

Petascale High Order Dynamic Rupture Earthquake Simulations on Heterogeneous Supercomputers.
Proceedings of the International Conference for High Performance Computing, 2014

Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Coarse Pixel Shading.
Proceedings of the High-Performance Graphics 2014, Lyon, France, 2014

Lattice QCD on Intel® Xeon Phi™ Coprocessors.
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors.
Proceedings of the International Conference for High Performance Computing, 2013

Design and Implementation of the Linpack Benchmark for Single and Multi-node Systems Based on Intel® Xeon Phi Coprocessor.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Improving the Performance of Dynamical Simulations Via Multiple Right-Hand Sides.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Exascale Computing & Beyond: Meeting the Challenges.
Proceedings of the Transition of HPC Towards Exascale Computing, 2012

Adaptive Image Space Shading for Motion and Defocus Blur.
Proceedings of the EUROGRAPHICS Conference on High Performance Graphics 2012, 2012

High-performance lattice QCD for multi-core based parallel systems using a cache-friendly hybrid threaded-MPI approach.
Proceedings of the Conference on High Performance Computing Networking, 2011

Optimized Distributed Data Sharing Substrate in Multi-core Commodity Clusters: A Comprehensive Study with Applications.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

Advanced RDMA-Based Admission Control for Modern Data-Centers.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

Benefits of I/O Acceleration Technology (I/OAT) in Clusters.
Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

Designing Efficient Systems Services and Primitives for Next-Generation Data-Centers.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Designing Efficient Asynchronous Memory Operations Using Hardware Copy Engine: A Case Study with I/OAT.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Efficient asynchronous memory copy operations on multi-core systems and I/OAT.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

High Performance Distributed Lock Management Services using Network-based Remote Atomic Operations.
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

Designing next generation data-centers with advanced communication protocols and systems services.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

NemC: A Network Emulator for Cluster-of-Clusters.
Proceedings of the 15th International Conference On Computer Communications and Networks, 2006

DDSS: A Low-Overhead Distributed Data Sharing Substrate for Cluster-Based Data-Centers over Modern Interconnects.
Proceedings of the High Performance Computing, 2006

Exploiting RDMA operations for Providing Efficient Fine-Grained Resource Monitoring in Cluster-based Servers.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

Designing Efficient Cooperative Caching Schemes for Multi-Tier Data-Centers over RDMA-enabled Networks.
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

Communication and Memory Optimal Parallel Data Cube Construction.
IEEE Trans. Parallel Distributed Syst., 2005

On the provision of prioritization and soft QoS in dynamically reconfigurable shared data-centers over InfiniBand.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Supporting iWARP Compatibility and Features for Regular Network Adapters.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

Architecture for caching responses with multiple dynamic dependencies in multi-tier data-centers over InfiniBand.
Proceedings of the 5th International Symposium on Cluster Computing and the Grid (CCGrid 2005), 2005

Sockets Direct Protocol over InfiniBand in clusters: is it beneficial?
Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, 2004

Microsystem controller for sensor network control and data correction.
Proceedings of the 2004 International Symposium on Circuits and Systems, 2004

Using Tiling to Scale Parallel Data Cube Construction.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

DiST: A Scalable, Efficient P2P Lookup Protocol.
Proceedings of the Agents and Peer-to-Peer Computing, Third International Workshop, 2004
