Venkatram Vishwanath

Orcid: 0000-0001-7248-6116

According to our database, Venkatram Vishwanath authored at least 138 papers between 2004 and 2024.

Bibliography

2024
Thorough Characterization and Analysis of Large Transformer Model Training At-Scale.
Proc. ACM Meas. Anal. Comput. Syst., 2024

Adding topology and memory awareness in data aggregation algorithms.
Future Gener. Comput. Syst., 2024

Scalable and Consistent Graph Neural Networks for Distributed Mesh-based Data-driven Modeling.
CoRR, 2024

Mesh-based Super-Resolution of Fluid Flows with Multiscale Graph Neural Networks.
CoRR, 2024

Centimani: Enabling Fast AI Accelerator Selection for DNN Training with a Novel Performance Predictor.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024

High Performance Binding Affinity Prediction with a Transformer-Based Surrogate Model.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Toward a Holistic Performance Evaluation of Large Language Models Across Diverse AI Accelerators.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

WActiGrad: Structured Pruning for Efficient Finetuning and Inference of Large Language Models on AI Accelerators.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024

A Multi-Level, Multi-Scale Visual Analytics Approach to Assessment of Multifidelity HPC Systems.
Proceedings of the 24th IEEE International Symposium on Cluster, 2024

2023
A survey of techniques for optimizing transformer inference.
J. Syst. Archit., November, 2023

GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics.
Int. J. High Perform. Comput. Appl., November, 2023

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies.
CoRR, 2023

A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators.
CoRR, 2023

Parallel Multi-Objective Hyperparameter Optimization with Uniform Normalization and Bounded Objectives.
CoRR, 2023

Neural Architecture Search Benchmarks: Insights and Survey.
IEEE Access, 2023

Differentiable Neural Architecture, Mixed Precision and Accelerator Co-Search.
IEEE Access, 2023

Exploring the Use of Dataflow Architectures for Graph Neural Network Workloads.
Proceedings of the High Performance Computing, 2023

Scalable Lead Prediction with Transformers using HPC resources.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Demonstration of Portable Performance of Scientific Machine Learning on High Performance Computing Systems.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Characterizing the Performance of Triangle Counting on Graphcore's IPU Architecture.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

TrainBF: High-Performance DNN Training Engine Using BFloat16 on AI Accelerators.
Proceedings of the Euro-Par 2023: Parallel Processing - 29th International Conference on Parallel and Distributed Computing, Limassol, Cyprus, August 28, 2023

Asynchronous Decentralized Bayesian Optimization for Large Scale Hyperparameter Optimization.
Proceedings of the 19th IEEE International Conference on e-Science, 2023

2022
PythonFOAM: In-situ data analyses with OpenFOAM and Python.
J. Comput. Sci., 2022

Intelligent resolution: Integrating Cryo-EM with AI-driven multi-resolution simulations to observe the severe acute respiratory syndrome coronavirus-2 replication-transcription machinery in action.
Int. J. High Perform. Comput. Appl., 2022

Operation-Level Performance Benchmarking of Graph Neural Networks for Scientific Applications.
CoRR, 2022

Asynchronous Distributed Bayesian Optimization at HPC Scale.
CoRR, 2022

Neural Architecture Search for Transformers: A Survey.
IEEE Access, 2022

AI Benchmarking for Science: Efforts from the MLCommons Science Working Group.
Proceedings of the High Performance Computing. ISC High Performance 2022 International Workshops - Hamburg, Germany, May 29, 2022


Efficient Design Space Exploration for Sparse Mixed Precision Neural Architectures.
Proceedings of the HPDC '22: The 31st International Symposium on High-Performance Parallel and Distributed Computing, Minneapolis, MN, USA, 27 June 2022, 2022

HDF5 Cache VOL: Efficient and Scalable Parallel I/O through Caching Data on Node-local Storage.
Proceedings of the 22nd IEEE International Symposium on Cluster, 2022

Toward an In-Depth Analysis of Multifidelity High Performance Computing Systems.
Proceedings of the 22nd IEEE International Symposium on Cluster, 2022

Stimulus: Accelerate Data Management for Scientific AI applications in HPC.
Proceedings of the 22nd IEEE International Symposium on Cluster, 2022

2021
Accelerating Scientific Applications With SambaNova Reconfigurable Dataflow Architecture.
Comput. Sci. Eng., 2021

MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.
CoRR, 2021

AgEBO-tabular: joint neural architecture and hyperparameter search with autotuned data-parallel training for tabular data.
Proceedings of the International Conference for High Performance Computing, 2021

Stream-AI-MD: streaming AI-driven adaptive molecular simulations for heterogeneous computing platforms.
Proceedings of the PASC '21: Platform for Advanced Scientific Computing Conference, 2021


DLIO: A Data-Centric Benchmark for Scientific Deep Learning Applications.
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021

2020
A machine learning workflow for molecular analysis: application to melting points.
Mach. Learn. Sci. Technol., 2020

ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems.
J. Comput. Sci. Technol., 2020

A terminology for in situ visualization and analysis systems.
Int. J. High Perform. Comput. Appl., 2020

AgEBO-Tabular: Joint Neural Architecture and Hyperparameter Search with Autotuned Data-Parallel Training for Tabular Data.
CoRR, 2020

SeeSAw: Optimizing Performance of In-Situ Analytics Applications under Power Constraints.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

2019
Balsam: Automated Scheduling and Execution of Dynamic, Data-Intensive HPC Workflows.
CoRR, 2019

Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping.
CoRR, 2019

A Benchmarking Study to Evaluate Apache Spark on Large-Scale Supercomputers.
CoRR, 2019

MELA: A Visual Analytics Tool for Studying Multifidelity HPC System Logs.
Proceedings of the 3rd IEEE/ACM Industry/University Joint International Workshop on Data-center Automation, 2019

Balsam: Near Real-Time Experimental Data Analysis on Supercomputers.
Proceedings of the 1st IEEE/ACM Annual Workshop on Large-scale Experiment-in-the-Loop Computing, 2019

Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping.
Proceedings of the Third IEEE/ACM Workshop on Deep Learning on Supercomputers, 2019

Scalable reinforcement-learning-based neural architecture search for cancer deep learning research.
Proceedings of the International Conference for High Performance Computing, 2019

2018
libIS: a lightweight library for flexible in transit visualization.
Proceedings of the Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, 2018

Topology-aware space-shared co-analysis of large-scale molecular dynamics simulations.
Proceedings of the International Conference for High Performance Computing, 2018

Benchmarking Machine Learning Methods for Performance Modeling of Scientific Applications.
Proceedings of the 2018 IEEE/ACM Performance Modeling, 2018

Optimizing Data Aggregation by Leveraging the Deep Memory Hierarchy on Large-scale Systems.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Toward Scalable and Asynchronous Object-Centric Data Management for HPC.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018

2017
Data movement optimizations for independent MPI I/O on the Blue Gene/Q.
Parallel Comput., 2017

Hierarchical Read-Write Optimizations for Scientific Applications with Multi-variable Structured Datasets.
Int. J. Parallel Program., 2017

HACC: extreme scaling and performance across diverse architectures.
Commun. ACM, 2017

A distributed graph approach for pre-processing linked RDF data using supercomputers.
Proceedings of The International Workshop on Semantic Big Data, 2017

PoLiMEr: An Energy Monitoring and Power Limiting Interface for HPC Applications.
Proceedings of the 5th International Workshop on Energy Efficient Supercomputing, 2017

Scalable In situ Analysis of Molecular Dynamics Simulations.
Proceedings of the In Situ Infrastructures on Enabling Extreme-Scale Analysis and Visualization, 2017

A Visual Analytics System for Optimizing Communications in Massively Parallel Applications.
Proceedings of the 12th IEEE Conference on Visual Analytics Science and Technology, 2017

TAPIOCA: An I/O Library for Optimized Topology-Aware Data Aggregation on Large-Scale Supercomputers.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Analytical Performance Modeling and Validation of Intel's Xeon Phi Architecture.
Proceedings of the Computing Frontiers Conference, 2017

2016
Application power profiling on IBM Blue Gene/Q.
Parallel Comput., 2016

Improving sparse data movement performance using multiple paths on the Blue Gene/Q supercomputer.
Parallel Comput., 2016

Workflow performance improvement using model-based scheduling over multiple clusters and clouds.
Future Gener. Comput. Syst., 2016

In Situ Methods, Infrastructures, and Applications on High Performance Computing Platforms.
Comput. Graph. Forum, 2016

Early Investigations into Using a Remote RAM Pool with the vl3 Visualization Framework.
Proceedings of the Second Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, 2016

A data driven scheduling approach for power management on HPC systems.
Proceedings of the International Conference for High Performance Computing, 2016

Topology-Aware Data Aggregation for Intensive I/O on Large-Scale Supercomputers.
Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016

Optimal execution of co-analysis for large-scale molecular dynamics simulations.
Proceedings of the International Conference for High Performance Computing, 2016

Performance analysis, design considerations, and applications of extreme-scale in situ infrastructures.
Proceedings of the International Conference for High Performance Computing, 2016

Parallel distributed, GPU-accelerated, advanced lighting calculations for large-scale volume visualization.
Proceedings of the 6th IEEE Symposium on Large Data Analysis and Visualization, 2016

Coupling LAMMPS and the vl3 Framework for Co-Visualization of Atomistic Simulations.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015
Cluster-to-cluster data transfer with data compression over wide-area networks.
J. Parallel Distributed Comput., 2015

Optimal scheduling of in-situ analysis for large-scale scientific simulations.
Proceedings of the International Conference for High Performance Computing, 2015

Route-aware independent MPI I/O on the blue gene/Q.
Proceedings of the 2015 International Workshop on Data-Intensive Scalable Computing Systems, 2015

Large-scale co-visualization for LAMMPS using vl3.
Proceedings of the 5th IEEE Symposium on Large Data Analysis and Visualization, 2015

Streaming ultra high resolution images to large tiled display at nearly interactive frame rate with vl3.
Proceedings of the 5th IEEE Symposium on Large Data Analysis and Visualization, 2015

Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Profiling transport performance for big data transfer over dedicated channels.
Proceedings of the International Conference on Computing, Networking and Communications, 2015

Improving Communication Throughput by Multipath Load Balancing on Blue Gene/Q.
Proceedings of the 22nd IEEE International Conference on High Performance Computing, 2015

Large-Scale Parallel Visualization of Particle-Based Simulations using Point Sprites and Level-Of-Detail.
Proceedings of the 15th Eurographics Symposium on Parallel Graphics and Visualization, 2015

Comparison of Vendor Supplied Environmental Data Collection Mechanisms.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Multipath Load Balancing for M × N Communication Patterns on the Blue Gene/Q Supercomputer Interconnection Network.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

TECA: Petascale Pattern Recognition for Climate Science.
Proceedings of the Computer Analysis of Images and Patterns, 2015

2014
Large-Scale Simulations of Sky Surveys.
Comput. Sci. Eng., 2014

DIRAQ: scalable in situ data- and resource-aware indexing for optimized query performance.
Clust. Comput., 2014

Fast Multiresolution Reads of Massive Simulation Datasets.
Proceedings of the Supercomputing - 29th International Conference, 2014

Efficient I/O and Storage of Adaptive-Resolution Data.
Proceedings of the International Conference for High Performance Computing, 2014

Distributed multipath routing algorithm for data center networks.
Proceedings of the 2014 International Workshop on Data Intensive Scalable Computing Systems, 2014

Scalable Parallel I/O on a Blue Gene/Q Supercomputer Using Compression, Topology-Aware Data Aggregation, and Subfiling.
Proceedings of the 22nd Euromicro International Conference on Parallel, 2014

Improving Data Movement Performance for Sparse Data Patterns on the Blue Gene/Q Supercomputer.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

Improving Multisite Workflow Performance Using Model-Based Scheduling.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Performance Modeling of vl3 Volume Rendering on GPU-Based Clusters.
Proceedings of the 14th Eurographics Symposium on Parallel Graphics and Visualization, 2014

SKOPE: a framework for modeling and exploring workload behavior.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

2013
Multi-domain job coscheduling for leadership computing systems.
J. Supercomput., 2013

On-demand unstructured mesh translation for reducing memory pressure during in situ analysis.
Proceedings of the 8th International Workshop on Ultrascale Visualization, 2013

Characterization and modeling of PIDX parallel I/O for performance optimization.
Proceedings of the International Conference for High Performance Computing, 2013

Characterization and Understanding Machine-Specific Interconnects.
Proceedings of the Parallel Computing Technologies - 12th International Conference, 2013

Efficient parallel volume rendering of large-scale adaptive mesh refinement data.
Proceedings of the IEEE Symposium on Large-Scale Data Analysis and Visualization, 2013

Measuring Power Consumption on IBM Blue Gene/Q.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Early Experience on the Blue Gene/Q Supercomputing System.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Proactive Support for Large-Scale Data Exploration.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Scalable in situ scientific data encoding for analytical query processing.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

A Generic High-Performance Method for Deinterleaving Scientific Data.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

Application power profiling on IBM Blue Gene/Q.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

Model-driven multisite workflow scheduling.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

Toward optimizing disk-to-disk transfer on 100G networks.
Proceedings of the IEEE International Conference on Advanced Networks and Telecommunications Systems, 2013

2012
Accelerating Data Movement Leveraging End-System and Network Parallelism.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Dataflow-driven GPU performance projection for multi-kernel transformations.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Efficient data restructuring and aggregation for I/O acceleration in PIDX.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Poster: Evaluating Communication Performance in BlueGene/Q and Cray XE6 Supercomputers.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Evaluating Communication Performance in BlueGene/Q and Cray XE6 Supercomputers.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

ALCF MPI Benchmarks: Understanding Machine-Specific Communication Behavior.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

Evaluating Power-Monitoring Capabilities on IBM Blue Gene/P and Blue Gene/Q.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

2011
Topology-aware data movement and staging for I/O acceleration on Blue Gene/P supercomputing systems.
Proceedings of the Conference on High Performance Computing Networking, 2011

Electronic poster: co-visualization of full data and in situ data extracts from unstructured grid cfd at 160k cores.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

GROPHECY: GPU performance projection from CPU code skeletons.
Proceedings of the Conference on High Performance Computing Networking, 2011

Modeling early galaxies using radiation hydrodynamics.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Toward simulation-time data analysis and I/O acceleration on leadership-class systems.
Proceedings of the IEEE Symposium on Large Data Analysis and Visualization, 2011

Exploring large data over wide area networks.
Proceedings of the IEEE Symposium on Large Data Analysis and Visualization, 2011

Job Coscheduling on Coupled High-End Computing Systems.
Proceedings of the 2011 International Conference on Parallel Processing Workshops, 2011

PIDX: Efficient Parallel I/O for Multi-resolution Multi-dimensional Scientific Datasets.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

2010
Accelerating I/O Forwarding in IBM Blue Gene/P Systems.
Proceedings of the Conference on High Performance Computing Networking, 2010

Multi-application inter-tile synchronization on ultra-high-resolution display walls.
Proceedings of the First Annual ACM SIGMM Conference on Multimedia Systems, 2010

2009
Accelerating tropical cyclone analysis using LambdaRAM, a distributed data cache over wide-area ultra-fast networks.
Future Gener. Comput. Syst., 2009

The OptIPortal, a scalable visualization, storage, and computing interface device for the OptiPuter.
Future Gener. Comput. Syst., 2009

2008
Specification and Verification of LambdaRAM: A Wide-area Distributed Cache for High Performance Computing.
Proceedings of the 6th ACM & IEEE International Conference on Formal Methods and Models for Co-Design (MEMOCODE 2008), 2008

The Rails Toolkit - Enabling End-System Topology-Aware High End Computing.
Proceedings of the Fourth International Conference on e-Science, 2008

2006
The global lambda visualization facility: An international ultra-high-definition wide-area visualization collaboratory.
Future Gener. Comput. Syst., 2006

The first functional demonstration of optical virtual concatenation as a technique for achieving Terabit networking.
Future Gener. Comput. Syst., 2006

AR-PIN/PDC: Flexible Advance Reservation of Intradomain and Interdomain Lightpaths.
Proceedings of the Global Telecommunications Conference, 2006. GLOBECOM '06, San Francisco, CA, USA, 27 November, 2006

LambdaBridge: A Scalable Architecture for Future Generation Terabit Applications.
Proceedings of the 3rd International Conference on Broadband Communications, 2006

2004
Vol-a-Tile - A Tool for Interactive Exploration of Large Volumetric Data on Scalable Tiled Displays.
Proceedings of the 15th IEEE Visualization Conference, 2004

JuxtaView - a tool for interactive visualization of large imagery on scalable tiled displays.
Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004

