Tan Nguyen

ORCID: 0000-0003-3748-403X

Affiliations:
  • Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  • University of California San Diego, La Jolla, CA, USA (PhD 2014)


According to our database, Tan Nguyen authored at least 19 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Devastator: A Scalable Parallel Discrete Event Simulation Framework for Modern C++.
Proceedings of the 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 2024

2023
Benefits of Optimistic Parallel Discrete Event Simulation for Network-on-Chip Simulation.
Proceedings of the 27th IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications, 2023

2022
FPGA-based HPC accelerators: An evaluation on performance and energy efficiency.
Concurr. Comput. Pract. Exp., 2022

2021
Architectural Requirements for Deep Learning Workloads in HPC Environments.
Proceedings of the 2021 International Workshop on Performance Modeling, 2021

Facilitating CoDesign with Automatic Code Similarity Learning.
Proceedings of the 7th IEEE/ACM Workshop on the LLVM Compiler Infrastructure in HPC, 2021

Experiences Porting the SU3_Bench Microbenchmark to the Intel Arria 10 and Xilinx Alveo U280 FPGAs.
Proceedings of the IWOCL'21: International Workshop on OpenCL, Munich, Germany, April 2021

2020
The Performance and Energy Efficiency Potential of FPGAs in Scientific Computing.
Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020

2019
AMReX: a framework for block-structured adaptive mesh refinement.
J. Open Source Softw., 2019

Asynchronous AMR on Multi-GPUs.
Proceedings of the High Performance Computing, 2019

2018
Phase asynchronous AMR execution for productive and performant astrophysical flows.
Proceedings of the International Conference for High Performance Computing, 2018

2017
Automatic translation of MPI source into a latency-tolerant, data-driven form.
J. Parallel Distributed Comput., 2017

Nonintrusive AMR Asynchrony for Communication Optimization.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

2016
BoxLib with Tiling: An Adaptive Mesh Refinement Software Framework.
SIAM J. Sci. Comput., 2016

BoxLib with Tiling: An AMR Software Framework.
CoRR, 2016

TiDA: High-Level Programming Abstractions for Data Locality Management.
Proceedings of the High Performance Computing - 31st International Conference, 2016

Perilla: metadata-based optimizations of an asynchronous runtime for adaptive mesh refinement.
Proceedings of the International Conference for High Performance Computing, 2016

2015
LU Factorization: Towards Hiding Communication Overheads with a Lookahead-Free Algorithm.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2013
A software-based dynamic-warp scheduling approach for load-balancing the Viola-Jones face detection algorithm on GPUs.
J. Parallel Distributed Comput., 2013

2012
Bamboo: translating MPI applications to a latency-tolerant, data-driven form.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

