Jingwen Leng

ORCID: 0000-0002-5660-5493

Affiliations:
  • Shanghai Jiao Tong University, Department of Computer Science and Engineering, Shanghai, China
  • IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
  • University of Texas at Austin, Department of Electrical and Computer Engineering, TX, USA (PhD 2016)


According to our database, Jingwen Leng authored at least 91 papers between 2013 and 2025.

Bibliography

2025
BAFT: bubble-aware fault-tolerant framework for distributed DNN training with hybrid parallelism.
Frontiers Comput. Sci., January, 2025

2024
Accelerating Sparse DNNs Based on Tiled GEMM.
IEEE Trans. Computers, May, 2024

Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization.
CoRR, 2024

Potamoi: Accelerating Neural Rendering via a Unified Streaming Architecture.
CoRR, 2024

vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving.
CoRR, 2024

Towards Fast Setup and High Throughput of GPU Serverless Computing.
CoRR, 2024

A Tale of Two Domains: Exploring Efficient Architecture Design for Truly Autonomous Things.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Cicero: Addressing Algorithmic and Architectural Bottlenecks in Neural Rendering by Radiance Warping and Memory Optimizations.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Fovea Transformer: Efficient Long-Context Modeling with Structured Fine-To-Coarse Attention.
Proceedings of the IEEE International Conference on Acoustics, 2024

JUNO: Optimizing High-Dimensional Approximate Nearest Neighbour Search with Sparsity-Aware Algorithm and Ray-Tracing Core Mapping.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Fractal: Joint Multi-Level Sparse Pattern Tuning of Accuracy and Performance for DNN Pruning.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Amanda: Unified Instrumentation Framework for Deep Neural Networks.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Improving Cluster Utilization Through Adaptive Resource Management for Deep Neural Network and CPU Jobs Colocation.
IEEE Trans. Computers, December, 2023

Accelerating Generic Graph Neural Networks via Architecture, Compiler, Partition Method Co-Design.
CoRR, 2023

DFlow: Efficient Dataflow-based Invocation Workflow Execution for Function-as-a-Service.
CoRR, 2023

ImaGen: A General Framework for Generating Memory- and Power-Efficient Image Processing Accelerators.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

FIRST: Exploiting the Multi-Dimensional Attributes of Functions for Power-Aware Serverless Computing.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

PAC: Preference-Aware Co-location Scheduling on Heterogeneous NUMA Architectures To Improve Resource Utilization.
Proceedings of the 37th International Conference on Supercomputing, 2023

Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Not All Resources are Visible: Exploiting Fragmented Shadow Resources in Shared-State Scheduler Architecture.
Proceedings of the 2023 ACM Symposium on Cloud Computing, SoCC 2023, 2023

DistSim: A performance model of large-scale hybrid distributed DNN training.
Proceedings of the 20th ACM International Conference on Computing Frontiers, 2023

AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs.
Proceedings of the 20th ACM International Conference on Computing Frontiers, 2023

uGrapher: High-Performance Graph Operator Computation via Unified Abstraction for Graph Neural Networks.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Efficient Activation Quantization via Adaptive Rounding Border for Post-Training Quantization.
CoRR, 2022

Towards Reliable AI Applications via Algorithm-Based Fault Tolerance on NVDLA.
Proceedings of the 18th International Conference on Mobility, Sensing and Networking, 2022

ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Braum: Analyzing and Protecting Autonomous Machine Software Stack.
Proceedings of the IEEE 33rd International Symposium on Software Reliability Engineering, 2022

PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

SALO: an efficient spatial accelerator enabling hybrid sparse attention mechanisms for long sequences.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

EMS: efficient memory subsystem synthesis for spatial accelerators.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Astraea: towards QoS-aware and resource-efficient multi-stage GPU services.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

VELTAIR: towards high-performance multi-tenant deep learning services via adaptive compilation and scheduling.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

Transkimmer: Transformer Learns to Layer-wise Skim.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Block-Skim: Efficient Question Answering for Transformer.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Erratum to "Predictive Guardbanding: Program-Driven Timing Margin Reduction for GPUs".
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

Predictive Guardbanding: Program-Driven Timing Margin Reduction for GPUs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

System-level Early-stage Modeling and Evaluation of IVR-assisted Processor Power Delivery System.
ACM Trans. Archit. Code Optim., 2021

ZIPPER: Exploiting Tile- and Operator-level Parallelism for General and Scalable Graph Neural Network Acceleration.
CoRR, 2021

Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction.
Proceedings of the International Conference for High Performance Computing, 2021

Dual-side Sparse Tensor Core.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

AlphaR: Learning-Powered Resource Management for Irregular, Dynamic Microservice Graph.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Characterizing and Demystifying the Implicit Convolution Algorithm on Commercial Matrix-Multiplication Accelerators.
Proceedings of the IEEE International Symposium on Workload Characterization, 2021

Dubhe: Towards Data Unbiasedness with Homomorphic Encryption in Federated Learning Client Selection.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

Exploiting Intra-SM Parallelism in GPUs via Persistent and Elastic Blocks.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

Skywalker: Efficient Alias-Method-Based Graph Sampling and Random Walk on GPUs.
Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2020
Voltage-Stacked Power Delivery Systems: Reliability, Efficiency, and Power Management.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Predicting and reining in application-level slowdown on spatial multitasking GPUs.
J. Parallel Distributed Comput., 2020

Probabilistic robust regression with adaptive weights - a case study on face recognition.
Frontiers Comput. Sci., 2020

Exceeding Conservative Limits: A Consolidated Analysis on Modern Hardware Margins.
CoRR, 2020

Towards QoS-Aware and Resource-Efficient GPU Microservices Based on Spatial Multitasking GPUs In Datacenters.
CoRR, 2020

Survey and design of paleozoic: a high-performance compiler tool chain for deep learning inference accelerator.
CCF Trans. High Perform. Comput., 2020

Architectural Implications of Graph Neural Networks.
IEEE Comput. Archit. Lett., 2020

Accelerating sparse DNN models without hardware-support via tile-wise sparsity.
Proceedings of the International Conference for High Performance Computing, 2020

Ptolemy: Architecture Support for Robust Deep Learning.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

DLFusion: An Auto-Tuning Compiler for Layer Fusion on Deep Neural Network Accelerator.
Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2020

Sturgeon: Preference-aware Co-location for Improving Utilization of Power Constrained Computers.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

OVERSEE: Outsourcing Verification to Enable Resource Sharing in Edge Environment.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

CODA: Improving Resource Utilization by Slimming and Co-locating DNN and CPU Jobs.
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020

Asymmetric Resilience: Exploiting Task-Level Idempotency for Transient Error Recovery in Accelerator-Based Systems.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

Low-Latency Proactive Continuous Vision.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
DR Refresh: Releasing DRAM Potential by Enabling Read Accesses Under Refresh.
IEEE Trans. Computers, 2019

Bandwidth and Locality Aware Task-stealing for Manycore Architectures with Bandwidth-Asymmetric Memory.
ACM Trans. Archit. Code Optim., 2019

Asymmetric Resilience for Accelerator-Rich Systems.
IEEE Comput. Archit. Lett., 2019

SVSoC: Speculative Vision Systems-on-a-Chip.
IEEE Comput. Archit. Lett., 2019

Characterizing Perception Module Performance and Robustness in Production-Scale Autonomous Driving System.
Proceedings of the Network and Parallel Computing, 2019

Themis: Predicting and Reining in Application-Level Slowdown on Spatial Multitasking GPUs.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Modern Hardware Margins: CPUs, GPUs, FPGAs Recent System-Level Studies.
Proceedings of the 25th IEEE International Symposium on On-Line Testing and Robust System Design, 2019

Avalon: towards QoS awareness and improved utilization through multi-resource management in datacenters.
Proceedings of the ACM International Conference on Supercomputing, 2019

Ebird: Elastic Batch for Improving Responsiveness and Throughput of Deep Learning Services.
Proceedings of the 37th IEEE International Conference on Computer Design, 2019

Adversarial Defense Through Network Profiling Based Path Extraction.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Voltage-Stacked GPUs: A Control Theory Driven Cross-Layer Solution for Practical Voltage Stacking in GPUs.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

DR DRAM: Accelerating Memory-Read-Intensive Applications.
Proceedings of the 36th IEEE International Conference on Computer Design, 2018

Efficient and reliable power delivery in voltage-stacked manycore system with hybrid charge-recycling regulators.
Proceedings of the 55th Annual Design Automation Conference, 2018

2017
Ivory: Early-Stage Design Space Exploration Tool for Integrated Voltage Regulators.
Proceedings of the 54th Annual Design Automation Conference, 2017

2015
Adaptive guardband scheduling to improve system-level efficiency of the POWER7+.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Safe limits on voltage reduction efficiency in GPUs: a direct measurement approach.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

GPU voltage noise: Characterization and hierarchical smoothing of spatial and temporal voltage noise interference in GPU architectures.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

2014
Exploiting Webpage Characteristics for Energy-Efficient Mobile Web Browsing.
IEEE Comput. Archit. Lett., 2014

GPUVolt: modeling and characterizing voltage noise in GPU architectures.
Proceedings of the International Symposium on Low Power Electronics and Design, 2014

2013
A locality-aware memory hierarchy for energy-efficient GPU architectures.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

GPUWattch: enabling energy optimizations in GPGPUs.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

