Jingling Xue

ORCID: 0000-0003-0380-3506

Affiliations:
  • University of New South Wales, School of Computer Science and Engineering, Sydney, NSW, Australia


According to our database, Jingling Xue authored at least 279 papers between 1988 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Pearl: A Multi-Derivation Approach to Efficient CFL-Reachability Solving.
IEEE Trans. Software Eng., September, 2024

A Smart Status Based Monitoring Algorithm for the Dynamic Analysis of Memory Safety.
ACM Trans. Softw. Eng. Methodol., May, 2024

TIPS: Tracking Integer-Pointer Value Flows for C++ Member Function Pointers.
Proc. ACM Softw. Eng., 2024

Iterative-Epoch Online Cycle Elimination for Context-Free Language Reachability.
Proc. ACM Program. Lang., 2024

Boosting the Performance of Alias-Aware IFDS Analysis with CFL-Based Environment Transformers.
Proc. ACM Program. Lang., 2024

Correction-based Defense Against Adversarial Video Attacks via Discretization-Enhanced Video Compressive Sensing.
Proceedings of the 33rd USENIX Security Symposium, 2024

A Scalable, Efficient, and Robust Dynamic Memory Management Library for HLS-based FPGAs.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Enabling Efficient Large Recommendation Model Training with Near CXL Memory Processing.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

UnsafeCop: Towards Memory Safety for Real-World Unsafe Rust Code with Practical Bounded Model Checking.
Proceedings of the Formal Methods - 26th International Symposium, 2024

A CFL-Reachability Formulation of Callsite-Sensitive Pointer Analysis with Built-In On-The-Fly Call Graph Construction.
Proceedings of the 38th European Conference on Object-Oriented Programming, 2024

Welcome from the Program Chairs.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

A Context-Sensitive Pointer Analysis Framework for Rust and Its Application to Call Graph Construction.
Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 2024

Optimizing Dynamic-Shape Neural Networks on Accelerators via On-the-Fly Micro-Kernel Polymerization.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Automatic Target Description File Generation.
J. Comput. Sci. Technol., December, 2023

Effective Stack Wear Leveling for NVM.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., October, 2023

A Container-Usage-Pattern-Based Context Debloating Approach for Object-Sensitive Pointer Analysis.
Proc. ACM Program. Lang., October, 2023

VTensor: Using Virtual Tensors to Build a Layout-Oblivious AI Programming Framework.
J. Comput. Sci. Technol., September, 2023

IFDS-based Context Debloating for Object-Sensitive Pointer Analysis.
ACM Trans. Softw. Eng. Methodol., July, 2023

A Source-Level Instrumentation Framework for the Dynamic Analysis of Memory Safety.
IEEE Trans. Software Eng., April, 2023

Selecting Context-Sensitivity Modularly for Accelerating Object-Sensitive Pointer Analysis.
IEEE Trans. Software Eng., February, 2023

RSFuzzer: Discovering Deep SMI Handler Vulnerabilities in UEFI Firmware with Hybrid Fuzzing.
Proceedings of the 44th IEEE Symposium on Security and Privacy, 2023

Statistical Type Inference for Incomplete Programs.
Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023

SIRIUS: Harvesting Whole-Program Optimization Opportunities for DNNs.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

Two Birds with One Stone: Multi-Derivation for Fast Context-Free Language Reachability Analysis.
Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, 2023

Automatic Generation and Reuse of Precise Library Summaries for Object-Sensitive Pointer Analysis.
Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, 2023

Merge-Replay: Efficient IFDS-Based Taint Analysis by Consolidating Equivalent Value Flows.
Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, 2023

Hybrid Inlining: A Framework for Compositional and Context-Sensitive Static Analysis.
Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023

Reducing the Memory Footprint of IFDS-Based Data-Flow Analyses using Fine-Grained Garbage Collection.
Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023

Accelerating Personalized Recommendation with Cross-level Near-Memory Processing.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

AFaVS: Accurate Yet Fast Version Switching for Graph Processing Systems.
Proceedings of the 39th IEEE International Conference on Data Engineering, 2023

Occamy: Elastically Sharing a SIMD Co-processor across Multiple CPU Cores.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
CloudRaid: Detecting Distributed Concurrency Bugs via Log Mining and Enhancement.
IEEE Trans. Software Eng., 2022

Buddy Stacks: Protecting Return Addresses with Efficient Thread-Local Storage and Runtime Re-Randomization.
ACM Trans. Softw. Eng. Methodol., 2022

A Flexible Yet Efficient DNN Pruning Approach for Crossbar-Based Processing-in-Memory Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

ReaDy: A ReRAM-Based Processing-in-Memory Accelerator for Dynamic Graph Convolutional Networks.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Practical Software-Based Shadow Stacks on x86-64.
ACM Trans. Archit. Code Optim., 2022

Optimizing deep neural networks on intelligent edge accelerators via flexible-rate filter pruning.
J. Syst. Archit., 2022

Qilin: A New Framework for Supporting Fine-Grained Context-Sensitivity in Java Pointer Analysis (Artifact).
Dagstuhl Artifacts Ser., 2022

Hybrid Inlining: A Compositional and Context Sensitive Static Analysis Framework.
CoRR, 2022

Finding SMM Privilege-Escalation Vulnerabilities in UEFI Firmware with Protocol-Centric Static Analysis.
Proceedings of the 43rd IEEE Symposium on Security and Privacy, 2022

A Data-Centric Accelerator for High-Performance Hypergraph Processing.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

A Dynamic Analysis Tool for Memory Safety Based on Smart Status and Source-Level Instrumentation.
Proceedings of the 44th IEEE/ACM International Conference on Software Engineering: Companion Proceedings, 2022

ScalaGraph: A Scalable Accelerator for Massively Parallel Graph Processing.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

Accelerating Graph Convolutional Networks Using Crossbar-based Processing-In-Memory Architectures.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

Qilin: A New Framework For Supporting Fine-Grained Context-Sensitivity in Java Pointer Analysis.
Proceedings of the 36th European Conference on Object-Oriented Programming, 2022

M3V: Multi-modal Multi-view Context Embedding for Repair Operator Prediction.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

Recovering Container Class Types in C++ Binaries.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

2021
Eagle: CFL-Reachability-Based Precision-Preserving Acceleration of Object-Sensitive Pointer Analysis with Partial Context Sensitivity.
ACM Trans. Softw. Eng. Methodol., 2021

Guest Editorial: Special Section on New Trends in Parallel and Distributed Computing for Human Sensible Applications.
IEEE Trans. Emerg. Top. Comput., 2021

Accelerating Object-Sensitive Pointer Analysis by Exploiting Object Containment and Reachability (Artifact).
Dagstuhl Artifacts Ser., 2021

Automatic Synthesis of Data-Flow Analyzers.
Proceedings of the Static Analysis - 28th International Symposium, 2021

Selective Context-Sensitivity for k-CFA with CFL-Reachability.
Proceedings of the Static Analysis - 28th International Symposium, 2021

Detecting TensorFlow Program Bugs in Real-World Industrial Environment.
Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering, 2021

Context Debloating for Object-Sensitive Pointer Analysis.
Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering, 2021

Runtime detection of memory errors with smart status.
Proceedings of the ISSTA '21: 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2021

Accelerating Object-Sensitive Pointer Analysis by Exploiting Object Containment and Reachability.
Proceedings of the 35th European Conference on Object-Oriented Programming, 2021

GoBench: A Benchmark Suite of Real-World Go Concurrency Bugs.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

Unleashing the Low-Precision Computation Potential of Tensor Cores on GPUs.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020
Value-Flow-Based Demand-Driven Pointer Analysis for C and C++.
IEEE Trans. Software Eng., 2020

Fusion-Catalyzed Pruning for Optimizing Deep Learning on Intelligent Edge Devices.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

DNNTune: Automatic Benchmarking DNN Models for Mobile-cloud Computing.
ACM Trans. Archit. Code Optim., 2020

A Conflict-free Scheduler for High-performance Graph Processing on Multi-pipeline FPGAs.
ACM Trans. Archit. Code Optim., 2020

Referee: A Pattern-Guided Approach for Auto Design in Compiler-Based Analyzers.
Proceedings of the 27th IEEE International Conference on Software Analysis, 2020

Scaph: Scalable GPU-Accelerated Graph Processing with Value-Driven Differential Scheduling.
Proceedings of the 2020 USENIX Annual Technical Conference, 2020

A Locality-Aware Energy-Efficient Accelerator for Graph Mining Applications.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Exposing Android Event-Based Races by Selective Branch Instrumentation.
Proceedings of the 31st IEEE International Symposium on Software Reliability Engineering, 2020

Correlating UI Contexts with Sensitive API Calls: Dynamic Semantic Extraction and Analysis.
Proceedings of the 31st IEEE International Symposium on Software Reliability Engineering, 2020

A Heterogeneous PIM Hardware-Software Co-Design for Energy-Efficient Graph Processing.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Spara: An Energy-Efficient ReRAM-Based Accelerator for Sparse Graph Analytics Applications.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Every Mutation Should Be Rewarded: Boosting Fault Localization with Mutated Predicates.
Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2020

Burn after reading: a shadow stack with microsecond-level runtime rerandomization for protecting return addresses.
Proceedings of the ICSE '20: 42nd International Conference on Software Engineering, Seoul, South Korea, 27 June, 2020

Loop2Recursion: Compiler-Assisted Wear Leveling for Non-Volatile Memory.
Proceedings of the 38th IEEE International Conference on Computer Design, 2020

VTensor: Using Virtual Tensors to Build a Layout-oblivious AI Programming Framework.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

Bandwidth-Aware Loop Tiling for DMA-Supported Scratchpad Memory.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Understanding and Analyzing Java Reflection.
ACM Trans. Softw. Eng. Methodol., 2019

Poker: Permutation-Based SIMD Execution of Intensive Tree Search by Path Encoding.
ACM Trans. Archit. Code Optim., 2019

SCP: Shared Cache Partitioning for High-Performance GEMM.
ACM Trans. Archit. Code Optim., 2019

Precision-preserving yet fast object-sensitive pointer analysis with partial context sensitivity.
Proc. ACM Program. Lang., 2019

LCCFS: a lightweight distributed file system for cloud computing without journaling and metadata services.
Sci. China Inf. Sci., 2019

Event trace reduction for effective bug replay of Android apps via differential GUI state analysis.
Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019

Per-Dereference Verification of Temporal Heap Safety via Adaptive Context-Sensitive Analysis.
Proceedings of the Static Analysis - 26th International Symposium, 2019

Incremental precision-preserving symbolic inference for probabilistic programs.
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

WCET-aware hyper-block construction for clustered VLIW processors.
Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, 2019

Performance-Boosting Sparsification of the IFDS Algorithm with Applications to Taint Analysis.
Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, 2019

B2SFinder: Detecting Open-Source Software Reuse in COTS Software.
Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, 2019

Detecting memory errors at runtime with source-level instrumentation.
Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2019

TCD: Statically Detecting Type Confusion Errors in C++ Programs.
Proceedings of the 30th IEEE International Symposium on Software Reliability Engineering, 2019

Precise Static Happens-Before Analysis for Detecting UAF Order Violations in Android.
Proceedings of the 12th IEEE Conference on Software Testing, Validation and Verification, 2019

VFix: value-flow-guided precise program repair for null pointer dereferences.
Proceedings of the 41st International Conference on Software Engineering, 2019

A Feature-Oriented Corpus for Understanding, Evaluating and Improving Fuzz Testing.
Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security, 2019

PPOpenCL: a performance-portable OpenCL compiler with host and kernel thread code fusion.
Proceedings of the 28th International Conference on Compiler Construction, 2019

2018
Loop-Oriented Pointer Analysis for Automatic SIMD Vectorization.
ACM Trans. Embed. Comput. Syst., 2018

Ripple: Reflection analysis for Android apps in incomplete information environments.
Softw. Pract. Exp., 2018

Parallel construction of interprocedural memory SSA form.
J. Syst. Softw., 2018

TDroid: exposing app switching attacks in Android with control flow specialization.
Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018

Understanding and detecting evolution-induced compatibility issues in Android apps.
Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018

Launch-mode-aware context-sensitive activity transition analysis.
Proceedings of the 40th International Conference on Software Engineering, 2018

Spatio-temporal context reduction: a pointer-analysis-based static approach for detecting use-after-free vulnerabilities.
Proceedings of the 40th International Conference on Software Engineering, 2018

Live path control flow integrity.
Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings, 2018

Revisiting Loop Tiling for Datacenters: Live and Let Live.
Proceedings of the 32nd International Conference on Supercomputing, 2018

May-happen-in-parallel analysis with static vector clocks.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

Live Path CFI Against Control Flow Hijacking Attacks.
Proceedings of the Information Security and Privacy - 23rd Australasian Conference, 2018

Towards concurrency race debugging: an integrated approach for constraint solving and dynamic slicing.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
Durable Address Translation in PCM-Based Flash Storage Systems.
IEEE Trans. Parallel Distributed Syst., 2017

An Efficient WCET-Aware Instruction Scheduling and Register Allocation Approach for Clustered VLIW Processors.
ACM Trans. Embed. Comput. Syst., 2017

Fine grained, direct access file system support for storage class memory.
J. Syst. Archit., 2017

Demand-Driven Pointer Analysis with Strong Updates via Value-Flow Refinement.
CoRR, 2017

Incremental Analysis for Probabilistic Programs.
Proceedings of the Static Analysis - 24th International Symposium, 2017

Efficient and precise points-to analysis: modeling the heap by merging equivalent automata.
Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2017

Boosting the precision of virtual call integrity protection with partial pointer analysis for C++.
Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, Santa Barbara, CA, USA, July 10, 2017

Reflection Analysis for Java: Uncovering More Reflective Targets Precisely.
Proceedings of the 28th IEEE International Symposium on Software Reliability Engineering, 2017

Automatic generation of fast BLAS3-GEMM: a portable compiler approach.
Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

Dynamic symbolic execution for polymorphism.
Proceedings of the 26th International Conference on Compiler Construction, 2017

Machine-Learning-Guided Typestate Analysis for Static Use-After-Free Detection.
Proceedings of the 33rd Annual Computer Security Applications Conference, 2017

2016
Eliminating Redundant Bounds Checks in Dynamic Buffer Overflow Detection Using Weakest Preconditions.
IEEE Trans. Reliab., 2016

Predicting Cross-Core Performance Interference on Multicore Processors with Regression Analysis.
IEEE Trans. Parallel Distributed Syst., 2016

An Efficient GPU Implementation of Inclusion-Based Pointer Analysis.
IEEE Trans. Parallel Distributed Syst., 2016

Reducing Static Energy in Supercomputer Interconnection Networks Using Topology-Aware Partitioning.
IEEE Trans. Computers, 2016

A Compiler Approach for Exploiting Partial SIMD Parallelism.
ACM Trans. Archit. Code Optim., 2016

Program Tailoring: Slicing by Sequential Criteria (Artifact).
Dagstuhl Artifacts Ser., 2016

Energy Wall for Exascale Supercomputing.
Comput. Informatics, 2016

On-demand strong update analysis via value-flow refinement.
Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2016

Making k-Object-Sensitive Pointer Analysis More Precise with Still k-Limiting.
Proceedings of the Static Analysis - 23rd International Symposium, 2016

Automated memory leak fixing on value-flow slices for C programs.
Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016

Loop-oriented array- and field-sensitive pointer analysis for automatic SIMD vectorization.
Proceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages, 2016

RegTT: Accelerating Tree Traversals on GPUs by Exploiting Regularities.
Proceedings of the 45th International Conference on Parallel Processing, 2016

An Energy-Efficient Implementation of LU Factorization on Heterogeneous Systems.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

Program Tailoring: Slicing by Sequential Criteria.
Proceedings of the 30th European Conference on Object-Oriented Programming, 2016

Exploiting mixed SIMD parallelism by reducing data reorganization overhead.
Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

Sparse flow-sensitive pointer analysis for multithreaded programs.
Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

SVF: interprocedural static value-flow analysis in LLVM.
Proceedings of the 25th International Conference on Compiler Construction, 2016

Masking Soft Errors with Static Bitwise Analysis.
Proceedings of the 23rd Asia-Pacific Software Engineering Conference, 2016

2015
Enhancement of cooperation between file systems and applications - on VFS extensions for optimized performance.
Sci. China Inf. Sci., 2015

Effective Soundness-Guided Reflection Analysis.
Proceedings of the Static Analysis - 22nd International Symposium, 2015

File system-independent block device support for storage class memory.
Proceedings of the 2015 IEEE Conference on Computer Communications Workshops, 2015

Hadoop+: Modeling and Evaluating the Heterogeneity for MapReduce Applications in Heterogeneous Clusters.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Design and Implementation of a Highly Efficient DGEMM for 64-Bit ARMv8 Multi-core Processors.
Proceedings of the 44th International Conference on Parallel Processing, 2015

Region-Based May-Happen-in-Parallel Analysis for C Programs.
Proceedings of the 44th International Conference on Parallel Processing, 2015

Contention-Aware Scheduling for Asymmetric Multicore Processors.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

Performance Modeling of Multithreaded Programs for Mobile Asymmetric Chip Multiprocessors.
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

2014
Detecting Memory Leaks Statically with Full-Sparse Value-Flow Analysis.
IEEE Trans. Software Eng., 2014

Making context-sensitive inclusion-based pointer analysis practical for compilers using parameterised summarisation.
Softw. Pract. Exp., 2014

OpenMC: Towards Simplifying Programming for TianHe Supercomputers.
J. Comput. Sci. Technol., 2014

Acyclic orientation graph coloring for software-managed memory allocation.
Sci. China Inf. Sci., 2014

Region-Based Selective Flow-Sensitive Pointer Analysis.
Proceedings of the Static Analysis - 21st International Symposium, 2014

WPBOUND: Enforcing Spatial Memory Safety Efficiently at Runtime with Weakest Preconditions.
Proceedings of the 25th IEEE International Symposium on Software Reliability Engineering, 2014

Parallel Pointer Analysis with CFL-Reachability.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Self-inferencing Reflection Resolution for Java.
Proceedings of the ECOOP 2014 - Object-Oriented Programming - 28th European Conference, Uppsala, Sweden, July 28, 2014

Lifetime holes aware register allocation for clustered VLIW processors.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Accelerating Dynamic Detection of Uses of Undefined Values with Static Value-Flow Analysis.
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

A collaborative divide-and-conquer K-means clustering algorithm for processing large data.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

2013
SEED: A Statically Greedy and Dynamically Adaptive Approach for Speculative Loop Execution.
IEEE Trans. Computers, 2013

Layout-oblivious compiler optimization for matrix computations.
ACM Trans. Archit. Code Optim., 2013

Acculock: accurate and efficient detection of data races.
Softw. Pract. Exp., 2013

Epipe: A low-cost fault-tolerance technique considering WCET constraints.
J. Syst. Archit., 2013

Instruction scheduling with k-successor tree for clustered VLIW processors.
Des. Autom. Embed. Syst., 2013

Accelerating inclusion-based pointer analysis on heterogeneous CPU-GPU systems.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

Structural Lock Correlation with Ownership Types.
Proceedings of the Programming Languages and Systems, 2013

Query-directed adaptive heap cloning for optimizing compilers.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

An Incremental Points-to Analysis with CFL-Reachability.
Proceedings of the Compiler Construction - 22nd International Conference, 2013

Scratchpad Memory aware task scheduling with minimum number of preemptions on a single processor.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

An empirical model for predicting cross-core performance interference on multicore processors.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
Optimally Maximizing Iteration-Level Loop Parallelism.
IEEE Trans. Parallel Distributed Syst., 2012

Optimizing modulo scheduling to achieve reuse and concurrency for stream processors.
J. Supercomput., 2012

The Reliability Wall for Exascale Supercomputing.
IEEE Trans. Computers, 2012

Comparability Graph Coloring for Optimizing Utilization of Software-Managed Stream Register Files for Stream Processors.
ACM Trans. Archit. Code Optim., 2012

Extendable pattern-oriented optimization directives.
ACM Trans. Archit. Code Optim., 2012

Parallelizing SOR for GPGPUs using alternate loop tiling.
Parallel Comput., 2012

A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs.
J. Comput. Sci. Technol., 2012

PartialRC: A Partial Recomputing Method for Efficient Fault Recovery on GPGPUs.
J. Comput. Sci. Technol., 2012

WCET-aware data selection and allocation for scratchpad memory.
Proceedings of the SIGPLAN/SIGBED Conference on Languages, 2012

Fast and precise points-to analysis with incremental CFL-reachability summarisation: preliminary experience.
Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, 2012

Static memory leak detection using full-sparse value-flow analysis.
Proceedings of the International Symposium on Software Testing and Analysis, 2012

What Is System Hang and How to Handle It.
Proceedings of the 23rd IEEE International Symposium on Software Reliability Engineering, 2012

A Fast Parallel Implementation of Molecular Dynamics with the Morse Potential on a Heterogeneous Petascale Supercomputer.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

A Highly Parallel Reuse Distance Analysis Algorithm on GPUs.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Automatic Parallelization of Tiled Loop Nests with Enhanced Fine-Grained Parallelism on GPUs.
Proceedings of the 41st International Conference on Parallel Processing, 2012

A Type and Effect System for Determinism in Multithreaded Programs.
Proceedings of the Programming Languages and Systems, 2012

On-demand dynamic summary-based points-to analysis.
Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

Ownership Types for Object Synchronisation.
Proceedings of the Programming Languages and Systems - 10th Asian Symposium, 2012

Layout-oblivious optimization for matrix computations.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
On Reducing Hidden Redundant Memory Accesses for DSP Applications.
IEEE Trans. Very Large Scale Integr. Syst., 2011

Leakage-Aware Modulo Scheduling for Embedded VLIW Processors.
J. Comput. Sci. Technol., 2011

Automatic Library Generation for BLAS3 on GPUs.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Efficient Energy Balancing Aware Multiple Base Station Deployment for WSNs.
Proceedings of the Wireless Sensor Networks - 8th European Conference, 2011

Model-Driven Tile Size Selection for DOACROSS Loops on GPUs.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Acculock: Accurate and efficient detection of data races.
Proceedings of the CGO 2011, 2011

An efficient heuristic for instruction scheduling on clustered VLIW processors.
Proceedings of the 14th International Conference on Compilers, 2011

SPAS: Scalable Path-Sensitive Pointer Analysis on Full-Sparse SSA.
Proceedings of the Programming Languages and Systems - 9th Asian Symposium, 2011

2010
Scratchpad memory allocation for data aggregates via interval coloring in superperfect graphs.
ACM Trans. Embed. Comput. Syst., 2010

Exploiting the reuse supplied by loop-dependent stream references for stream processors.
ACM Trans. Archit. Code Optim., 2010

Loop recreation for thread-level speculation on multicore processors.
Softw. Pract. Exp., 2010

Gather/scatter hardware support for accelerating Fast Fourier Transform.
J. Syst. Archit., 2010

Software-Hardware Cooperative DRAM Bank Partitioning for Chip Multiprocessors.
Proceedings of the Network and Parallel Computing, IFIP International Conference, 2010

Toward Harnessing DOACROSS Parallelism for Multi-GPGPUs.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Optimal WCET-aware code selection for scratchpad memory.
Proceedings of the 10th International Conference on Embedded Software, 2010

Reuse-aware modulo scheduling for stream processors.
Proceedings of the Design, Automation and Test in Europe, 2010

Level by level: making flow- and context-sensitive pointer analysis scalable for millions of lines of code.
Proceedings of the CGO 2010, 2010

Improving scratchpad allocation with demand-driven data tiling.
Proceedings of the 2010 International Conference on Compilers, 2010

2009
Compiler-directed scratchpad memory management via graph coloring.
ACM Trans. Archit. Code Optim., 2009

PARBLO: Page-Allocation-Based DRAM Row Buffer Locality Optimization.
J. Comput. Sci. Technol., 2009

Comparability graph coloring for optimizing utilization of stream register files in stream processors.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

A Cache-Efficient Parallel Gauss-Seidel Solver with Alternating Tiling.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

Exploiting Speculative TLP in Recursive Programs by Dynamic Thread Prediction.
Proceedings of the Compiler Construction, 18th International Conference, 2009

Optimal loop parallelization for maximizing iteration-level parallelism.
Proceedings of the 2009 International Conference on Compilers, 2009

Ownership Downgrading for Ownership Types.
Proceedings of the Programming Languages and Systems, 7th Asian Symposium, 2009

2008
Improving the parallelism of iterative methods by aggressive loop fusion.
J. Supercomput., 2008

Advances in high performance computing.
J. Supercomput., 2008

Minimal placement of bank selection instructions for partitioned memory architectures.
ACM Trans. Embed. Comput. Syst., 2008

Optimizing scientific application loops on stream processors.
Proceedings of the 2008 ACM SIGPLAN/SIGBED Conference on Languages, 2008

Thread-Sensitive Modulo Scheduling for Multicore Processors.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

ACS: An Addressless Configuration Support for efficient partial reconfigurations.
Proceedings of the 2008 International Conference on Field-Programmable Technology, 2008

Hardware Support for Efficient Sparse Matrix Vector Multiplication.
Proceedings of the 2008 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC 2008), 2008

A gather/scatter hardware support for efficient Fast Fourier Transform.
Proceedings of the 13th Asia-Pacific Computer Systems Architecture Conference, 2008

Exploiting loop-dependent stream reuse for stream processors.
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007
Data cache locking for tight timing calculations.
ACM Trans. Embed. Comput. Syst., 2007

Interprocedural side-effect analysis for incomplete object-oriented software modules.
J. Syst. Softw., 2007

Trace-based leakage energy optimisations at link time.
J. Syst. Archit., 2007

Scratchpad allocation for data aggregates in superperfect graphs.
Proceedings of the 2007 ACM SIGPLAN/SIGBED Conference on Languages, 2007

Toward Automatic Data Distribution for Migrating Computations.
Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

Loop recreation for thread-level speculation.
Proceedings of the 13th International Conference on Parallel and Distributed Systems, 2007

Validity Invariants and Effects.
Proceedings of the ECOOP 2007 - Object-Oriented Programming, 21st European Conference, Berlin, Germany, July 30, 2007

Towards Data Tiling for Whole Programs in Scratchpad Memory Allocation.
Proceedings of the Advances in Computer Systems Architecture, 2007

2006
A lifetime optimal algorithm for speculative PRE.
ACM Trans. Archit. Code Optim., 2006

A Fresh Look at Partial Redundancy Elimination as a Maximum Flow Problem.
Softwaretechnik-Trends, 2006

Partial dead code elimination on predicated code regions.
Softw. Pract. Exp., 2006

Instruction Scheduling with Release Times and Deadlines on ILP Processors.
Proceedings of the 12th IEEE Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2006), 2006

CoopStream: A Cooperative Cache Based Streaming Schedule Scheme for On-demand Media Services on Overlay Networks.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

A Fresh Look at PRE as a Maximum Flow Problem.
Proceedings of the Compiler Construction, 15th International Conference, 2006

Minimizing bank selection instructions for partitioned memory architecture.
Proceedings of the 2006 International Conference on Compilers, 2006

Trace-Based Data Cache Leakage Reduction at Link Time.
Proceedings of the Advances in Computer Systems Architecture, 11th Asia-Pacific Conference, 2006

2005
Cache exploitation in embedded systems.
J. Embed. Comput., 2005

Foreword.
J. Comput. Sci. Technol., 2005

Aggressive Loop Fusion for Improving Locality and Parallelism.
Proceedings of the Parallel and Distributed Processing and Applications, 2005

Enabling Loop Fusion and Tiling for Cache Performance by Fixing Fusion-Preventing Data Dependences.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

Fast Parallel DNA-Based Algorithms for Molecular Computation: Determining a Prime Number.
Proceedings of the Third International Conference on Information Technology and Applications (ICITA 2005), 2005

Compiler-Directed Scratchpad Memory Management.
Proceedings of the Embedded Software and Systems, Second International Conference, 2005

Completeness Analysis for Incomplete Object-Oriented Programs.
Proceedings of the Compiler Construction, 14th International Conference, 2005

Interprocedural Side-Effect Analysis and Optimisation in the Presence of Dynamic Class Loading.
Proceedings of the Computer Science 2005, 2005

Improving the Performance of GCC by Exploiting IA-64 Architectural Features.
Proceedings of the Advances in Computer Systems Architecture, 10th Asia-Pacific Conference, 2005

Memory Coloring: A Compiler Approach for Scratchpad Memory Management.
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

2004
Efficient and Accurate Analytical Modeling of Whole-Program Data Cache Behavior.
IEEE Trans. Computers, 2004

A trace-based binary compilation framework for energy-aware computing.
Proceedings of the 2004 ACM SIGPLAN/SIGBED Conference on Languages, 2004

Region-Based Partial Dead Code Elimination on Predicated Code.
Proceedings of the Compiler Construction, 13th International Conference, 2004

A Comparative Study of Web Application Design Models Using the Java Technologies.
Proceedings of the Advanced Web Technologies and Applications, 2004

Strength Reduction for Loop-Invariant Types.
Proceedings of the Computer Science 2004, 2004

2003
Data cache locking for higher program predictability.
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2003

Data Caches in Multitasking Hard Real-Time Systems.
Proceedings of the 24th IEEE Real-Time Systems Symposium (RTSS 2003), 2003

Code Tiling for Improving the Cache Performance of PDE Solvers.
Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

Optimal and Efficient Speculation-Based Partial Redundancy Elimination.
Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003

2002
Time-minimal tiling when rise is larger than zero.
Parallel Comput., 2002

Eigenvectors-based parallelisation of nested loops with affine dependences.
Parallel Algorithms Appl., 2002

Space-Time Equations for Non-Unimodular Mappings.
Int. J. Comput. Math., 2002

Let's Study Whole-Program Cache Behaviour Analytically.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

2001
Communication Overhead on Distributed Memory Machines.
Parallel Distributed Comput. Pract., 2001

2000
Generating efficient tiled code for distributed memory machines.
Parallel Comput., 2000

Loop Tiling for Parallelism.
The Kluwer International Series in Engineering and Computer Science 575, Kluwer, ISBN: 0-7923-7933-0, 2000

1999
Partitioning and scheduling loops on NOWs.
Comput. Commun., 1999

1998
Reuse-Driven Tiling for Improving Data Locality.
Int. J. Parallel Program., 1998

1997
On Tiling as a Loop Transformation.
Parallel Process. Lett., 1997

Unimodular Transformations of Non-Perfectly Nested Loops.
Parallel Comput., 1997

Communication-Minimal Tiling of Uniform Dependence Loops.
J. Parallel Distributed Comput., 1997

Reuse-Driven Tiling for Data Locality.
Proceedings of the Languages and Compilers for Parallel Computing, 1997

1996
Generalising the Unimodular Approach to Restructure Imperfectly Nested Loops.
Parallel Process. Lett., 1996

Transformations of Nested Loops with Non-Convex Iteration Spaces.
Parallel Comput., 1996

Affine-by-Statement Transformations of Imperfectly Nested Loops.
Proceedings of IPPS '96, 1996

1995
Closed-form mapping conditions for the synthesis of linear processor arrays.
J. VLSI Signal Process., 1995

Constructing DO loops for non-convex iteration spaces in compiling for parallel machines.
Proceedings of IPPS '95, 1995

1994
Automating Non-Unimodular Loop Transformations for Massive Parallelism.
Parallel Comput., 1994

Avoiding Data Link and Computational Conflicts in Mapping Nested Loop Algorithms to Lower-Dimensional Processor Arrays.
Proceedings of the 1994 International Conference on Parallel and Distributed Systems, 1994

1993
An Algorithm to Automate Non-Unimodular Transformations of Loop Nests.
Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, 1993

A new formulation of the mapping conditions for the synthesis of linear systolic arrays.
Proceedings of the International Conference on Application-Specific Array Processors, 1993

1992
Formal synthesis of control signals for systolic arrays.
PhD thesis, 1992

The synthesis of control signals for one-dimensional systolic arrays.
Integr., 1992

On the Loading, Recovery and Access of Stationary Data in Systolic Arrays.
Proceedings of the Parallel Processing: CONPAR 92, 1992

1991
A systolic array for pyramidal algorithms.
J. VLSI Signal Process., 1991

Specifying control signals for Systolic Arrays by Uniform Recurrence Equations.
Parallel Process. Lett., 1991

Specifying control signals for one-dimensional systolic arrays by uniform recurrence equations.
Proceedings of the Algorithms and Parallel VLSI Architectures II, 1991

1988
A new data structure for representing cell hierarchy in layout design.
Comput. Graph., 1988

