Xipeng Shen

ORCID: 0000-0003-3599-8010

Affiliations:
  • North Carolina State University, USA


According to our database, Xipeng Shen authored at least 204 papers between 2000 and 2024.

Timeline

[Chart: number of publications per year, 2000 to 2024, broken down by type: book, in proceedings, article, PhD thesis, dataset, other.]

Bibliography

2024
Enabling Efficient Deep Learning on MCU With Transient Redundancy Elimination.
IEEE Trans. Computers, December, 2024

ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs.
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024

Data Enclave: A Data-Centric Trusted Execution Environment.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

WiseGraph: Optimizing GNN with Joint Workload Partition of Graph and Operations.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

DACO: Pursuing Ultra-low Power Consumption via DNN-Adaptive CPU-GPU CO-optimization on Mobile Devices.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Expanding the Edge: Enabling Efficient Winograd CNN Inference With Deep Reuse on Edge Device.
IEEE Trans. Knowl. Data Eng., October, 2023

Accelerating matrix-centric graph processing on GPUs through bit-level optimizations.
J. Parallel Distributed Comput., July, 2023

Automated Translation of Functional Big Data Queries to SQL.
Proc. ACM Program. Lang., April, 2023

CompressGraph: Efficient Parallel Graph Analytics with Rule-Based Compression.
Proc. ACM Manag. Data, 2023

Survey: Exploiting Data Redundancy for Optimization of Deep Learning.
ACM Comput. Surv., 2023

Efficient Large Language Models Fine-Tuning On Graphs.
CoRR, 2023

Decentralized Application-Level Adaptive Scheduling for Multi-Instance DNNs on Open Mobile Devices.
Proceedings of the 2023 USENIX Annual Technical Conference, 2023

BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs.
Proceedings of the 37th International Conference on Supercomputing, 2023

Reconciling Selective Logging and Hardware Persistent Memory Transaction.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

SpecPMT: Speculative Logging for Resolving Crash Consistency Overhead of Persistent Memory.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

Space-Efficient TREC for Enabling Deep Learning on Microcontrollers.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Sequential Model Optimization for Software Effort Estimation.
IEEE Trans. Software Eng., 2022

Simpler Hyperparameter Optimization for Software Analytics: Why, How, When?
IEEE Trans. Software Eng., 2022

POCLib: A High-Performance Framework for Enabling Near Orthogonal Processing on Compression.
IEEE Trans. Parallel Distributed Syst., 2022

Exploring Data Analytics Without Decompression on Embedded GPU Systems.
IEEE Trans. Parallel Distributed Syst., 2022

Energy Efficient Boosting of GEMM Accelerators for DNN via Reuse.
ACM Trans. Design Autom. Electr. Syst., 2022

General Reuse-Centric CNN Accelerator.
IEEE Trans. Computers, 2022

Preserving Addressability Upon GC-Triggered Data Movements on Non-Volatile Memory.
ACM Trans. Archit. Code Optim., 2022

Towards Seamless Management of AI Models in High-Performance Computing.
CoRR, 2022

CoCoPIE XGen: A Full-Stack AI-Oriented Optimizing Framework.
CoRR, 2022

DREW: Efficient Winograd CNN Inference with Deep Reuse.
Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022

Brief Industry Paper: Enabling Level-4 Autonomous Driving on a Single $1k Off-the-Shelf Card.
Proceedings of the 28th IEEE Real-Time and Embedded Technology and Applications Symposium, 2022

TREC: Transient Redundancy Elimination-based Convolution.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

GCD²: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

FFCCD: fence-free crash-consistent concurrent defragmentation for persistent memory.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

IDE Augmented with Human-Learning Inspired Natural Language Programming.
Proceedings of the 44th IEEE/ACM International Conference on Software Engineering: Companion Proceedings, 2022

Interactive NLU-Powered Ontology-Based Workflow Synthesis for FAIR Support of HPC.
Proceedings of the IEEE/ACM International Workshop on HPC User Support Tools, 2022

Temporal Exposure Reduction Protection for Persistent Memory.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

Finding Reusable Machine Learning Components to Build Programming Language Processing Pipelines.
Proceedings of the Software Architecture. ECSA 2022 Tracks and Workshops, 2022

Enabling Near Real-Time NLU-Driven Natural Language Programming through Dynamic Grammar Graph-Based Translation.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

2021
TADOC: Text analytics directly on compression.
VLDB J., 2021

How to "DODGE" Complex Software Analytics.
IEEE Trans. Software Eng., 2021

An Automatic Synthesizer of Advising Tools for High Performance Computing.
IEEE Trans. Parallel Distributed Syst., 2021

UDF to SQL translation through compositional lazy inductive synthesis.
Proc. ACM Program. Lang., 2021

Coarsening optimization for differentiable programming.
Proc. ACM Program. Lang., 2021

Reuse-centric k-means configuration.
Inf. Syst., 2021

Enabling Level-4 Autonomous Driving on a Single $1k Off-the-Shelf Card.
CoRR, 2021

Coarsening Optimization for Differentiable Programming.
CoRR, 2021

Faster SAT Solving for Software with Repeated Structures (with Case Studies on Software Test Suite Minimization).
CoRR, 2021

CoCoPIE: enabling real-time AI on off-the-shelf mobile devices via compression-compilation co-design.
Commun. ACM, 2021

Toward efficient interactions between Python and native libraries.
Proceedings of the ESEC/FSE '21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021

Seeds of SEED: New Security Challenges for Persistent Memory.
Proceedings of the 2021 International Symposium on Secure and Private Execution Environment Design (SEED), 2021

Brief Industry Paper: Towards Real-Time 3D Object Detection for Autonomous Vehicles with Pruning Search.
Proceedings of the 27th IEEE Real-Time and Embedded Technology and Applications Symposium, 2021

Exploring deep reuse in winograd CNN inference.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Understanding and bridging the gaps in current GNN performance optimizations.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

HPCFAIR: Enabling FAIR AI for HPC Applications.
Proceedings of the IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2021

HPC Ontology: Towards a Unified Ontology for Managing Training Datasets and AI Models for High-Performance Computing.
Proceedings of the IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2021

PCCS: Processor-Centric Contention-aware Slowdown Model for Heterogeneous System-on-Chips.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Supporting Legacy Libraries on Non-Volatile Memory: A User-Transparent Approach.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Revisit the Scalability of Deep Auto-Regressive Models for Graph Generation.
Proceedings of the International Joint Conference on Neural Networks, 2021

Simple Augmentation Goes a Long Way: ADRL for DNN Quantization.
Proceedings of the 9th International Conference on Learning Representations, 2021

Recurrent Neural Networks Meet Context-Free Grammar: Two Birds with One Stone.
Proceedings of the IEEE International Conference on Data Mining, 2021

G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression.
Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

Hardware-Based Address-Centric Acceleration of Key-Value Store.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Best-Effort Lazy Evaluation for Python Software Built on APIs.
Proceedings of the 35th European Conference on Object-Oriented Programming, 2021

Deep NLP-based co-evolvement for synthesizing code analysis from natural language.
Proceedings of the CC '21: 30th ACM SIGPLAN International Conference on Compiler Construction, 2021

RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Enabling Runtime SpMV Format Selection through an Overhead Conscious Method.
IEEE Trans. Parallel Distributed Syst., 2020

DIAC: An Inter-app Conflicts Detector for Open IoT Systems.
ACM Trans. Embed. Comput. Syst., 2020

Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device.
CoRR, 2020

Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices.
CoRR, 2020

CoCoPIE: Making Mobile AI Sweet As PIE - Compression-Compilation Co-Design Goes a Long Way.
CoRR, 2020

Special Issue: Graph Computing.
Concurr. Comput. Pract. Exp., 2020

HISyn: human learning-inspired natural language programming.
Proceedings of the ESEC/FSE '20: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2020

FLEET: Flexible Efficient Ensemble Training for Heterogeneous Deep Neural Networks.
Proceedings of the Third Conference on Machine Learning and Systems, 2020

Hardware-Based Domain Virtualization for Intra-Process Isolation of Persistent Memory Objects.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

HARP: holistic analysis for refactoring Python-based analytics programs.
Proceedings of the ICSE '20: 42nd International Conference on Software Engineering, Seoul, South Korea, 27 June, 2020

MKPipe: a compiler framework for optimizing multi-kernel workloads in OpenCL for FPGA.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Enabling Efficient Random Access to Hierarchically-Compressed Data.
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

MERR: Improving Security of Persistent Memory Objects via Efficient Memory Exposure Reduction and Randomization.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Wootz: a compiler-based framework for fast CNN pruning via composability.
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

In-Place Zero-Space Memory Protection for CNN.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

IA-graph based inter-app conflicts detection in open IoT systems.
Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, 2019

Deep reuse: streamline CNN inference on the fly via coarse-grained computation reuse.
Proceedings of the ACM International Conference on Supercomputing, 2019

Adaptive Deep Reuse: Accelerating CNN Training on the Fly.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

Streamline Density Peak Clustering for Practical Adoptions.
Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

HiWayLib: A Software Framework for Enabling High Performance Communications for Heterogeneous Pipeline Computations.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights.
Proc. VLDB Endow., 2018

LCD: A Fast Contrastive Divergence Based Algorithm for Restricted Boltzmann Machine.
Neural Networks, 2018

Editorial for the Special Issue on In-Memory Computing.
J. Parallel Distributed Comput., 2018

Resolving the GPU responsiveness dilemma through program transformations.
Frontiers Comput. Sci., 2018

Hyperparameter Optimization for Effort Estimation.
CoRR, 2018

Why Software Effort Estimation Needs SBSE.
CoRR, 2018

Exploring flexible communications for streamlining DNN ensemble training pipelines.
Proceedings of the International Conference for High Performance Computing, 2018

Bridging the gap between deep learning and sparse matrix format selection.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Footprint modeling of cache associativity and granularity.
Proceedings of the International Symposium on Memory Systems, 2018

Overhead-Conscious Format Selection for SpMV-Based Applications.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Taming the "Monster": Overcoming Program Optimization Challenges on SW26010 Through Precise Performance Modeling.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Zwift: A Programming Framework for High Performance Text Analytics on Compressed Data.
Proceedings of the 32nd International Conference on Supercomputing, 2018

LEEM: Lean Elastic EM for Gaussian Mixture Model via Bounds-Based Filtering.
Proceedings of the IEEE International Conference on Data Mining, 2018

Reuse-Centric K-Means Configuration.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

FALCON: A Fast Drop-In Replacement of Citation KNN for Multiple Instance Learning.
Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018

Rethinking compilers in the rise of machine learning and AI (keynote).
Proceedings of the 27th International Conference on Compiler Construction, 2018

2017
Optimizing Data Placement on GPU Memory: A Portable Approach.
IEEE Trans. Computers, 2017

GLORE: generalized loop redundancy elimination upon LER-notation.
Proc. ACM Program. Lang., 2017

Understanding co-run performance on CPU-GPU integrated processors: observations, insights, directions.
Frontiers Comput. Sci., 2017

Egeria: a framework for automatic synthesis of HPC advising tools through multi-layered natural language processing.
Proceedings of the International Conference for High Performance Computing, 2017

POSTER: An Infrastructure for HPC Knowledge Sharing and Reuse.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

EffiSha: A Software Framework for Enabling Effficient Preemptive Scheduling of GPU.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Generalizations of the theory and deployment of triangular inequality for compiler-based strength reduction.
Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2017

Versapipe: a versatile programming framework for pipelined computing on GPU.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Efficient support of position independence on non-volatile memory.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Bridging the gap between memory performance and massive parallelism: the critical role of programming systems innovations (keynote).
Proceedings of the 2017 ACM SIGPLAN International Symposium on Memory Management, 2017

Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Sweet KNN: An Efficient KNN on GPU through Reconciliation between Redundancy Removal and Regularity.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

POSTER: Bridging the Gap Between Deep Learning and Sparse Matrix Format Selection.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

POSTER: Cutting the Fat: Speeding Up RBM for Fast Deep Learning Through Generalized Redundancy Elimination.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Examining and Reducing the Influence of Sampling Errors on Feedback-Driven Optimizations.
ACM Trans. Archit. Code Optim., 2016

Tuning for software analytics: Is it really necessary?
Inf. Softw. Technol., 2016

Data-centric combinatorial optimization of parallel code.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Coherence-Free Multiview: Enabling Reference-Discerning Data Placement on GPU.
Proceedings of the 2016 International Conference on Supercomputing, 2016

Towards Ontology-Based Program Analysis.
Proceedings of the 30th European Conference on Object-Oriented Programming, 2016

The workshop on compiler-driven performance.
Proceedings of the 26th Annual International Conference on Computer Science and Software Engineering, 2016

OpenCL-based erasure coding on heterogeneous architectures.
Proceedings of the 27th IEEE International Conference on Application-specific Systems, 2016

2015
TOP: A Framework for Enabling Algorithmic Optimizations for Distance-Related Problems.
Proc. VLDB Endow., 2015

Enabling Portable Optimizations of Data Placement on GPU.
IEEE Micro, 2015

Enhancing domain specific language implementations through ontology.
Proceedings of the 5th International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, 2015

Autotuning algorithmic choice for input sensitivity.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

Free launch: optimizing GPU dynamic kernel launches through thread reuse.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Software Engagement with Sleeping CPUs.
Proceedings of the 15th Workshop on Hot Topics in Operating Systems, 2015

14th compiler-driven performance workshop.
Proceedings of 25th Annual International Conference on Computer Science and Software Engineering, 2015

On-the-Fly Principled Speculation for FSM Parallelization.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014
Space-efficient multi-versioning for input-adaptive feedback-driven program optimizations.
Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014

Call sequence prediction through probabilistic calling automata.
Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, 2014

PORPLE: An Extensible Optimizer for Portable Data Placement on GPU.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Understanding Co-run Degradations on Integrated Heterogeneous Processors.
Proceedings of the Languages and Compilers for Parallel Computing, 2014

Localization of concurrency bugs using shared memory access pairs.
Proceedings of the ACM/IEEE International Conference on Automated Software Engineering, 2014

SatScore: uncovering and avoiding a principled pitfall in responsiveness measurements of app launches.
Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2014

Challenging the "embarrassingly sequential": parallelizing finite state machine-based computations through principled speculation.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

Finding the limit: examining the potential and complexity of compilation scheduling for JIT-based runtime systems.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

SM-centric transformation: circumventing hardware restrictions for flexible GPU scheduling.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
HPar: A practical parallel parser for HTML - taming HTML complexities for parallel parsing.
ACM Trans. Archit. Code Optim., 2013

An Infrastructure for Tackling Input-Sensitivity of GPU Program Optimizations.
Int. J. Parallel Program., 2013

Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

Software-level scheduling to exploit non-uniformly shared data cache on GPGPU.
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, 2013

Do computer programs have to be as dumb as they are?: input-centric dynamic program optimizations.
Proceedings of the VMIL@SPLASH '13: Proceedings of the 7th ACM workshop on Virtual machines and intermediate languages, 2013

A Versatile Performance and Energy Simulation Tool for Composite GPU Global Memory.
Proceedings of the 2013 IEEE 21st International Symposium on Modelling, 2013

Simple Profile Rectifications Go a Long Way - Statistically Exploring and Alleviating the Effects of Sampling Errors for Program Optimizations.
Proceedings of the ECOOP 2013 - Object-Oriented Programming, 2013

Profmig: A framework for flexible migration of program profiles across software versions.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

Exploring hybrid memory for GPU energy efficiency through software-hardware co-design.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
The Significance of CMP Cache Sharing on Contemporary Multithreaded Applications.
IEEE Trans. Parallel Distributed Syst., 2012

A study towards optimal data layout for GPU computing.
Proceedings of the 2012 ACM SIGPLAN workshop on Memory Systems Performance and Correctness: held in conjunction with PLDI '12, 2012

Exploiting inter-sequence correlations for program behavior prediction.
Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2012

Optimal Co-Scheduling to Minimize Makespan on Chip Multiprocessors.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2012

One stone two birds: synchronization relaxation and redundancy removal in GPU-CPU translation.
Proceedings of the International Conference on Supercomputing, 2012

Speculative parallelization needs rigor: probabilistic analysis for optimal speculation of finite-state machine applications.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
The Complexity of Optimal Job Co-Scheduling on Chip Multiprocessors and Heuristics-Based Solutions.
IEEE Trans. Parallel Distributed Syst., 2011

A step towards transparent integration of input-consciousness into dynamic program optimizations.
Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2011

Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation.
Proceedings of the Languages and Compilers for Parallel Computing, 2011

On-the-fly elimination of dynamic irregularities for GPU computing.
Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011

Enhancing Data Locality for Dynamic Simulations through Asynchronous Data Transformations and Adaptive Control.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

Correctly Treating Synchronizations in Compiling Fine-Grained SPMD-Threaded Programs for CPU.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

An input-centric paradigm for program dynamic optimizations.
Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2010

LU Decomposition on Cell Broadband Engine: An Empirical Study to Exploit Heterogeneous Chip Multiprocessors.
Proceedings of the Network and Parallel Computing, IFIP International Conference, 2010

Array Regrouping on CMP with Non-uniform Cache Sharing.
Proceedings of the Languages and Compilers for Parallel Computing, 2010

Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping.
Proceedings of the 24th International Conference on Supercomputing, 2010

Combining Locality Analysis with Online Proactive Job Co-scheduling in Chip Multiprocessors.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Exploiting statistical correlations for proactive prediction of program behaviors.
Proceedings of the CGO 2010, 2010

Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors?
Proceedings of the Compiler Construction, 19th International Conference, 2010

2009
Program locality analysis using reuse distance.
ACM Trans. Program. Lang. Syst., 2009

The study and handling of program inputs in the selection of garbage collectors.
ACM SIGOPS Oper. Syst. Rev., 2009

Influence of program inputs on the selection of garbage collectors.
Proceedings of the 5th International Conference on Virtual Execution Environments, 2009

A cross-input adaptive framework for GPU program optimizations.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Speculation with Little Wasting: Saving Cost in Software Speculation through Transparent Learning.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

Cross-Input Learning and Discriminative Prediction in Evolvable Virtual Machines.
Proceedings of the CGO 2009, 2009

A study on optimally co-scheduling jobs of different lengths on chip multiprocessors.
Proceedings of the 6th Conference on Computing Frontiers, 2009

2008
Scalable Implementation of Efficient Locality Approximation.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

Adaptive speculation in behavior-oriented parallelization.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Adaptive Software Speculation for Enhancing the Cost-Efficiency of Behavior-Oriented Parallelization.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Exploration of the Influence of Program Inputs on CMP Co-scheduling.
Proceedings of the Euro-Par 2008, 2008

Analysis and approximation of optimal co-scheduling on chip multiprocessors.
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007
Miss Rate Prediction Across Program Inputs and Cache Configurations.
IEEE Trans. Computers, 2007

Predicting locality phases for dynamic memory optimization.
J. Parallel Distributed Comput., 2007

Locality approximation using time.
Proceedings of the 34th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2007

Software behavior oriented parallelization.
Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, 2007

Modeling Relations between Inputs and Dynamic Behavior for General Programs.
Proceedings of the Languages and Compilers for Parallel Computing, 2007

A Key-based Adaptive Transactional Memory Executor.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Analysis of input-dependent program behavior using active profiling.
Proceedings of the Workshop on Experimental Computer Science, 2007

Bridging Inputs and Program Dynamic Behavior.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
Program-level adaptive memory management.
Proceedings of the 5th International Symposium on Memory Management, 2006

2005
Parallelization of Utility Programs Based on Behavior Phase Analysis.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

Lightweight reference affinity analysis.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Gated memory control for memory monitoring, leak detection and garbage collection.
Proceedings of the 2005 workshop on Memory System Performance, 2005

2004
Learning multi-label scene classification.
Pattern Recognit., 2004

Multilabel machine learning and its application to semantic scene classification.
Proceedings of the Storage and Retrieval Methods and Applications for Multimedia 2004, 2004

Array regrouping and structure splitting using whole-program reference affinity.
Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation 2004, 2004

Phase-Based Miss Rate Prediction Across Program Inputs.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

Adaptive Data Partition for Sorting Using Probability Distribution.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

Locality phase prediction.
Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

2003
A Hierarchical Model of Reference Affinity.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

2001
The study of the effect of training set on statistical language modeling.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Study and auto-detection of stress based on tonal pitch range in Mandarin.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

2000
A CART-Based Hierarchical Stochastic Model for Prosodic Phrasing in Chinese.
Proceedings of the 2000 International Symposium on Chinese Spoken Language Processing, 2000

