Mahmut T. Kandemir

Orcid: 0000-0002-9940-9951

  • Penn State, University Park, USA

According to our database, Mahmut T. Kandemir authored at least 781 papers between 1997 and 2024.


IEEE Fellow

IEEE Fellow 2016, "For contributions to compiler support for performance and energy optimization of computer architectures".






An Efficient Edge-Cloud Partitioning of Random Forests for Distributed Sensor Networks.
IEEE Embed. Syst. Lett., March, 2024

Thorough Characterization and Analysis of Large Transformer Model Training At-Scale.
Proc. ACM Meas. Anal. Comput. Syst., 2024

Parallelization Strategies for DLRM Embedding Bag Operator on AMD CPUs.
IEEE Micro, 2024

GPU Cluster Scheduling for Network-Sensitive Deep Learning.
CoRR, 2024

Speculative Monte-Carlo Tree Search.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

SpotVerse: Optimizing Bioinformatics Workflows with Multi-Region Spot Instances in Galaxy and Beyond.
Proceedings of the 25th International Middleware Conference, 2024

Towards SLO-Compliant and Cost-Effective Serverless Computing on Emerging GPU Architectures.
Proceedings of the 25th International Middleware Conference, 2024

Veiled Pathways: Investigating Covert and Side Channels Within GPU Uncore.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

PIFS-Rec: Process-In-Fabric-Switch for Large-Scale Recommendation System Inferences.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Studying CPU and memory utilization of applications on Fujitsu A64FX and Nvidia Grace Superchip.
Proceedings of the International Symposium on Memory Systems, 2024

GameStreamSR: Enabling Neural-Augmented Game Streaming on Commodity Mobile Platforms.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Paldia: Enabling SLO-Compliant and Cost-Effective Serverless Computing on Heterogeneous Hardware.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Minimizing Coherence Errors via Dynamic Decoupling.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

Impact of Write-Allocate Elimination on Fujitsu A64FX.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops, 2024

Usas: A Sustainable Continuous-Learning Framework for Edge Servers.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

An Autonomic Resource Allocating SSD.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

FAAStloop: Optimizing Loop-Based Applications for Serverless Computing.
Proceedings of the 2024 ACM Symposium on Cloud Computing, 2024

SmartGraph: A Framework for Graph Processing in Computational Storage.
Proceedings of the 2024 ACM Symposium on Cloud Computing, 2024

Quantifying and Mitigating Cache Side Channel Leakage with Differential Set.
Proc. ACM Program. Lang., October, 2023

MBFGraph: An SSD-Based Analytics System for Evolving Graphs (SC'23) Docker image.
Dataset, June, 2023

Quantum Circuit Resizing.
CoRR, 2023

Quantifying the impact of data replication on error propagation.
Clust. Comput., 2023

MBFGraph: An SSD-based External Graph System for Evolving Graphs.
Proceedings of the International Conference for High Performance Computing, 2023

TRIM: crossTalk-awaRe qubIt Mapping for multiprogrammed quantum systems.
Proceedings of the IEEE International Conference on Quantum Software, 2023

Hardware Support for Constant-Time Programming.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

License Forecasting and Scheduling for HPC.
Proceedings of the 31st International Symposium on Modeling, 2023

Federated Learning with Spiking Neural Networks in Heterogeneous Systems.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2023

Optimizing CPU Performance for Recommendation Systems At-Scale.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

EdgePC: Efficient Deep Learning Analytics for Point Clouds on Edge Devices.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Quantization for Bayesian Deep Learning: Low-Precision Characterization and Robustness.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023

Stash: A Comprehensive Stall-Centric Characterization of Public Cloud VMs for Distributed Deep Learning.
Proceedings of the 43rd IEEE International Conference on Distributed Computing Systems, 2023

Data Recomputation for Multithreaded Applications.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Architecture-Aware Currying.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

MicroBlend: An Automated Service-Blending Framework for Microservice-Based Cloud Applications.
Proceedings of the 16th IEEE International Conference on Cloud Computing, 2023

SCOOP: A Scalable Object-Oriented Serverless Platform.
Proceedings of the 16th IEEE International Conference on Cloud Computing, 2023

Studying error propagation on application data structure and hardware.
J. Supercomput., 2022

Memory Space Recycling.
Proc. ACM Meas. Anal. Comput. Syst., 2022

Data Convection: A GPU-Driven Case Study for Thermal-Aware Data Placement in 3D DRAMs.
Proc. ACM Meas. Anal. Comput. Syst., 2022

End-to-end Characterization of Game Streaming Applications on Mobile Platforms.
Proc. ACM Meas. Anal. Comput. Syst., 2022

Predicting Protein-Ligand Docking Structure with Graph Neural Network.
J. Chem. Inf. Model., 2022

Analysis of Distributed Deep Learning in the Cloud.
CoRR, 2022

Seeker: Synergizing Mobile and Energy Harvesting Wearable Sensors for Human Activity Recognition.
CoRR, 2022

Cocktail: A Multidimensional Optimization for Model Serving in Cloud.
Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, 2022

Multi-resource fair allocation for consolidated flash-based caching systems.
Proceedings of the Middleware '22: 23rd International Middleware Conference, Quebec, QC, Canada, November 7, 2022

Skipper: Enabling efficient SNN training through activation-checkpointing and time-skipping.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

An architecture interface and offload model for low-overhead, near-data, distributed accelerators.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Pushing Point Cloud Compression to the Edge.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Learn Locally, Correct Globally: A Distributed Algorithm for Training Graph Neural Networks.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Exploiting Frame Similarity for Efficient Inference on Edge Devices.
Proceedings of the 42nd IEEE International Conference on Distributed Computing Systems, 2022

Fine-Granular Computation and Data Layout Reorganization for Improving Locality.
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

A Scheduling Framework for Decomposable Kernels on Energy Harvesting IoT Edge Nodes.
Proceedings of the GLSVLSI '22: Great Lakes Symposium on VLSI 2022, Irvine CA USA, June 6, 2022

GraphGuess: Approximate Graph Processing System with Adaptive Correction.
Proceedings of the Euro-Par 2022: Parallel Processing, 2022

Parallelism-Based Techniques for Slowing Down Soft Error Propagation.
Proceedings of the IEEE Intl. Conf. on Dependable, 2022

Cypress: input size-sensitive container provisioning and request scheduling for serverless platforms.
Proceedings of the 13th Symposium on Cloud Computing, SoCC 2022, 2022

Splice: An Automated Framework for Cost- and Performance-Aware Blending of Cloud Services.
Proceedings of the 22nd IEEE International Symposium on Cluster, 2022

SandPiper: A Cost-Efficient Adaptive Framework for Online Recommender Systems.
Proceedings of the IEEE International Conference on Big Data, 2022

Athena: An Early-Fetch Architecture to Reduce on-Chip Page Walk Latencies.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

MaxTracker: Continuously Tracking the Maximum Computation Progress for Energy Harvesting ReRAM-based CNN Accelerators.
ACM Trans. Embed. Comput. Syst., 2021

Mix and Match: Reorganizing Tasks for Enhancing Data Locality.
Proc. ACM Meas. Anal. Comput. Syst., 2021

SpecSafe: detecting cache side channels in a speculative world.
Proc. ACM Program. Lang., 2021

Exploiting Activation based Gradient Output Sparsity to Accelerate Backpropagation in CNNs.
CoRR, 2021

Cocktail: Leveraging Ensemble Learning for Optimized Model Serving in Public Cloud.
CoRR, 2021

Compiler support for near data computing.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Distance-in-time versus distance-in-space.
Proceedings of the PLDI '21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021

Fluid: a framework for approximate concurrency via controlled dependency relaxation.
Proceedings of the PLDI '21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021

Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

PowerPrep: A power management proposal for user-facing datacenter workloads.
Proceedings of the IEEE International Conference on Networking, Architecture and Storage, 2021

Increasing GPU Translation Reach by Leveraging Under-Utilized On-Chip Resources.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

HoloAR: On-the-fly Optimization of 3D Holographic Processing for Augmented Reality.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Revamping Storage Class Memory With Hardware Automated Memory-Over-Storage Solution.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

GYAN: Accelerating Bioinformatics Tools in Galaxy with GPU-Aware Computation Mapping.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Cross-Platform Performance Evaluation of Stateful Serverless Workflows.
Proceedings of the IEEE International Symposium on Workload Characterization, 2021

GSSA: A Resource Allocation Scheme Customized for 3D NAND SSDs.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Origin: Enabling On-Device Intelligence for Human Activity Recognition Using Energy Harvesting Wireless Sensor Networks.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

Morphable Convolutional Neural Network for Biomedical Image Segmentation.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

Ghost Thread: Effective User-Space Cache Side Channel Protection.
Proceedings of the CODASPY '21: Eleventh ACM Conference on Data and Application Security and Privacy, 2021

Kraken: Adaptive Container Provisioning for Deploying Dynamic DAGs in Serverless Platforms.
Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021

Prolonging 3D NAND SSD lifetime via read latency relaxation.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

Design of a Host Interface Logic for GC-Free SSDs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Optimization of Intercache Traffic Entanglement in Tagless Caches With Tiling Opportunities.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

DSM: A Case for Hardware-Assisted Merging of DRAM Rows with Same Content.
Proc. ACM Meas. Anal. Comput. Syst., 2020

Centaur: A Novel Architecture for Reliable, Low-Wear, High-Density 3D NAND Storage.
Proc. ACM Meas. Anal. Comput. Syst., 2020

Guiding Conventional Protein-Ligand Docking Software with Convolutional Neural Networks.
J. Chem. Inf. Model., 2020

Fifer: Tackling Underutilization in the Serverless Era.
CoRR, 2020

Towards Designing a Self-Managed Machine Learning Inference Serving System in Public Cloud.
CoRR, 2020

FastDrain: Removing Page Victimization Overheads in NVMe Storage Stack.
IEEE Comput. Archit. Lett., 2020

Selective Caching: Avoiding Performance Valleys in Massively Parallel Architectures.
Proceedings of the 28th Euromicro International Conference on Parallel, 2020

GCN meets GPU: Decoupling "When to Sample" from "How to Sample".
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

SplitServe: Efficiently Splitting Apache Spark Jobs Across FaaS and IaaS.
Proceedings of the Middleware '20: 21st International Middleware Conference, 2020

Fifer: Tackling Resource Underutilization in the Serverless Era.
Proceedings of the Middleware '20: 21st International Middleware Conference, 2020

Implications of Public Cloud Resource Heterogeneity for Inference Serving.
Proceedings of the WoSC@Middleware 2020: Proceedings of the 2020 Sixth International Workshop on Serverless Computing, 2020

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

Alleviating Bottlenecks for DNN Execution on GPUs via Opportunistic Computing.
Proceedings of the 21st International Symposium on Quality Electronic Design, 2020

Déjà View: Spatio-Temporal Compute Reuse for Energy-Efficient 360° VR Video Streaming.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Selective Event Processing for Energy Efficient Mobile Gaming with SNIP.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

AMOEBA: a coarse grained reconfigurable architecture for dynamic GPU scaling.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Characterizing Bottlenecks in Scheduling Microservices on Serverless Platforms.
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020

ResiRCA: A Resilient Energy Harvesting ReRAM Crossbar-Based Accelerator for Intelligent Embedded Processors.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Multiverse: Dynamic VM Provisioning for Virtualized High Performance Computing Clusters.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020

Fair Write Attribution and Allocation for Consolidated Flash Cache.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

Enhancing Address Translations in Throughput Processors via Compression.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

Collective Affinity Aware Computation Mapping.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

Affine Modeling of Program Traces.
IEEE Trans. Computers, 2019

Architecture-Aware Approximate Computing.
Proc. ACM Meas. Anal. Comput. Syst., 2019

Scheduling opportunities for asymmetrically reliable caches.
J. Parallel Distributed Comput., 2019

A caching system with object sharing.
CoRR, 2019

CaSym: Cache Aware Symbolic Execution for Side Channel Detection and Mitigation.
Proceedings of the 2019 IEEE Symposium on Security and Privacy, 2019

Co-optimizing memory-level parallelism and cache-level parallelism.
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

Distilling the Essence of Raw Video to Reduce Memory Usage and Energy at Edge Devices.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

CASH: compiler assisted hardware design for improving DRAM energy efficiency in CNN inference.
Proceedings of the International Symposium on Memory Systems, 2019

Opportunistic computing in GPU architectures.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

Understanding Energy Efficiency in IoT App Executions.
Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, 2019

FUSE: Fusing STT-MRAM into GPUs to Alleviate Off-Chip Memory Access Overheads.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

Fair Resource Allocation in Consolidated Flash Systems.
Proceedings of the 11th USENIX Workshop on Hot Topics in Storage and File Systems, 2019

Architecture-Centric Bottleneck Analysis for Deep Neural Network Applications.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

Effect of Distributed Directories in Mesh Interconnects.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Kube-Knots: Resource Harvesting through Dynamic Container Orchestration in GPU-based Datacenters.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

SplitServe: Efficiently Splitting Complex Workloads Across FaaS and IaaS.
Proceedings of the ACM Symposium on Cloud Computing, SoCC 2019, 2019

SOML Read: Rethinking the Read Operation Granularity of 3D NAND SSDs.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

Spock: Exploiting Serverless Functions for SLO and Cost Aware Resource Procurement in Public Cloud.
Proceedings of the 12th IEEE International Conference on Cloud Computing, 2019

Stochastic Modeling and Optimization of Stragglers.
IEEE Trans. Cloud Comput., 2018

Performance and Power-Efficient Design of Dense Non-Volatile Cache in CMPs.
IEEE Trans. Computers, 2018

ReveNAND: A Fast-Drift-Aware Resilient 3D NAND Flash Design.
ACM Trans. Archit. Code Optim., 2018

Quantifying Data Locality in Dynamic Parallelism in GPUs.
Proc. ACM Meas. Anal. Comput. Syst., 2018

Computing with Near Data.
Proc. ACM Meas. Anal. Comput. Syst., 2018

IAA: Incidental Approximate Architectures for Extremely Energy-Constrained Energy Harvesting Scenarios using IoT Nonvolatile Processors.
IEEE Micro, 2018

Architectural exploration of heterogeneous memory systems.
CoRR, 2018

Holistic Management of the GPGPU Memory Hierarchy to Manage Warp-level Latency Tolerance.
CoRR, 2018

Data access skipping for recursive partitioning methods.
Comput. Lang. Syst. Struct., 2018

SimpleSSD: Modeling Solid State Drives for Holistic System Simulation.
IEEE Comput. Archit. Lett., 2018

A Learning-Guided Hierarchical Approach for Biomedical Image Segmentation.
Proceedings of the 31st IEEE International System-on-Chip Conference, 2018

Enhancing computation-to-core assignment with physical location information.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

FlashShare: Punching Through Server Storage Stack from Kernel to Firmware for Ultra-Low Latency SSDs.
Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

CritICs Critiquing Criticality in Mobile Apps.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

CHAMELEON: A Dynamically Reconfigurable Heterogeneous Memory System.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Amber*: Enabling Precise Full-System Simulation with Detailed Modeling of All SSD Resources.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

MDACache: Caching for Multi-Dimensional-Access Memories.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Invalid Data-Aware Coding to Enhance the Read Performance of High-Density Flash Memories.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

CachedGC: Cache-Assisted Garbage Collection in Modern Solid State Drives.
Proceedings of the 26th IEEE International Symposium on Modeling, 2018

Quantifying and Optimizing Data Access Parallelism on Manycores.
Proceedings of the 26th IEEE International Symposium on Modeling, 2018

Tolerating Write Disturbance Errors in PCM: Experimental Characterization, Analysis, and Mechanisms.
Proceedings of the 26th IEEE International Symposium on Modeling, 2018

Content Popularity-Based Selective Replication for Read Redirection in SSDs.
Proceedings of the 26th IEEE International Symposium on Modeling, 2018

Efficient K nearest neighbor algorithm implementations for throughput-oriented architectures.
Proceedings of the 19th International Symposium on Quality Electronic Design, 2018

Hybrid-comp: A criticality-aware compressed last-level cache.
Proceedings of the 19th International Symposium on Quality Electronic Design, 2018

Reviving Zombie Pages on SSDs.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Parallelizing garbage collection with I/O to improve flash resource utilization.
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018

Parallel Read Partitioning for Concurrent Assembly of Metagenomic Data.
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

PEN: Design and Evaluation of Partial-Erase for 3D NAND-Based High Density SSDs.
Proceedings of the 16th USENIX Conference on File and Storage Technologies, 2018

Soft Error Characterization on Scientific Applications.
Proceedings of the 2018 IEEE 16th Intl Conf on Dependable, 2018

FLOSS: FLOw sensitive scheduling on mobile platforms.
Proceedings of the 55th Annual Design Automation Conference, 2018

The Curious Case of Container Orchestration and Scheduling in GPU-based Datacenters.
Proceedings of the ACM Symposium on Cloud Computing, 2018

NEOFog: Nonvolatility-Exploiting Optimizations for Fog Computing.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

HL-PCM: MLC PCM Main Memory with Accelerated Read.
IEEE Trans. Parallel Distributed Syst., 2017

Cache Hierarchy-Aware Query Mapping on Emerging Multicore Architectures.
IEEE Trans. Computers, 2017

Exploiting Data Longevity for Enhancing the Lifetime of Flash-based Storage Class Memory.
Proc. ACM Meas. Anal. Comput. Syst., 2017

A selective protection scheme of applications using asymmetrically reliable caches.
J. Syst. Archit., 2017

Optimizing energy consumption in GPUs through feedback-driven CTA scheduling.
Proceedings of the 25th High Performance Computing Symposium, Virginia Beach, VA, USA, April 23, 2017

A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems.
Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, Urbana-Champaign, IL, USA, June 05, 2017

Compiler-Enhanced Reliability for Network-on-Chip Architectures.
Proceedings of the 25th Euromicro International Conference on Parallel, 2017

Race-to-sleep + content caching + display caching: a recipe for energy-efficient video streaming on handhelds.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Data movement aware computation partitioning.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Incidental computing on IoT nonvolatile processors.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

REMAP: a reliability/endurance mechanism for advancing PCM.
Proceedings of the International Symposium on Memory Systems, 2017

DEMM: A Dynamic Energy-Saving Mechanism for Multicore Memories.
Proceedings of the 25th IEEE International Symposium on Modeling, 2017

Quantifying the Potential Benefits of On-chip Near-Data Computing in Manycore Processors.
Proceedings of the 25th IEEE International Symposium on Modeling, 2017

Characterizing diverse handheld apps for customized hardware acceleration.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Exploring the impact of memory block permutation on performance of a crossbar ReRAM main memory.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

TraceTracker: Hardware/software co-evaluation for large-scale I/O workload reconstruction.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Congestion-aware memory management on NUMA platforms: A VMware ESXi case study.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Phoenix: A Constraint-Aware Scheduler for Heterogeneous Datacenters.
Proceedings of the 37th IEEE International Conference on Distributed Computing Systems, 2017

A Scale-Out Enterprise Storage Architecture.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Leveraging value locality for efficient design of a hybrid cache in multicore processors.
Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

Controlled Kernel Launch for Dynamic Parallelism in GPUs.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Exploring the Potential for Collaborative Data Compression and Hard-Error Tolerance in PCM Memories.
Proceedings of the 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2017

Hardware-Software Co-design to Mitigate DRAM Refresh Overheads: A Case for Refresh-Aware Process Scheduling.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

Exploiting Intra-Request Slack to Improve SSD Performance.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

POSTER: Location-Aware Computation Mapping for Manycore Processors.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

NANDFlashSim: High-Fidelity, Microarchitecture-Aware NAND Flash Memory Simulation.
ACM Trans. Storage, 2016

Memory Partitioning in the Limit.
Int. J. Parallel Program., 2016

A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps.
CoRR, 2016

Asymmetrically reliable caches for multicore architectures under performance and energy constraints.
Clust. Comput., 2016

Exploiting Core Criticality for Enhanced GPU Performance.
Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, 2016

Exploring the potentials of parallel garbage collection in SSDs for enterprise storage systems.
Proceedings of the International Conference for High Performance Computing, 2016

An in-depth study of next generation interface for emerging non-volatile memories.
Proceedings of the 5th Non-Volatile Memory Systems and Applications Symposium, 2016

Improving bank-level parallelism for irregular applications.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Storage consolidation: Not always a panacea, but can we ease the pain?
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

MLC PCM main memory with accelerated read.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

Boosting Access Parallelism to PCM-Based Main Memory.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Re-NUCA: A Practical NUCA Architecture for ReRAM Based Last-Level Caches.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Cache-Aware Approximate Computing for Decision Tree Learning.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

HCW 2016 Keynote Talk.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Trace-based affine reconstruction of codes.
Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

Protecting Code Regions on Asymmetrically Reliable Caches.
Proceedings of the Architecture of Computing Systems - ARCS 2016, 2016

Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

μC-States: Fine-grained GPU Datapath Power Management.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

IOPro: a parallel I/O profiling and visualization framework for high-performance storage systems.
J. Supercomput., 2015

EECache: A Comprehensive Study on the Architectural Design for Energy-Efficient Last-Level Caches in Chip Multiprocessors.
ACM Trans. Archit. Code Optim., 2015

Thermal-Aware Application Scheduling on Device-Heterogeneous Embedded Architectures.
Proceedings of the 28th International Conference on VLSI Design, 2015

Memory Row Reuse Distance and its Role in Optimizing Application Performance.
Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2015

Optimizing off-chip accesses in multicores.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

Anatomy of GPU Memory System for Multi-Application Execution.
Proceedings of the 2015 International Symposium on Memory Systems, 2015

Performance and energy evaluation of data prefetching on Intel Xeon Phi.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

VIP: virtualizing IP chains on handheld platforms.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Performance and Energy Efficient Asymmetrically Reliable Caches for Multicore Architectures.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Evaluating the Combined Impact of Node Architecture and Cloud Workload Characteristics on Network Traffic and Performance/Cost.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

Machine learning techniques for improved data prefetching.
Proceedings of the 5th International Conference on Energy Aware Computing Systems & Applications, 2015

Phase Detection with Hidden Markov Models for DVFS on Many-Core Processors.
Proceedings of the 35th IEEE International Conference on Distributed Computing Systems, 2015

Domain knowledge based energy management in handhelds.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Network footprint reduction through data access and computation placement in NoC-based manycores.
Proceedings of the 52nd Annual Design Automation Conference, 2015

Reactive tiling.
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

TaPEr: tackling power emergencies in the dark silicon era by exploiting resource scalability.
Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Storage Consolidation on SSDs: Not Always a Panacea, but Can We Ease the Pain?
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Exploiting Staleness for Approximating Loads on CMPs.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Volatile STT-RAM Scratchpad Design and Data Allocation for Low Energy.
ACM Trans. Archit. Code Optim., 2014

Exploring the future of out-of-core computing with compute-local non-volatile memory.
Sci. Program., 2014

Improved cache utilization and preconditioner efficiency through use of a space-filling curve mesh element- and vertex-reordering technique.
Eng. Comput., 2014

GemDroid: a framework to evaluate mobile platforms.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2014

CApRI: CAche-conscious data reordering for irregular codes.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2014

Short-Circuiting Memory Traffic in Handheld Platforms.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Managing GPU Concurrency in Heterogeneous Architectures.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Compiler Support for Optimizing Memory Bank-Level Parallelism.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

ZombieNAND: Resurrecting Dead NAND Flash for Improved SSD Longevity.
Proceedings of the IEEE 22nd International Symposium on Modelling, 2014

Quantifying and Optimizing the Impact of Victim Cache Line Selection in Manycore Systems.
Proceedings of the IEEE 22nd International Symposium on Modelling, 2014

EECache: exploiting design choices in energy-efficient last-level caches for chip multiprocessors.
Proceedings of the International Symposium on Low Power Electronics and Design, 2014

HIOS: A host interface I/O scheduler for Solid State Disks.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

A cache topology-aware multi-query scheduler for multicore architectures.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

QoS aware dynamic time-slice tuning.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Will They Blend?: Exploring Big Data Computation Atop Traditional HPC NAS Storage.
Proceedings of the IEEE 34th International Conference on Distributed Computing Systems, 2014

Sprinkler: Maximizing resource utilization in many-chip solid state disks.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Triple-A: a Non-SSD based autonomic all-flash array for high performance storage systems.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications.
Proceedings of the Seventh Workshop on General Purpose Processing Using GPUs, 2014

Trading cache hit rate for memory performance.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

Compiler-Directed Energy Reduction Using Dynamic Voltage Scaling and Voltage Islands for Embedded Systems.
IEEE Trans. Computers, 2013

Compiler-directed file layout optimization for hierarchical storage systems.
Sci. Program., 2013

Steep-Slope Devices: From Dark to Dim Silicon.
IEEE Micro, 2013

Examining Thread Vulnerability analysis using fault-injection.
Proceedings of the 21st IEEE/IFIP International Conference on VLSI and System-on-Chip, 2013

Revisiting widely held SSD expectations and rethinking system-level implications.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2013

Data layout optimization for GPGPU architectures.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

Evaluating STT-RAM as an energy-efficient main memory alternative.
Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Orchestrated scheduling and prefetching for GPGPUs.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Interference Resolver in Shared Storage Systems to Provide Fairness to I/O Intensive Applications.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Disk-Cache and Parallelism Aware I/O Scheduling to Improve Storage System Performance.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Design of a large-scale storage-class RRAM system.
Proceedings of the International Conference on Supercomputing, 2013

Challenges in Getting Flash Drives Closer to CPU.
Proceedings of the 5th USENIX Workshop on Hot Topics in Storage and File Systems, 2013

Locality-aware mapping and scheduling for multicores.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

Meeting midway: Improving CMP performance with memory-side prefetching.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

Neither more nor less: Optimizing thread-level parallelism for GPGPUs.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

Reshaping cache misses to improve row-buffer locality in multicore systems.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

Reliability-aware core partitioning in chip multiprocessors.
J. Syst. Archit., 2012

Thread vulnerability in parallel applications.
J. Parallel Distributed Comput., 2012

Automatic Parallel Code Generation for NUFFT Data Translation on multicores.
J. Circuits Syst. Comput., 2012

REEact: a customizable virtual execution manager for multicore platforms.
Proceedings of the 8th International Conference on Virtual Execution Environments, 2012

IOPin: Runtime Profiling of Parallel I/O in HPC Systems.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

An Evolutionary Path to Object Storage Access.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

A compiler framework for extracting superword level parallelism.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2012

Locality-Aware Dynamic Mapping for Multithreaded Applications.
Proceedings of the 20th Euromicro International Conference on Parallel, 2012

NANDFlashSim: Intrinsic latency variation aware NAND flash memory system modeling and simulation at microarchitecture level.
Proceedings of the IEEE 28th Symposium on Mass Storage Systems and Technologies, 2012

Taking Garbage Collection Overheads Off the Critical Path in SSDs.
Proceedings of the Middleware 2012, 2012

Addressing End-to-End Memory Access Latency in NoC-Based Multicores.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Design space exploration of workload-specific last-level caches.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

Physically Addressed Queueing (PAQ): Improving parallelism in Solid State Disks.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Software-Directed Data Access Scheduling for Reducing Disk Energy Consumption.
Proceedings of the 2012 IEEE 32nd International Conference on Distributed Computing Systems, 2012

Improving last level cache locality by integrating loop and data transformations.
Proceedings of the 2012 IEEE/ACM International Conference on Computer-Aided Design, 2012

An Evaluation of Different Page Allocation Strategies on High-Speed SSDs.
Proceedings of the 4th USENIX Workshop on Hot Topics in Storage and File Systems, 2012

Performance-reliability tradeoff analysis for multithreaded applications.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

A hybrid NoC design for cache coherence optimization for chip multiprocessors.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

Courteous cache sharing: being nice to others in capacity management.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

Performance enhancement under power constraints using heterogeneous CMOS-TFET multicores.
Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis, 2012

Panacea: towards holistic optimization of MapReduce applications.
Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

Reuse distance based performance modeling and workload mapping.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

Improving the performance of k-means clustering through computation skipping and data locality optimizations.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

On Urgency of I/O Operations.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

PEPON: performance-aware hierarchical power budgeting for NoC based multicores.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Application-aware prefetch prioritization in on-chip networks.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Off-chip access localization for NoC-based multicores.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

MROrchestrator: A Fine-Grained Resource Orchestration Framework for MapReduce Clusters.
Proceedings of the 2012 IEEE Fifth International Conference on Cloud Computing, 2012

Communication Based Proactive Link Power Management.
Trans. High Perform. Embed. Archit. Compil., 2011

BrickX: building hybrid systems for recursive computations.
SIGMETRICS Perform. Evaluation Rev., 2011

Particle simulation on the Cell BE architecture.
Clust. Comput., 2011

Studying inter-core data reuse in multicores.
Proceedings of the SIGMETRICS 2011, 2011

METE: meeting end-to-end QoS in multicores through system-wide resource management.
Proceedings of the SIGMETRICS 2011, 2011

Virtual I/O caching: dynamic storage cache management for concurrent workloads.
Proceedings of the Conference on High Performance Computing Networking, 2011

QoS aware storage cache management in multi-server environments.
Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

Automatic Feedback Control of Shared Hybrid Caches in 3D Chip Multiprocessors.
Proceedings of the 19th International Euromicro Conference on Parallel, 2011

Quantifying Thread Vulnerability for Multicore Architectures.
Proceedings of the 19th International Euromicro Conference on Parallel, 2011

A data layout optimization framework for NUCA-based multicores.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Reducing memory interference in multicore systems via application-aware memory channel partitioning.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Exploring performance-power tradeoffs in providing reliability for NoC-based MPSoCs.
Proceedings of the 12th International Symposium on Quality Electronic Design, 2011

Minimizing interference through application mapping in multi-level buffer caches.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Improving energy efficiency of multi-threaded applications using heterogeneous CMOS-TFET multicores.
Proceedings of the 2011 International Symposium on Low Power Electronics and Design, 2011

Provisioning a Multi-tiered Data Staging Area for Extreme-Scale Machines.
Proceedings of the 2011 International Conference on Distributed Computing Systems, 2011

Feedback control based cache reliability enhancement for emerging multicores.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

Exploring heterogeneous NoC design space.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

Cooperative parallelization.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

Improving shared cache behavior of multithreaded object-oriented applications in multicores.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

Optimizing data locality using array tiling.
Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

Software-directed data access scheduling for reducing disk energy consumption.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

MorphCache: A Reconfigurable Adaptive Multi-level Cache hierarchy.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Bandwidth Constrained Coordinated HW/SW Prefetching for Multicores.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Multilayer Cache Partitioning for Multiprogram Workloads.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Process variation-aware routing in NoC based multicores.
Proceedings of the 48th Design Automation Conference, 2011

A helper thread based dynamic cache partitioning scheme for multithreaded applications.
Proceedings of the 48th Design Automation Conference, 2011

On-chip cache hierarchy-aware tile scheduling for multicore machines.
Proceedings of the CGO 2011, 2011

Neighborhood-aware data locality optimization for NoC-based multicores.
Proceedings of the CGO 2011, 2011

Adaptive QoS Decomposition and Control for Storage Cache Management in Multi-server Environments.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

APP: Minimizing Interference Using Aggressive Pipelined Prefetching in Multi-level Buffer Caches.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

Optimizing Data Layouts for Parallel Computation on Multicores.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

Compiler Directed Data Locality Optimization for Multicore Architectures.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

Total Power Optimization for Combinational Logic Using Genetic Algorithms.
J. Signal Process. Syst., 2010

On-chip memory space partitioning for chip multiprocessors using polyhedral algebra.
IET Comput. Digit. Tech., 2010

Exploiting large on-chip memory space through data recomputation.
Proceedings of the Annual IEEE International SoC Conference, SoCC 2010, 2010

Coordinated power management of voltage islands in CMPs.
Proceedings of the SIGMETRICS 2010, 2010

CPM in CMPs: Coordinated Power Management in Chip-Multiprocessors.
Proceedings of the Conference on High Performance Computing Networking, 2010

Automated Tracing of I/O Stack.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

Intra-application shared cache partitioning for multithreaded applications.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Cache topology aware computation mapping for multicores.
Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010

Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Compiler directed network-on-chip reliability enhancement for chip multiprocessors.
Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, 2010

Intra-application cache partitioning.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

T-NUCA - a novel approach to non-uniform access latency cache architectures for 3D CMPs.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Analyzing the soft error resilience of linear solvers on multicore multiprocessors.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Dynamic core partitioning for energy efficiency.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Adaptive multi-level cache allocation in distributed storage architectures.
Proceedings of the 24th International Conference on Supercomputing, 2010

Cashing in on hints for better prefetching and caching in PVFS and MPI-IO.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

Computation mapping for multi-level storage cache hierarchies.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

SRP: Symbiotic Resource Partitioning of the Memory Hierarchy in CMPs.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Scalable Parallelization Strategies to Accelerate NuFFT Data Translation on Multicores.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Code Scheduling for Optimizing Parallelism and Data Locality.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

A special-purpose compiler for look-up table and code generation for function evaluation.
Proceedings of the Design, Automation and Test in Europe, 2010

Feedback control for providing QoS in NoC based multicores.
Proceedings of the Design, Automation and Test in Europe, 2010

Reducing memory requirements of resource-constrained applications.
ACM Trans. Embed. Comput. Syst., 2009

Compiler-assisted soft error detection under performance and energy constraints in embedded systems.
ACM Trans. Embed. Comput. Syst., 2009

Using Data Compression for Increasing Memory System Utilization.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2009

Process-Variation-Aware Adaptive Cache Architecture and Management.
IEEE Trans. Computers, 2009

An Automated Framework for Accelerating Numerical Algorithms on Reconfigurable Platforms Using Algorithmic/Architectural Optimization.
IEEE Trans. Computers, 2009

Adapting application execution in CMPs using helper threads.
J. Parallel Distributed Comput., 2009

Shared scratch pad memory space management across applications.
Int. J. Embed. Syst., 2009

Clone Detection in Sensor Networks with Ad Hoc and Grid Topologies.
Int. J. Distributed Sens. Networks, 2009

A case for integrated processor-cache partitioning in chip multiprocessors.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Dynamic storage cache allocation in multi-server architectures.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

A hardware-software codesign strategy for Loop intensive applications.
Proceedings of the IEEE 7th Symposium on Application Specific Processors, 2009

A compiler-directed data prefetching scheme for chip multiprocessors.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

SHARP control: controlled shared cache management in chip multiprocessors.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Optimizing shared cache behavior of chip multiprocessors.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

In-Network Caching for Chip Multiprocessors.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Adapting Application Mapping to Systematic Within-Die Process Variations on Chip Multiprocessors.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Hybrid Techniques for Fast Multicore Simulation.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

Exploring parallelization strategies for NUFFT data translation.
Proceedings of the 9th ACM & IEEE International Conference on Embedded Software, 2009

Using dynamic compilation for continuing execution under reduced memory availability.
Proceedings of the Design, Automation and Test in Europe, 2009

Adaptive prefetching for shared cache based chip multiprocessors.
Proceedings of the Design, Automation and Test in Europe, 2009

Process variation aware thread mapping for Chip Multiprocessors.
Proceedings of the Design, Automation and Test in Europe, 2009

Dynamic thread and data mapping for NoC based CMPs.
Proceedings of the 46th Design Automation Conference, 2009

Improving I/O performance using soft-QoS-based dynamic storage cache partitioning.
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

MPISec I/O: Providing Data Confidentiality in MPI-I/O.
Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009

Markov Model Based Disk Power Management for Data Intensive Workloads.
Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009

Slicing based code parallelization for minimizing inter-processor communication.
Proceedings of the 2009 International Conference on Compilers, 2009

Topology-Aware I/O Caching for Shared Storage Systems.
Proceedings of the 22nd International Conference on Parallel and Distributed Computing and Communication Systems, 2009

Power Aware Disk Allocation.
Proceedings of the 22nd International Conference on Parallel and Distributed Computing and Communication Systems, 2009

Dynamic Storage Cache Partitioning Using Feedback Control Theory.
Proceedings of the 22nd International Conference on Parallel and Distributed Computing and Communication Systems, 2009

Designing a 3-D FPGA: Switch Box Architecture and Thermal Issues.
IEEE Trans. Very Large Scale Integr. Syst., 2008

Compiler-Directed Code Restructuring for Improving Performance of MPSoCs.
IEEE Trans. Parallel Distributed Syst., 2008

Access pattern-based code compression for memory-constrained systems.
ACM Trans. Design Autom. Electr. Syst., 2008

ILP-Based energy minimization techniques for banked memories.
ACM Trans. Design Autom. Electr. Syst., 2008

Comparative evaluation of overlap strategies with study of I/O overlap in MPI-IO.
ACM SIGOPS Oper. Syst. Rev., 2008

Capturing and optimizing the interactions between prefetching and cache line turnoff.
Microprocess. Microsystems, 2008

Graphical Mission Specification and Partitioning for Unmanned Underwater Vehicles.
J. Softw., 2008

Comput. Lang. Syst. Struct., 2008

Implementation and evaluation of a migration-based NUCA design for chip multiprocessors.
Proceedings of the 2008 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2008

Software-directed combined CPU/link voltage scaling for NoC-based CMPs.
Proceedings of the 2008 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2008

Prefetch throttling and data pinning for improving performance of shared caches.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

A novel migration-based NUCA design for chip multiprocessors.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Enhancing the performance of MPI-IO applications by overlapping I/O, computation and communication.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

A Scratch-Pad Memory Aware Dynamic Loop Scheduling Algorithm.
Proceedings of the 9th International Symposium on Quality of Electronic Design (ISQED 2008), 2008

Evaluating the role of scratchpad memories in chip multiprocessors for sparse matrix computations.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Improving I/O performance through compiler-directed code restructuring and adaptive prefetching.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Managing power, performance and reliability trade-offs.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Towards energy efficient scaling of scientific codes.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

A helper thread based EDP reduction scheme for adapting application execution in CMPs.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Ring data location prediction scheme for Non-Uniform Cache Architectures.
Proceedings of the 26th International Conference on Computer Design, 2008

SPM management using Markov chain based data access prediction.
Proceedings of the 2008 International Conference on Computer-Aided Design, 2008

Integrated code and data placement in two-dimensional mesh based chip multiprocessors.
Proceedings of the 2008 International Conference on Computer-Aided Design, 2008

Improving I/O Performance of Applications through Compiler-Directed Code Restructuring.
Proceedings of the 6th USENIX Conference on File and Storage Technologies, 2008

Application mapping for chip multiprocessors.
Proceedings of the 45th Design Automation Conference, 2008

Adaptive set pinning: managing shared caches in chip multiprocessors.
Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, 2008

A Systematic Approach to Automatically Generate Multiple Semantically Equivalent Program Versions.
Proceedings of the Reliable Software Technologies, 2008

Profiler and compiler assisted adaptive I/O prefetching for shared storage caches.
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

Reliability-aware Co-synthesis for Embedded Systems.
J. VLSI Signal Process., 2007

On the Detection of Clones in Sensor Networks Using Random Key Predistribution.
IEEE Trans. Syst. Man Cybern. Part C, 2007

Compiler-Directed Energy Optimization for Parallel Disk Based Systems.
IEEE Trans. Parallel Distributed Syst., 2007

Reducing energy consumption of parallel sparse matrix applications through integrated link/CPU voltage scaling.
J. Supercomput., 2007

A Prefetching Algorithm for Multi-speed Disks.
Trans. High Perform. Embed. Archit. Compil., 2007

An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors.
Trans. High Perform. Embed. Archit. Compil., 2007

Solving the Register Allocation Problem for Embedded Systems Using a Hybrid Evolutionary Algorithm.
IEEE Trans. Evol. Comput., 2007

Reducing Data TLB Power via Compiler-Directed Address Generation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2007

Design of power-aware FPGA fabrics.
Int. J. Embed. Syst., 2007

Optimising power efficiency in trace cache fetch unit.
IET Comput. Digit. Tech., 2007

A Constraint Network Based Approach to Memory Layout Optimization.
CoRR, 2007

Compiler-Directed Code Restructuring for Operating with Compressed Arrays.
Proceedings of the 20th International Conference on VLSI Design (VLSI Design 2007), 2007

Locality-Aware Distributed Loop Scheduling for Chip Multiprocessors.
Proceedings of the 20th International Conference on VLSI Design (VLSI Design 2007), 2007

A Process Scheduler-Based Approach to NoC Power Management.
Proceedings of the 20th International Conference on VLSI Design (VLSI Design 2007), 2007

Enhancing Locality in Two-Dimensional Space through Integrated Computation and Data Mappings.
Proceedings of the 20th International Conference on VLSI Design (VLSI Design 2007), 2007

Securing Disk-Resident Data through Application Level Encryption.
Proceedings of the Fourth International IEEE Security in Storage Workshop, 2007

Efficient Function Evaluations with Lookup Tables for Structured Matrix Operations.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2007

Modeling and improving data cache reliability.
Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2007

Profile-driven energy reduction in network-on-chips.
Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, 2007

Compiler-directed application mapping for NoC based chip multiprocessors.
Proceedings of the 2007 ACM SIGPLAN/SIGBED Conference on Languages, 2007

An ILP based approach to reducing energy consumption in NoC-based CMPs.
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

Phase-aware adaptive hardware selection for power-efficient scientific computations.
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

Improving disk reuse for reducing power consumption.
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

Improving MPI Independent Write Performance Using A Two-Stage Write-Behind Buffering Method.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Data locality enhancement for CMPs.
Proceedings of the 2007 International Conference on Computer-Aided Design, 2007

TANOR: A Tool for Accelerating N-Body Simulations on Reconfigurable Platforms.
Proceedings of the FPL 2007, 2007

Performance aware secure code partitioning.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

Memory bank aware dynamic loop scheduling.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

A Memory-Conscious Code Parallelization Scheme.
Proceedings of the 44th Design Automation Conference, 2007

Reducing Off-Chip Memory Access Costs Using Data Recomputation in Embedded Chip Multi-processors.
Proceedings of the 44th Design Automation Conference, 2007

Runtime system support for software-guided disk power management.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

Compiler-Directed Variable Latency Aware SPM Management to Cope With Timing Problems.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

Integrated Data Reorganization and Disk Mapping for Reducing Disk Energy Consumption.
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

Lightweight barrier-based parallelization support for non-cache-coherent MPSoC platforms.
Proceedings of the 2007 International Conference on Compilers, 2007

Automated Mission Parallelization for Unmanned Underwater Vehicles.
Proceedings of the Regarding the Intelligence in Distributed Intelligent Systems, 2007

Energy-Optimal Data Collection and Communication Using a Group of UUVs.
Proceedings of the Regarding the Intelligence in Distributed Intelligent Systems, 2007

Reducing Energy Consumption of On-Chip Networks Through a Hybrid Compiler-Runtime Approach.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

Ring Prediction for Non-Uniform Cache Architectures.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

Estimating and reducing the memory requirements of signal processing codes for embedded systems.
IEEE Trans. Signal Process., 2006

Multicollective I/O: A technique for exploiting inter-file access patterns.
ACM Trans. Storage, 2006

Improving the energy behavior of block buffering using compiler optimizations.
ACM Trans. Design Autom. Electr. Syst., 2006

Reducing energy consumption of multiprocessor SoC architectures by exploiting memory bank locality.
ACM Trans. Design Autom. Electr. Syst., 2006

Reducing dynamic and leakage energy in VLIW architectures.
ACM Trans. Embed. Comput. Syst., 2006

Reducing code size through address register assignment.
ACM Trans. Embed. Comput. Syst., 2006

Reducing memory energy consumption of embedded applications that process dynamically allocated data.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2006

Optimizing bus energy consumption of on-chip multiprocessors using frequent values.
J. Syst. Archit., 2006

The Sleep Deprivation Attack in Sensor Networks: Analysis and Methods of Defense.
Int. J. Distributed Sens. Networks, 2006

Discretionary Caching for I/O on Clusters.
Clust. Comput., 2006

Geometric Tiling for Reducing Power Consumption in Structured Matrix Operations.
Proceedings of the 2006 IEEE International SOC Conference, Austin, Texas, USA, 2006

Energy-Aware Code Replication for Improving Reliability in Embedded Chip Multiprocessors.
Proceedings of the 2006 IEEE International SOC Conference, Austin, Texas, USA, 2006

Compiler Support for Voltage Islands.
Proceedings of the 2006 IEEE International SOC Conference, Austin, Texas, USA, 2006

Compiler-directed channel allocation for saving power in on-chip networks.
Proceedings of the 33rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2006

Reducing NoC energy consumption through compiler-directed channel voltage scaling.
Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, 2006

Compiler-directed thermal management for VLIW functional units.
Proceedings of the 2006 ACM SIGPLAN/SIGBED Conference on Languages, 2006

An Integer Linear Programming Based Approach to Simultaneous Memory Space Partitioning and Data Allocation for Chip Multiprocessors.
Proceedings of the 2006 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2006), 2006

Exploiting Software Pipelining for Network-on-Chip architectures.
Proceedings of the 2006 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2006), 2006

Reducing Memory Requirements through Task Recomputation in Embedded Multi-CPU Systems.
Proceedings of the 2006 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2006), 2006

Leakage-Aware SPM Management.
Proceedings of the 2006 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2006), 2006

Compiler-Directed Management of Leakage Power in Software-Managed Memories.
Proceedings of the 2006 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2006), 2006

Shared Scratch-Pad Memory Space Management.
Proceedings of the 7th International Symposium on Quality of Electronic Design (ISQED 2006), 2006

Data Replication in Banked DRAMs for Reducing Energy Consumption.
Proceedings of the 7th International Symposium on Quality of Electronic Design (ISQED 2006), 2006

Compiler-Directed Power Density Reduction in NoC-Based Multi-Core Designs.
Proceedings of the 7th International Symposium on Quality of Electronic Design (ISQED 2006), 2006

Minimizing energy consumption of banked memories using data recomputation.
Proceedings of the 2006 International Symposium on Low Power Electronics and Design, 2006

Reducing power through compiler-directed barrier synchronization elimination.
Proceedings of the 2006 International Symposium on Low Power Electronics and Design, 2006

An ILP Formulation for Task Scheduling on Heterogeneous Chip Multiprocessors.
Proceedings of the Computer and Information Sciences, 2006

Design and Management of 3D Chip Multiprocessors Using Network-in-Memory.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

Integrated link/CPU voltage scaling for reducing energy consumption of parallel sparse matrix applications.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Enhancing L2 organization for CMPs with a center cell.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

SPM Conscious Loop Scheduling for Embedded Chip Multiprocessors.
Proceedings of the 12th International Conference on Parallel and Distributed Systems, 2006

Multi-Level On-Chip Memory Hierarchy Design for Embedded Chip Multiprocessors.
Proceedings of the 12th International Conference on Parallel and Distributed Systems, 2006

Cache miss clustering for banked memory systems.
Proceedings of the 2006 International Conference on Computer-Aided Design, 2006

An ILP based approach to address code generation for digital signal processors.
Proceedings of the 16th ACM Great Lakes Symposium on VLSI 2006, Philadelphia, PA, USA, April 30, 2006

Selective code/data migration for reducing communication energy in embedded MpSoC architectures.
Proceedings of the 16th ACM Great Lakes Symposium on VLSI 2006, Philadelphia, PA, USA, April 30, 2006

Switch Box Architectures for Three-Dimensional FPGAs.
Proceedings of the 14th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2006), 2006

Memory-Conscious Reliable Execution on Embedded Chip Multiprocessors.
Proceedings of the 2006 International Conference on Dependable Systems and Networks (DSN 2006), 2006

Dynamic partitioning of processing and memory resources in embedded MPSoC architectures.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Activity clustering for leakage management in SPMs.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Dynamic scratch-pad memory management for irregular array access patterns.
Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Optimizing code parallelization through a constraint network based approach.
Proceedings of the 43rd Design Automation Conference, 2006

A Compiler-Guided Approach for Reducing Disk Power Consumption by Exploiting Disk Access Locality.
Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006

Energy-aware data prefetching for multi-speed disks.
Proceedings of the Third Conference on Computing Frontiers, 2006

Multi-compilation: capturing interactions among concurrently-executing applications.
Proceedings of the Third Conference on Computing Frontiers, 2006

Using Task Recomputation During Application Mapping in Parallel Embedded Architectures.
Proceedings of the 2006 International Conference on Computer Design & Conference on Computing in Nanotechnology, 2006

Reducing dynamic compilation overhead by overlapping compilation and execution.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Energy savings through embedded processing on disk system.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Optimal topology exploration for application-specific 3D architectures.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Compiler-Guided data compression for reducing memory consumption of embedded applications.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Maximizing data reuse for minimizing memory space requirements and execution cycles.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Prefetching-aware cache line turnoff for saving leakage energy.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Object duplication for improving reliability.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Energy-aware computation duplication for improving reliability in embedded chip multiprocessors.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Secure Execution of Computations in Untrusted Hosts.
Proceedings of the Reliable Software Technologies, 2006

Compiler-guided leakage optimization for banked scratch-pad memories.
IEEE Trans. Very Large Scale Integr. Syst., 2005

Soft errors issues in low-power caches.
IEEE Trans. Very Large Scale Integr. Syst., 2005

Optimizing Array-Intensive Applications for On-Chip Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 2005

Optimizing instruction TLB energy using software and hardware techniques.
ACM Trans. Design Autom. Electr. Syst., 2005

Reducing data cache leakage energy using a compiler-based approach.
ACM Trans. Embed. Comput. Syst., 2005

Compiler-directed high-level energy estimation and optimization.
ACM Trans. Embed. Comput. Syst., 2005

Data space-oriented tiling for enhancing locality.
ACM Trans. Embed. Comput. Syst., 2005

Analyzing data reuse for cache reconfiguration.
ACM Trans. Embed. Comput. Syst., 2005

A Holistic Approach to Designing Energy-Efficient Cluster Interconnects.
IEEE Trans. Computers, 2005

Improving whole-program locality using intra-procedural and inter-procedural transformations.
J. Parallel Distributed Comput., 2005

An integer linear programming-based tool for wireless sensor networks.
J. Parallel Distributed Comput., 2005

Processor-embedded distributed smart disks for I/O-intensive workloads: architectures, performance models and evaluation.
J. Parallel Distributed Comput., 2005

Symmetric encryption in reconfigurable and custom hardware.
Int. J. Embed. Syst., 2005

Improving Java performance using dynamic method migration on FPGAs.
Int. J. Embed. Syst., 2005

Exploiting frequent field values in java objects for reducing heap memory requirements.
Proceedings of the 1st International Conference on Virtual Execution Environments, 2005

Constraint-based Code mapping for heterogeneous Chip multiprocessors.
Proceedings of the 2005 IEEE International SOC Conference, 2005

On-Chip Memory Management for Embedded MpSoC Architectures Based on Data Compression.
Proceedings of the 2005 IEEE International SOC Conference, 2005

Workload Clustering for Increasing Energy Savings on Embedded MPSoCs.
Proceedings of the 2005 IEEE International SOC Conference, 2005

Memory Space Conscious Loop Iteration Duplication for Reliable Execution.
Proceedings of the Static Analysis, 12th International Symposium, 2005

An Adaptive Locality-Conscious Process Scheduler for Embedded Systems.
Proceedings of the 11th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2005), 2005

Exposing disk layout to compiler for reducing energy consumption of parallel disk based systems.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

Fault Recovery Designs for Processor-Embedded Distributed Storage Architectures with I/O-Intensive DB Workloads.
Proceedings of the 22nd IEEE / 13th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2005), 2005

Compiling for memory emergency.
Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, 2005

Dynamic Compilation for Reducing Energy Consumption of I/O-Intensive Applications.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

A Data-Driven Approach for Embedded Security.
Proceedings of the 2005 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2005), 2005

Increasing Data TLB Resilience to Transient Errors.
Proceedings of the 2005 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2005), 2005

Exploiting Inter-Processor Data Sharing for Improving Behavior of Multi-Processor SoCs.
Proceedings of the 2005 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2005), 2005

An ILP Formulation for Reliability-Oriented High-Level Synthesis.
Proceedings of the 6th International Symposium on Quality of Electronic Design (ISQED 2005), 2005

Reliability-Centric Hardware/Software Co-Design.
Proceedings of the 6th International Symposium on Quality of Electronic Design (ISQED 2005), 2005

Pro-active Page Replacement for Scientific Applications: A Characterization.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Power-aware code scheduling for clusters of active disks.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

An evaluation of code and data optimizations in the context of disk power reduction.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

Dataflow analysis for energy-efficient scratch-pad memory management.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

Software-Directed Disk Power Management for Scientific Applications.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Exploiting Barriers to Optimize Power Consumption of CMPs.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Power and Performance in I/O for Scientific Applications.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Reducing Power with Performance Constraints for Parallel Sparse Applications.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Reliability-Conscious Process Scheduling under Performance Constraints in FPGA-Based Embedded Systems.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Disk layout optimization for reducing energy consumption.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Temperature-Sensitive Loop Parallelization for Chip Multiprocessors.
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Improving scratch-pad memory reliability through compiler-guided data block duplication.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

Compiler-directed voltage scaling on communication links for reducing power consumption.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

2D data locality: definition, abstraction, and application.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

Integrating loop and data optimizations for locality within a constraint network based framework.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

Runtime integrity checking for inter-object connections.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

Code restructuring for improving cache performance of MPSoCs.
Proceedings of the 2005 International Conference on Computer-Aided Design, 2005

Using data compression in an MPSoC architecture for improving performance.
Proceedings of the 15th ACM Great Lakes Symposium on VLSI 2005, 2005

Energy management in software-controlled multi-level memory hierarchies.
Proceedings of the 15th ACM Great Lakes Symposium on VLSI 2005, 2005

Integer linear programming based energy optimization for banked DRAMs.
Proceedings of the 15th ACM Great Lakes Symposium on VLSI 2005, 2005

Load elimination for low-power embedded processors.
Proceedings of the 15th ACM Great Lakes Symposium on VLSI 2005, 2005

Exploiting last idle periods of links for network power management.
Proceedings of the EMSOFT 2005, 2005

Optimizing inter-processor data locality on embedded chip multiprocessors.
Proceedings of the EMSOFT 2005, 2005

A Data-Centric Approach to Checksum Reuse for Array-Intensive Applications.
Proceedings of the 2005 International Conference on Dependable Systems and Networks (DSN 2005), 28 June, 2005

Reliability-Centric High-Level Synthesis.
Proceedings of the 2005 Design, 2005

Access Pattern-Based Code Compression for Memory-Constrained Embedded Systems.
Proceedings of the 2005 Design, 2005

BB-GC: Basic-Block Level Garbage Collection.
Proceedings of the 2005 Design, 2005

Nonuniform Banking for Reducing Memory Energy Consumption.
Proceedings of the 2005 Design, 2005

Increasing Register File Immunity to Transient Errors.
Proceedings of the 2005 Design, 2005

Studying Storage-Recomputation Tradeoffs in Memory-Constrained Embedded Processing.
Proceedings of the 2005 Design, 2005

Locality-Aware Process Scheduling for Embedded MPSoCs.
Proceedings of the 2005 Design, 2005

Thermal-Aware Task Allocation and Scheduling for Embedded Systems.
Proceedings of the 2005 Design, 2005

Compiler-Directed Instruction Duplication for Soft Error Detection.
Proceedings of the 2005 Design, 2005

A Constraint Network Based Approach to Memory Layout Optimization.
Proceedings of the 2005 Design, 2005

Locality-conscious workload assignment for array-based computations in MPSOC architectures.
Proceedings of the 42nd Design Automation Conference, 2005

Improving java virtual machine reliability for memory-constrained embedded systems.
Proceedings of the 42nd Design Automation Conference, 2005

Increasing on-chip memory space utilization for embedded chip multiprocessors through data compression.
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

Optimizing Address Code Generation for Array-Intensive DSP Applications.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

A Compiler-Based Approach to Data Security.
Proceedings of the Compiler Construction, 14th International Conference, 2005

Compiler-directed proactive power management for networks.
Proceedings of the 2005 International Conference on Compilers, 2005

Verifiable annotations for embedded java environments.
Proceedings of the 2005 International Conference on Compilers, 2005

Customized on-chip memories for embedded chip multiprocessors.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Using loop invariants to fight soft errors in data caches.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Using data replication to reduce communication energy on chip multiprocessors.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Studying interactions between prefetching and cache line turnoff.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

FD-HGAC: a hybrid heuristic/genetic algorithm hardware/software co-synthesis framework with fault detection.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Increasing FPGA resilience against soft errors using task duplication.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Compiler-directed selective data protection against soft errors.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Optimizing embedded applications using programmer-inserted hints.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

Instruction Scheduling for Low Power.
J. VLSI Signal Process., 2004

Compiler-directed scratch pad memory optimization for embedded multiprocessors.
IEEE Trans. Very Large Scale Integr. Syst., 2004

Quasidynamic Layout Optimizations for Improving Data Locality.
IEEE Trans. Parallel Distributed Syst., 2004

Access Pattern Restructuring for Memory Energy.
IEEE Trans. Parallel Distributed Syst., 2004

Studying Energy Trade Offs in Offloading Computation/Compilation in Java-Enabled Mobile Devices.
IEEE Trans. Parallel Distributed Syst., 2004

A compiler-based approach for dynamically managing scratch-pad memories in embedded systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2004

Array Regrouping and Its Use in Compiling Data-Intensive Embedded Applications.
IEEE Trans. Computers, 2004

Reducing instruction cache energy consumption using a compiler-based strategy.
ACM Trans. Archit. Code Optim., 2004

Optimizing Leakage Energy Consumption in Cache Bitlines.
Des. Autom. Embed. Syst., 2004

On the Performance of the POSIX I/O Interface to PVFS.
Proceedings of the 12th Euromicro Workshop on Parallel, 2004

Code protection for resource-constrained embedded devices.
Proceedings of the 2004 ACM SIGPLAN/SIGBED Conference on Languages, 2004

An ILP-Based Approach to Locality Optimization.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

Field level analysis for heap space optimization in embedded java environments.
Proceedings of the 4th International Symposium on Memory Management, 2004

Fault Tolerant Algorithms for Network-On-Chip Interconnect.
Proceedings of the 2004 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2004), 2004

Compiler-directed physical address generation for reducing dTLB power.
Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, 2004

Soft error and energy consumption interactions: a data cache perspective.
Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

A Parallel Architecture for Secure FPGA Symmetric Encryption.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Improving Performance of Java Applications Using a Coprocessor.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Improving Memory Performance of Embedded Java Applications by Dynamic Layout Modifications.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Exploiting Memory Bank Locality in Multiprocessor SoC Architectures.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

A Window-Based Approach to Retrieving Memory-Resident Data for Query Execution.
Proceedings of the 8th International Database Engineering and Applications Symposium (IDEAS 2004), 2004

Improving soft-error tolerance of FPGA configuration bits.
Proceedings of the 2004 International Conference on Computer-Aided Design, 2004

Banked scratch-pad memory management for reducing leakage energy consumption.
Proceedings of the 2004 International Conference on Computer-Aided Design, 2004

Organizing the Last Line of Defense before Hitting the Memory Wall for CMP.
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

Tuning data replication for improving behavior of MPSoC applications.
Proceedings of the 14th ACM Great Lakes Symposium on VLSI 2004, 2004

A Dual-VDD Low Power FPGA Architecture.
Proceedings of the Field Programmable Logic and Application, 2004

Reducing leakage energy in FPGAs using region-constrained placement.
Proceedings of the ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, 2004

A Hybrid Evolutionary Algorithm for Solving the Register Allocation Problem.
Proceedings of the Evolutionary Computation in Combinatorial Optimization, 2004

Exploring the Possibility of Operating in the Compressed Domain.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

Using Data Compression to Increase Energy Savings in Multi-bank Memories.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

Compiler-Guided Code Restructuring for Improving Instruction TLB Energy Behavior.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

Data Windows: A Data-Centric Approach for Query Execution in Memory-Resident Databases.
Proceedings of the 2004 Design, 2004

A Crosstalk Aware Interconnect with Variable Cycle Transmission.
Proceedings of the 2004 Design, 2004

Impact of Data Transformations on Memory Bank Locality.
Proceedings of the 2004 Design, 2004

Exploiting Processor Workload Heterogeneity for Reducing Energy Consumption in Chip Multiprocessors.
Proceedings of the 2004 Design, 2004

Tuning In-Sensor Data Filtering to Reduce Energy Consumption in Wireless Sensor Networks.
Proceedings of the 2004 Design, 2004

Scheduling Reusable Instructions for Power Reduction.
Proceedings of the 2004 Design, 2004

Configuration-Sensitive Process Scheduling for FPGA-Based Computing Platforms.
Proceedings of the 2004 Design, 2004

Data compression for improving SPM behavior.
Proceedings of the 41st Design Automation Conference, 2004

LODS: locality-oriented dynamic scheduling for on-chip multiprocessors.
Proceedings of the 41st Design Automation Conference, 2004

Compiler-directed code restructuring for reducing data TLB energy.
Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2004

Analyzing heap error behavior in embedded JVM environments.
Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2004

Energy management schemes for memory-resident database systems.
Proceedings of the 2004 ACM CIKM International Conference on Information and Knowledge Management, 2004

Reducing energy consumption of queries in memory-resident database systems.
Proceedings of the 2004 International Conference on Compilers, 2004

Dynamic on-chip memory management for chip multiprocessors.
Proceedings of the 2004 International Conference on Compilers, 2004

Reducing Energy Consumption in Chip Multiprocessors Using Workload Variations.
Proceedings of the Ultra Low-Power Electronics and Design, 2004

A high-performance application data environment for large-scale scientific computations.
IEEE Trans. Parallel Distributed Syst., 2003

Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework.
IEEE Trans. Parallel Distributed Syst., 2003

Partitioned instruction cache architecture for energy efficiency.
ACM Trans. Embed. Comput. Syst., 2003

Evaluating Integrated Hardware-Software Optimizations Using a Unified Energy Estimation Framework.
IEEE Trans. Computers, 2003

Memory system optimization of embedded software.
Proc. IEEE, 2003

Managing Leakage Energy in Cache Hierarchies.
J. Instr. Level Parallelism, 2003

Leakage Current: Moore's Law Meets Static Power.
Computer, 2003

Reducing Disk Power Consumption in Servers with DRPM.
Computer, 2003

Loop Transformations for Reducing Data Space Requirements of Resource-Constrained Applications.
Proceedings of the Static Analysis, 10th International Symposium, 2003

Heap compression for memory-constrained Java environments.
Proceedings of the 2003 ACM SIGPLAN Conference on Object-Oriented Programming Systems, 2003

Adapting instruction level parallelism for optimizing leakage in VLIW architectures.
Proceedings of the 2003 Conference on Languages, 2003

Compiler-Based Code Partitioning for Intelligent Embedded Disk Processing.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

Using Dynamic Branch Behavior for Power-Efficient Instruction Fetch.
Proceedings of the 2003 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2003), 2003

Interplay of energy and performance for disk arrays running transaction processing workloads.
Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2003

Energy optimization techniques in cluster interconnects.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Estimating influence of data layout optimizations on SDRAM energy consumption.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Exploiting program hotspots and code sequentiality for instruction cache leakage management.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

DRPM: Dynamic Speed Control for Power Management in Server Class Disks.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Energy and Performance Considerations in Work Partitioning for Mobile Spatial Queries.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Energy-Aware Compilation and Execution in Java-Enabled Mobile Devices.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

A compiler approach for reducing data cache energy.
Proceedings of the 17th Annual International Conference on Supercomputing, 2003

Reducing dTLB Energy Through Dynamic Resizing.
Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

Adaptive Error Protection for Energy Efficiency.
Proceedings of the 2003 International Conference on Computer-Aided Design, 2003

Array Composition and Decomposition for Optimizing Embedded Applications.
Proceedings of the 2003 International Conference on Computer-Aided Design, 2003

An Energy-Oriented Evaluation of Communication Optimizations for Microsensor Networks.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003

Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003

Energy-Conscious Memory Allocation and Deallocation for Pointer-Intensive Applications.
Proceedings of the Embedded Software, Third International Conference, 2003

ICR: In-Cache Replication for Enhancing Data Cache Reliability.
Proceedings of the 2003 International Conference on Dependable Systems and Networks (DSN 2003), 2003

CCC: Crossbar Connected Caches for Reducing Energy Consumption of On-Chip Multiprocessors.
Proceedings of the 2003 Euromicro Symposium on Digital Systems Design (DSD 2003), 2003

Compiler-Directed Management of Instruction Accesses.
Proceedings of the 2003 Euromicro Symposium on Digital Systems Design (DSD 2003), 2003

Compiler Support for Reducing Leakage Energy Consumption.
Proceedings of the 2003 Design, 2003

Masking the Energy Behavior of DES Encryption.
Proceedings of the 2003 Design, 2003

An Integrated Approach for Improving Cache Behavior.
Proceedings of the 2003 Design, 2003

Generalized Data Transformations for Enhancing Cache Behavior.
Proceedings of the 2003 Design, 2003

Runtime Code Parallelization for On-Chip Multiprocessors.
Proceedings of the 2003 Design, 2003

Implementation and Evaluation of an On-Demand Parameter-Passing Strategy for Reducing Energy.
Proceedings of the 2003 Design, 2003

Data Space Oriented Scheduling in Embedded Systems.
Proceedings of the 2003 Design, 2003

Interprocedural optimizations for improving data cache performance of array-intensive embedded applications.
Proceedings of the 40th Design Automation Conference, 2003

VL-CDRAM: variable line sized cached DRAMs.
Proceedings of the 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2003

Tracking object life cycle for leakage energy optimization.
Proceedings of the 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2003

Address Register Assignment for Reducing Code Size.
Proceedings of the Compiler Construction, 12th International Conference, 2003

Performance, energy, and reliability tradeoffs in replicating hot cache lines.
Proceedings of the International Conference on Compilers, 2003

Exploiting bank locality in multi-bank memories.
Proceedings of the International Conference on Compilers, 2003

Hardware/Software Techniques for Improving Cache Performance in Embedded Systems.
Proceedings of the Embedded Software for SoC, 2003

Energy-Aware Parameter Passing.
Proceedings of the Embedded Software for SoC, 2003

Data Space Oriented Scheduling.
Proceedings of the Embedded Software for SoC, 2003

Dynamic Parallelization of Array Based On-Chip Multiprocessor Applications.
Proceedings of the Embedded Software for SoC, 2003

Generalized Data Transformations.
Proceedings of the Embedded Software for SoC, 2003

Energy-performance trade-offs for spatial access methods on memory-resident data.
VLDB J., 2002

An Experimental Evaluation of I/O Optimizations on Different Applications.
IEEE Trans. Parallel Distributed Syst., 2002

An I/O-Conscious Tiling Strategy for Disk-Resident Data Sets.
J. Supercomput., 2002

Tuning garbage collection for reducing memory system energy in an embedded java environment.
ACM Trans. Embed. Comput. Syst., 2002

Using Memory Compression for Energy Reduction in an Embedded Java System.
J. Circuits Syst. Comput., 2002

Address Code and Arithmetic Optimizations for Embedded Systems.
Proceedings of the 7th Asia and South Pacific Design Automation Conference (ASP-DAC 2002), 2002

A Heuristic for Clock Selection in High-Level Synthesis.
Proceedings of the 7th Asia and South Pacific Design Automation Conference (ASP-DAC 2002), 2002

Compiler-Directed Array Interleaving for Reducing Energy in Multi-Bank Memories.
Proceedings of the 7th Asia and South Pacific Design Automation Conference (ASP-DAC 2002), 2002

Strategies for Improving Data Locality in Embedded Applications.
Proceedings of the 7th Asia and South Pacific Design Automation Conference (ASP-DAC 2002), 2002

Compiler-directed instruction cache leakage optimization.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Generating physical addresses directly for saving instruction TLB energy.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Power protocol: reducing power dissipation on off-chip data buses.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Energy-conscious compilation based on voltage scaling.
Proceedings of the 2002 Joint Conference on Languages, 2002

Compiler-directed cache polymorphism.
Proceedings of the 2002 Joint Conference on Languages, 2002

A Hybrid Strategy Based on Data Distribution and Migration for Optimizing Memory Locality.
Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002

Adaptive Garbage Collection for Battery-Operated Environments.
Proceedings of the 2nd Java Virtual Machine Research and Technology Symposium, 2002

Hardware-Software Co-Adaptation for Data-Intensive Embedded Applications.
Proceedings of the 2002 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2002), 2002

Designing Energy-Efficient Software.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Compiler-Directed I/O Optimization.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Dynamic compilation for energy adaptation.
Proceedings of the 2002 IEEE/ACM International Conference on Computer-aided Design, 2002

Using Complete Machine Simulation for Software Power Estimation: The SoftWatt Approach.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Tuning Garbage Collection in an Embedded Java Environment.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Exploiting Inter-File Access Patterns Using Multi-Collective I/O.
Proceedings of the FAST '02 Conference on File and Storage Technologies, 2002

Data Space Oriented Tiling.
Proceedings of the Programming Languages and Systems, 2002

Enhancing Compiler Techniques for Memory Energy Optimizations.
Proceedings of the Embedded Software, Second International Conference, 2002

Reducing Cache Access Energy in Array-Intensive Applications.
Proceedings of the 2002 Design, 2002

A Compiler-Based Approach for Improving Intra-Iteration Data Reuse.
Proceedings of the 2002 Design, 2002

EAC: A Compiler Framework for High-Level Energy Estimation and Optimization.
Proceedings of the 2002 Design, 2002

Power-Efficient Trace Caches.
Proceedings of the 2002 Design, 2002

Automatic data migration for reducing energy consumption in multi-bank memory systems.
Proceedings of the 39th Design Automation Conference, 2002

Exploiting shared scratch pad memory space in embedded multiprocessor systems.
Proceedings of the 39th Design Automation Conference, 2002

Compiler-directed scratch pad memory hierarchy design and management.
Proceedings of the 39th Design Automation Conference, 2002

An integer linear programming based approach for parallelizing applications in On-chip multiprocessors.
Proceedings of the 39th Design Automation Conference, 2002

An energy saving strategy based on adaptive loop parallelization.
Proceedings of the 39th Design Automation Conference, 2002

Scheduler-based DRAM energy management.
Proceedings of the 39th Design Automation Conference, 2002

Locality-conscious process scheduling in embedded systems.
Proceedings of the Tenth International Symposium on Hardware/Software Codesign, 2002

Energy savings through compression in embedded Java environments.
Proceedings of the Tenth International Symposium on Hardware/Software Codesign, 2002

Kernel-Level Caching for Optimizing I/O by Exploiting Inter-Application Data Sharing.
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002

Influence of Loop Optimizations on Energy Consumption of Multi-bank Memory Systems.
Proceedings of the Compiler Construction, 11th International Conference, 2002

Optimizing inter-nest data locality.
Proceedings of the International Conference on Compilers, 2002

Leakage Energy Management in Cache Hierarchies.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

Compilation for Distributed Memory Architectures.
Proceedings of the Compiler Design Handbook: Optimizations and Machine Code Generation, 2002

Investigating Memory System Energy Behavior Using Software and Hardware Optimizations.
VLSI Design, 2001

Influence of compiler optimizations on system power.
IEEE Trans. Very Large Scale Integr. Syst., 2001

Static and Dynamic Locality Optimizations Using Integer Linear Programming.
IEEE Trans. Parallel Distributed Syst., 2001

Compiler-Directed Collective-I/O.
IEEE Trans. Parallel Distributed Syst., 2001

A Layout-Conscious Iteration Space Transformation Technique.
IEEE Trans. Computers, 2001

Data Relation Vectors: A New Abstraction for Data Optimizations.
IEEE Trans. Computers, 2001

Hardware and Software Techniques for Controlling DRAM Power Modes.
IEEE Trans. Computers, 2001

Design and Evaluation of a Smart Disk Cluster for DSS Commercial Workloads.
J. Parallel Distributed Comput., 2001

Efficient Synthesis of Array Intensive Computations onto FPGA Based Accelerators.
Proceedings of the 14th International Conference on VLSI Design (VLSI Design 2001), 2001

Formulation and Validation of an Energy Dissipation Model for the Clock Generation Circuitry and Distribution Networks.
Proceedings of the 14th International Conference on VLSI Design (VLSI Design 2001), 2001

Analyzing energy behavior of spatial access methods for memory-resident data.
Proceedings of the VLDB 2001, 2001

A dynamic locality optimization algorithm for linear algebra codes.
Proceedings of the 2001 ACM Symposium on Applied Computing (SAC), 2001

A compiler technique for improving whole-program locality.
Proceedings of the Conference Record of POPL 2001: The 28th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2001

Morphable Cache Architectures: Potential Benefits.
Proceedings of the 2001 ACM SIGPLAN Workshop on Optimization of Middleware and Distributed Systems, 2001

vEC: virtual energy counters.
Proceedings of the 2001 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis For Software Tools and Engineering, 2001

Exploiting VLIW schedule slacks for dynamic and leakage energy reduction.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

Improving Off-Chip Memory Energy Behavior in a Multi-processor, Multi-bank Environment.
Proceedings of the Languages and Compilers for Parallel Computing, 2001

Energy Behavior of Java Applications from the Memory Perspective.
Proceedings of the 1st Java Virtual Machine Research and Technology Symposium, 2001

Exploiting scratch-pad memory using Presburger formulas.
Proceedings of the 14th International Symposium on Systems Synthesis, 2001

Power-aware partitioned cache architectures.
Proceedings of the 2001 International Symposium on Low Power Electronics and Design, 2001

Compiler support for block buffering.
Proceedings of the 2001 International Symposium on Low Power Electronics and Design, 2001

Influence of Array Allocation Mechanisms on Memory System Energy.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Use of Local Memory for Efficient Java Execution.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

A Framework for Energy Estimation of VLIW Architecture.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

Improving Memory Energy Using Access Pattern Classification.
Proceedings of the 2001 IEEE/ACM International Conference on Computer-Aided Design, 2001

DRAM Energy Management Using Software and Hardware Directed Power Mode Control.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

Reducing Memory Requirements of Nested Loops for Embedded Systems.
Proceedings of the 38th Design Automation Conference, 2001

Dynamic Management of Scratch-Pad Memory Space.
Proceedings of the 38th Design Automation Conference, 2001

Compiler-directed selection of dynamic memory layouts.
Proceedings of the Ninth International Symposium on Hardware/Software Codesign, 2001

Array Unification: A Locality Optimization Technique.
Proceedings of the Compiler Construction, 10th International Conference, 2001

Energy-efficient instruction cache using page-based placement.
Proceedings of the 2001 International Conference on Compilers, 2001

A Unified Framework for Optimizing Locality, Parallelism, and Communication in Out-of-Core Computations.
IEEE Trans. Parallel Distributed Syst., 2000

Minimizing Data and Synchronization Costs in One-Way Communication.
IEEE Trans. Parallel Distributed Syst., 2000

Compiler Algorithms for Optimizing Locality and Parallelism on Shared and Distributed-Memory Machines.
J. Parallel Distributed Comput., 2000

Data management for large-scale scientific computations in high performance distributed systems.
Clust. Comput., 2000

A Holistic Approach to System Level Energy Optimization.
Proceedings of the Integrated Circuit Design, 2000

APRIL: A Run-Time Library for Tape-Resident Data.
Proceedings of the Eighth NASA Goddard Space Flight Center Conference on Mass Storage Systems and Technologies in cooperation with Seventeenth IEEE Symposium on Mass Storage Systems, 2000

Towards Energy-Aware Iteration Space Tiling.
Proceedings of the Languages, 2000

A Collective I/O Scheme Based on Compiler Analysis.
Proceedings of the Languages, 2000

Experimental Evaluation of Energy Behavior of Iteration Space Tiling.
Proceedings of the Languages and Compilers for Parallel Computing, 2000

Improving Offset Assignment for Embedded Processors.
Proceedings of the Languages and Compilers for Parallel Computing, 2000

Memory system energy (poster session): influence of hardware-software optimizations.
Proceedings of the 2000 International Symposium on Low Power Electronics and Design, 2000

Energy-driven integrated hardware-software optimizations using SimplePower.
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

A novel application development environment for large-scale scientific computations.
Proceedings of the 14th international conference on Supercomputing, 2000

Design and Evaluation of Smart Disk Architecture for DSS Commercial Workloads.
Proceedings of the 2000 International Conference on Parallel Processing, 2000

Energy-Aware Instruction Scheduling.
Proceedings of the High Performance Computing, 2000

Improving Offset Assignment on Embedded Processors Using Transformations.
Proceedings of the High Performance Computing, 2000

Design and Evaluation of a Compiler-Directed Collective I/O Technique.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

The design and use of simplepower: a cycle-accurate energy estimation tool.
Proceedings of the 37th Conference on Design Automation, 2000

Energy-oriented compiler optimizations for partitioned memory architectures.
Proceedings of the 2000 International Conference on Compilers, 2000

A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts.
IEEE Trans. Parallel Distributed Syst., 1999

A global communication optimization technique based on data-flow analysis and linear algebra.
ACM Trans. Program. Lang. Syst., 1999

Improving Cache Locality by a Combination of Loop and Data Transformation.
IEEE Trans. Computers, 1999

A Matrix-Based Approach to Global Locality Optimization.
J. Parallel Distributed Comput., 1999

Improving Locality Using a Graph-Based Technique for Detecting Memory Layouts of Arrays.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

System Level Meta-data for High-Performance Data management.
Proceedings of the 3rd IEEE Metadata Conference 1999, MD 1999, Bethesda, 1999

A Graph Based Framework to Detect Optimal Memory Layouts for Improving Data Locality.
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

An integer linear programming approach for optimizing cache locality.
Proceedings of the 13th international conference on Supercomputing, 1999

A Framework for Interprocedural Locality Optimization Using Both Loop and Data Layout Transformations.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Compiler Optimizations for I/O-Intensive Computations.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Data Management for Large-Scale Scientific Computations in High Performance Distributed Systems.
Proceedings of the Eighth IEEE International Symposium on High Performance Distributed Computing, 1999

Restructuring I/O-Intensive Computations for Locality.
Proceedings of the High-Performance Computing and Networking, 7th International Conference, 1999

I/O-Conscious Tiling for Disk-Resident Data Sets.
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

On Reducing False Sharing while Improving Locality on Shared Memory Multiprocessors.
Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

Compilation Techniques for Out-of-Core Parallel Computations.
Parallel Comput., 1998

Locality Optimization Algorithms for Compilation of Out-of-Core Codes.
J. Inf. Sci. Eng., 1998

An Experimental Study to Analyze and Optimize Hartree-Fock Application's I/O with Passion.
Int. J. High Perform. Comput. Appl., 1998

Improving Locality Using Loop and Data Transformations in an Integrated Framework.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Improving Locality in Out-of-Core Computations Using Data Layout Transformations.
Proceedings of the Languages, 1998

A Loop Transformation Algorithm Based on Explicit Data Layout Representation for Optimizing Locality.
Proceedings of the Languages and Compilers for Parallel Computing, 1998

A Generalized Framework for Global Communication Optimization.
Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

A Hyperplane Based Approach for Optimizing Spatial Locality in Loop Nests.
Proceedings of the 12th international conference on Supercomputing, 1998

Performance Implications of Architectural and Software Techniques on I/O-Intensive Applications.
Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

Enhancing Spatial Locality via Data Layout Optimizations.
Proceedings of the Euro-Par '98 Parallel Processing, 1998

A Matrix-Based Approach to the Global Locality Optimization Problem.
Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998

Changing Interaction of Compiler and Architecture.
Computer, 1997

Optimization and Evaluation of Hartree-Fock Application's I/O with PASSION.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1997

I/O Optimizations for Compiling Out-of-Core Programs on Distributed-Memory Machines.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

Data Access Reorganizations in Compiling Out-of-Core Data Parallel Programs on Distributed Memory Machines.
Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

A Unified Compiler Algorithm for Optimizing Locality, Parallelism and Communication in Out-of-core Computations.
Proceedings of the Fifth Workshop on I/O in Parallel and Distributed Systems, 1997

A Compiler Algorithm for Optimizing Locality in Loop Nests.
Proceedings of the 11th international conference on Supercomputing, 1997

Improving the Performance of Out-of-Core Computations.
Proceedings of the 1997 International Conference on Parallel Processing (ICPP '97), 1997

Global I/O optimizations for out-of-core computations.
Proceedings of the Fourth International Conference on High-Performance Computing, 1997

Optimization of Out-of-Core Computations Using Chain Vectors.
Proceedings of the Euro-Par '97 Parallel Processing, 1997
