Wen-Mei W. Hwu

IEEE Trans. Computers, 2020

EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal In GPUs.

Seungwon Min

Proc. VLDB Endow., 2020

Fast CUDA-Aware MPI Datatypes without Platform Support.

CoRR, 2020

Tearing Down the Memory Wall.

Zaid Qureshi

CoRR, 2020

Efficient Inference on GPUs for the Sparse Deep Neural Network Graph Challenge 2020.

Carl Pearson

CoRR, 2020

MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale.

CoRR, 2020

DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs.

Proceedings of the ICPE '20: ACM/SPEC International Conference on Performance Engineering, 2020

Petascale XCT: 3D image reconstruction with hierarchical communications on multi-GPU nodes.

Tekin Biçer

Proceedings of the International Conference for High Performance Computing, 2020

DLSpec: A Deep Learning Task Exchange Specification.

Proceedings of the 2020 USENIX Conference on Operational Machine Learning, 2020

SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems.

Proceedings of the Third Conference on Machine Learning and Systems, 2020

FReaC Cache: Folded-logic Reconfigurable Computing in the Last Level Cache.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Node-Aware Stencil Communication for Heterogeneous Supercomputers.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Advancing Computing Infrastructure for Very Large-Scale Deep Learning at C3SR.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Micro - GAGE: A Low-power Compact GAGE Hash Function Processor for IoT Applications.

Proceedings of the 27th IEEE International Conference on Electronics, Circuits and Systems, 2020

DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator.

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

At-Scale Sparse Deep Neural Network Inference With Efficient GPU Implementation.

Carl Pearson

Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices.

Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

Exploring Semantic Capacity of Terms.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Alleviating Semantic-level Shift: A Semi-supervised Domain Adaptation Method for Semantic Segmentation.

Zhonghao Wang

Yunchao Wei

Rogério Schmidt Feris

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Vertext: An End-to-end AI Powered Conversation Management System for Multi-party Chat Platforms.

Omer Anjum

Chak Ho Chan

Tanitpong Lawphongpanich

Proceedings of the Companion Publication of the 2020 ACM Conference on Computer Supported Cooperative Work and Social Computing, 2020

The design and implementation of the Wolfram Language compiler.

Tom Wickham-Jones

Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

The Design and Implementation of a Scalable Deep Learning Benchmarking Platform.

Proceedings of the 13th IEEE International Conference on Cloud Computing, 2020

2019

The Design and Implementation of a Scalable DL Benchmarking Platform.

CoRR, 2019

Across-Stack Profiling and Characterization of Machine Learning Models on GPUs.

CoRR, 2019

SkyNet: A Champion Model for DAC-SDC on Low Power Object Detection.

CoRR, 2019

A Retrospective Recount of Computer Architecture Research with a Data-Driven Study of Over Four Decades of ISCA Publications.

Omer Anjum

Jinjun Xiong

CoRR, 2019

A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices.

CoRR, 2019

Challenges and Pitfalls of Reproducing Machine Learning Artifacts.

CoRR, 2019

Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects.

Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, 2019

Analysis and Modeling of Collaborative Execution Strategies for Heterogeneous CPU-FPGA Architectures.

Sitao Huang

Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, 2019

MLModelScope: Evaluate and Introspect Cognitive Pipelines.

Proceedings of the 2019 IEEE World Congress on Services, 2019

MemXCT: memory-centric X-ray CT reconstruction with massive parallelization.

Tekin Biçer

Proceedings of the International Conference for High Performance Computing, 2019

Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus.

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

DeepStore: In-Storage Acceleration for Intelligent Queries.

Zaid Qureshi

Weixin Liang

Ziyan Feng

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Near-Memory and In-Storage FPGA Acceleration for Emerging Cognitive Computing Workloads.

Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019

Accelerating reduction and scan using tensor core units.

Proceedings of the ACM International Conference on Supercomputing, 2019

SPGNet: Semantic Prediction Guidance for Scene Parsing.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

NAIS: Neural Architecture and Implementation Search and its Applications in Autonomous Driving.

Proceedings of the International Conference on Computer-Aided Design, 2019

Update on Triangle Counting on GPU.

Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

Accelerating Sparse Deep Neural Networks on FPGAs.

Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

Update on k-truss Decomposition on GPU.

Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

An Efficient GPU Implementation Technique for Higher-Order 3D Stencils.

Omer Anjum

Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

Analysis and Optimization of I/O Cache Coherency Strategies for SoC-FPGA Device.

Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

PaRe: A Paper-Reviewer Matching Approach Using a Common Topic Space.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge.

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Automatic Generation of Warp-Level Primitives and Atomic Instructions for Fast and Portable Parallel Reduction on GPUs.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference.

Aayush Ankit

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

FlatFlash: Exploiting the Byte-Accessibility of SSDs within a Unified Memory-Storage Hierarchy.

Ahmed H. M. O. Abulila

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

Implementing neural machine translation with bi-directional GRU and attention mechanism on FPGAs using HLS.

Proceedings of the 24th Asia and South Pacific Design Automation Conference, 2019

TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function-as-a-Service.

Cheng Li

Jinjun Xiong

Proceedings of the 12th IEEE International Conference on Cloud Computing, 2019

2018

Iterative Modulo Scheduling.

IEEE Micro, 2018

Accelerator Architectures - A Ten-Year Retrospective.

Sanjay J. Patel

IEEE Micro, 2018

High-throughput Ant Colony Optimization on graphics processing units.

J. Parallel Distributed Comput., 2018

MLModelScope: Evaluate and Measure ML Models within AI Pipelines.

CoRR, 2018

TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments.

Cheng Li

Jinjun Xiong

CoRR, 2018

A Simple Non-i.i.d. Sampling Approach for Efficient Training and Better Generalization.

CoRR, 2018

Decoupled Classification Refinement: Hard False Positive Suppression for Object Detection.

Bowen Cheng

Yunchao Wei

Rogério Schmidt Feris

CoRR, 2018

SCOPE: C3SR Systems Characterization and Benchmarking Framework.

CoRR, 2018

Semi-Coherent DMA: An Alternative I/O Coherency Management for Embedded Systems.

IEEE Comput. Archit. Lett., 2018

NUMA-Aware Data-Transfer Measurements for Power/NVLink Multi-GPU Systems.

Proceedings of the High Performance Computing, 2018

Application-Transparent Near-Memory Processing Architecture with Memory Channel Network.

Mohammad Alian

Seungwon Min

Hadi Asgharimoghaddam

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

A Fast and Massively-Parallel Inverse Solver for Multiple-Scattering Tomographic Image Reconstruction.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Hardware-Software Co-Design for an Analog-Digital Accelerator for Machine Learning.

Proceedings of the 2018 IEEE International Conference on Rebooting Computing, 2018

DNNBuilder: an automated tool for building high-performance DNN hardware accelerators for FPGAs.

Proceedings of the International Conference on Computer-Aided Design, 2018

Collaborative (CPU + GPU) Algorithms for Triangle Counting and Truss Decomposition.

Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

Triangle Counting and Truss Decomposition using FPGA.

Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

AccDNN: An IP-Based DNN Generator for FPGAs.

Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

2017

SAVI objects: sharing and virtuality incorporated.

Proc. ACM Program. Lang., 2017

Heterogeneous Computing Meets Near-Memory Acceleration and High-Level Synthesis in the Post-Moore Era.

IEEE Micro, 2017

Collaborative Computing for Heterogeneous Integrated Systems.

Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, 2017

Enabling GPU Support for the COMPSs-Mobile Framework.

Francesc Lordan

Rosa M. Badia

Proceedings of the Accelerator Programming Using Directives - 4th International Workshop, 2017

Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Chai: Collaborative heterogeneous applications for integrated-architectures.

Thomas B. Jablin

Antonio J. Peña

Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Keynote: Architecture and software for emerging low-power systems.

Proceedings of the 2017 IEEE/ACM International Symposium on Low Power Electronics and Design, 2017

RAI: A Scalable Project Submission System for Parallel Programming Courses.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Rebooting the Data Access Hierarchy of Computing Systems.

Proceedings of the IEEE International Conference on Rebooting Computing, 2017

Generalize or Die: Operating Systems Support for Memristor-Based Accelerators.

Pedro Bruel

Proceedings of the IEEE International Conference on Rebooting Computing, 2017

Collaborative (CPU + GPU) algorithms for triangle counting and truss decomposition on the Minsky architecture: Static graph challenge: Subgraph isomorphism.

Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

Revisiting Online Autotuning for Sparse-Matrix Vector Multiplication Kernels on Next-Generation Architectures.

Simon D. Hammond

Christian R. Trott

Gowthami Jayashri Manikandan

Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

Hardware Acceleration of the Pair-HMM Algorithm for DNA Variant Calling.

Sitao Huang

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

2016

In-Place Matrix Transposition on GPUs.

I-Jui Sung

José María González-Linares

Nicolás Guil

IEEE Trans. Parallel Distributed Syst., 2016

FCUDA-HB: Hierarchical and Scalable Bus Architecture Generation on FPGAs With the FCUDA Flow.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

Common Bonds: MIPS, HPS, Two-Level Branch Prediction, and Compressed Code RISC Processor.

IEEE Micro, 2016

Platform choices and design demands for IoT platforms: cost, power, and performance tradeoffs.

IET Cyber-Phys. Syst.: Theory & Appl., 2016

BLESS 2: accurate, memory-efficient and fast error correction method.

Bioinform., 2016

Design of a power-efficient ARM processor with a timing-error detection and correction mechanism.

Proceedings of the 29th IEEE International System-on-Chip Conference, 2016

A programming system for future proofing performance critical libraries.

Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Efficient kernel synthesis for performance portable programming.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

AsHES 2016 Keynote.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

WebGPU: A Scalable Online Development Platform for GPU Programming Courses.

Carl Pearson

Gowthami Jayashri Manikandan

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Efficient and Scalable Workflows for Genomic Analyses.

Zbigniew T. Kalbarczyk

Ravishankar K. Iyer

Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing, 2016

Acceleration of the Pair-HMM Algorithm for DNA Variant Calling.

Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

SpaceJMP: Programming with Multiple Virtual Address Spaces.

Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

DySel: Lightweight Dynamic Selection for Kernel-based Data-parallel Programming Model.

Hee-Seok Kim

Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015

Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications.

IEEE Trans. Parallel Distributed Syst., 2015

Optimized Data Transfers Based on the OpenCL Event Management Mechanism.

Sci. Program., 2015

Enhancing the Usability and Utilization of Accelerated Architectures via Docker.

Proceedings of the 8th IEEE/ACM International Conference on Utility and Cloud Computing, 2015

GPU-SM: shared memory multi-GPU programming.

Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015

Automatic Parallelization of Kernels in Shared-Memory Multi-GPU Nodes.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

In-Place Data Sliding Algorithms for Many-Core Architectures.

Proceedings of the 44th International Conference on Parallel Processing, 2015

FPGA accelerated DNA error correction.

Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures.

Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

2014

What is ahead for parallel computing.

J. Parallel Distributed Comput., 2014

BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads.

Bioinform., 2014

SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance.

Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014

In-place transposition of rectangular matrices on accelerators.

I-Jui Sung

José María González-Linares

Nicolás Guil

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Triolet: a programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing.

Thomas B. Jablin

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Adaptive Cache Management for Energy-Efficient GPU Computing.

Xuhao Chen

Jie Lv

Zhiying Wang

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Adaptive Cache Bypass and Insertion for Many-core Accelerators.

Proceedings of the 2nd International Workshop on Many-core Embedded Systems, 2014

Automatic execution of single-GPU computations across multiple GPUs.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

A Guide for Implementing Tridiagonal Solvers on GPUs.

Proceedings of the Numerical Computations with GPUs, 2014

2013

Scalable SIMD-parallel memory allocation for many-core machines.

Xiaohuang Huang

Stephen Jones

Ian Buck

J. Supercomput., 2013

Efficient compilation of CUDA kernels for high-performance computing on FPGAs.

ACM Trans. Embed. Comput. Syst., 2013

More IMPATIENT: A gridding-accelerated Toeplitz-based strategy for non-Cartesian high-resolution 3D MRI on GPUs.

J. Parallel Distributed Comput., 2013

Rapid computation of sodium bioscales using GPU-accelerated image reconstruction.

Int. J. Imaging Syst. Technol., 2013

Rethinking computer architecture for throughput computing.

Proceedings of the 2013 International Conference on Embedded Computer Systems: Architectures, 2013

clMPI: An OpenCL Extension for Interoperation with the Message Passing Interface.

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Throughput-oriented kernel porting onto FPGAs.

Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Comparison based sorting for systems with multiple GPUs.

Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, 2013

2012

Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)

Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01737-7, 2012

Data Layout Transformation Exploiting Memory-Level Parallelism in Structured Grid Many-Core Applications.

Int. J. Parallel Program., 2012

Algorithm and Data Optimization Techniques for Scaling to Massively Threaded Systems.

Computer, 2012

TIGER: tiled iterative genome assembler.

BMC Bioinform., 2012

A scalable, numerically stable, high-performance tridiagonal solver using GPUs.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors.

Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Efficient Pattern-Based Time Series Classification on GPU.

Proceedings of the 12th IEEE International Conference on Data Mining, 2012

Design evaluation of OpenCL compiler framework for Coarse-Grained Reconfigurable Arrays.

Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

2011

Superscalar Processors.

Proceedings of the Encyclopedia of Parallel Computing, 2011

EcoG: A Power-Efficient GPU Cluster Architecture for Scientific Computing.

Comput. Sci. Eng., 2011

Advanced MRI reconstruction toolbox with accelerating on GPU.

Proceedings of the Conference on Parallel Processing for Imaging Applications 2011, 2011

Impatient MRI: Illinois Massively Parallel Acceleration Toolkit for image reconstruction with enhanced throughput in MRI.

Proceedings of the 8th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2011

Panel Statement.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

A Scalable Tridiagonal Solver for GPUs.

Proceedings of the International Conference on Parallel Processing, 2011

Parallel implementation of Multi-dimensional Ensemble Empirical Mode Decomposition.

Proceedings of the IEEE International Conference on Acoustics, 2011

Multilevel Granularity Parallelism Synthesis on FPGAs.

Proceedings of the IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, 2011

2010

High-Performance Computing with Accelerators.

Volodymyr V. Kindratenko

Comput. Sci. Eng., 2010

An adaptive performance modeling tool for GPU architectures.

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Implementing a GPU Programming Model on a Non-GPU Accelerator Architecture.

Proceedings of the Computer Architecture, 2010

Accelerating iterative field-compensated MR image reconstruction on GPUs.

Proceedings of the 2010 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2010

An effective GPU implementation of breadth-first search.

Lijuan Luo

Martin D. F. Wong

Proceedings of the 47th Design Automation Conference, 2010

Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs.

Proceedings of the CGO 2010, 2010

An asymmetric distributed shared memory model for heterogeneous parallel systems.

Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

Data layout transformation exploiting memory-level parallelism in structured grid many-core applications.

I-Jui Sung

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

Raising the level of many-core programming with compiler technology: meeting a grand challenge.

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

Exploiting More Parallelism from Applications Having Generalized Reductions on GPU Architectures.

Xiaolong Wu

Nady Obeid

Proceedings of the 10th IEEE International Conference on Computer and Information Technology, 2010

XMalloc: A Scalable Lock-free Dynamic Memory Allocator for Many-core Machines.

Xiaohuang Huang

Stephen Jones

Ian Buck

Proceedings of the 10th IEEE International Conference on Computer and Information Technology, 2010

Programming Massively Parallel Processors - A Hands-on Approach.

David Blair Kirk

Morgan Kaufmann, ISBN: 978-0-12-381472-2, 2010

2009

The parallelization of video processing.

IEEE Signal Process. Mag., 2009

Hardware-compiler co-design for adjustable data power savings.

Microprocess. Microsystems, 2009

Compute Unified Device Architecture Application Suitability.

Comput. Sci. Eng., 2009

FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs.

Proceedings of the IEEE 7th Symposium on Application Specific Processors, 2009

Accelerating MR Image Reconstruction on GPUs.

Proceedings of the 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Boston, MA, USA, June 28, 2009

Long time-scale simulations of in vivo diffusion using GPU hardware.

Zaida Luthey-Schulten

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Many-core parallel computing - Can compilers and tools do the heavy lifting?

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

High-performance CUDA kernel execution on FPGAs.

Proceedings of the 23rd International Conference on Supercomputing, 2009

GPU clusters for high-performance computing.

Volodymyr V. Kindratenko

Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

High performance computation and interactive display of molecular orbitals on GPUs and multi-core CPUs.

Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units, 2009

Optimization of tele-immersion codes.

Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units, 2009

2008

Guest Editors' Introduction: Accelerator Architectures.

Sanjay J. Patel

IEEE Micro, 2008

Accelerating advanced MRI reconstructions on GPUs.

J. Parallel Distributed Comput., 2008

Program optimization carving for GPU computing.

J. Parallel Distributed Comput., 2008

Thousand-Core Chips [Roundtable].

IEEE Des. Test Comput., 2008

The Concurrency Challenge.

Kurt Keutzer

Timothy G. Mattson

IEEE Des. Test Comput., 2008

Application Acceleration with the Explicitly Parallel Operations System - the EPOS Processor.

Deming Chen

Proceedings of the IEEE Symposium on Application Specific Processors, 2008

Optimization principles and application performance evaluation of a multithreaded GPU using CUDA.

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

CUDA-Lite: Reducing GPU Programming Complexity.

Proceedings of the Languages and Compilers for Parallel Computing, 2008

MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs.

Sam S. Stone

Proceedings of the Languages and Compilers for Parallel Computing, 2008

CUBA: an architecture for efficient CPU/co-processor data communication.

Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Visualization and Analysis of GPU Summer School Applicants and Participants.

Proceedings of the Fourth International Conference on e-Science, 2008

Program optimization space pruning for a multithreaded GPU.

Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008

GPU acceleration of cutoff pair potentials for molecular modeling applications.

Proceedings of the 5th Conference on Computing Frontiers, 2008

2007

Automatic Discovery of Coarse-Grained Parallelism in Media Applications.

Sain-Zee Ueng

Robert E. Kidd

Matthew I. Frank

Trans. High Perform. Embed. Archit. Compil., 2007

Toward Application-Aware Security and Reliability.

IEEE Secur. Priv., 2007

Iteration Disambiguation for Parallelism Identification in Time-Sliced Applications.

Proceedings of the Languages and Compilers for Parallel Computing, 2007

Corezilla: Build and Tame the Multicore Beast?

Proceedings of the 44th Design Automation Conference, 2007

Implicitly Parallel Programming Models for Thousand-Core Microprocessors.

Proceedings of the 44th Design Automation Conference, 2007

CIGAR: Application Partitioning for a CPU/Coprocessor Architecture.

Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006

Beating In-Order Stalls with "Flea-Flicker" Two-Pass Pipelining.

IEEE Trans. Computers, 2006

Tolerating Cache-Miss Latency with Multipass Pipelines.

Ronald D. Barnes

IEEE Micro, 2006

2005

Guest Editors' Introduction.

[BibT_eX]

[DOI]

Krishna V. Palem

IEEE Trans. Computers, 2005

"Flea-flicker" Multipass Pipelining: An Alternative to the High-Power Out-of-Order Offense.

[BibT_eX]

[DOI]

Ronald D. Barnes

Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

The Future of Computer Architecture Research: An Industrial Perspective.

[BibT_eX]

[DOI]

Sanjay J. Patel

Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

2004

Bottom-Up and Top-Down Context-Sensitive Summary-Based Pointer Analysis.

[BibT_eX]

[DOI]

Erik M. Nystrom

Hong-Seok Kim

Proceedings of the Static Analysis, 11th International Symposium, 2004

Importance of heap specialization in pointer analysis.

[BibT_eX]

[DOI]

Erik M. Nystrom

Hong-Seok Kim

Proceedings of the 2004 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis For Software Tools and Engineering, 2004

Trimaran: An Infrastructure for Research in Instruction-Level Parallelism.

[BibT_eX]

[DOI]

Lakshmi N. Chakrapani

Proceedings of the Languages and Compilers for High Performance Computing, 2004

Field-testing IMPACT EPIC research results in Itanium 2.

[BibT_eX]

[DOI]

Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

2003

Energy saving and capacity improvement potential of power control in multi-hop wireless networks.

[BibT_eX]

[DOI]

Comput. Networks, 2003

2002

Vacuum packing: extracting hardware-detected program phases for post-link optimization.

[BibT_eX]

[DOI]

Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Code coverage and input variability: effects on architecture and compiler research.

[BibT_eX]

[DOI]

Hillery C. Hunter

Proceedings of the International Conference on Compilers, 2002

2001

An Architectural Framework for Runtime Optimization.

[BibT_eX]

[DOI]

Christopher N. George

IEEE Trans. Computers, 2001

Program decision logic optimization using predication and control speculation.

[BibT_eX]

[DOI]

John W. Sias

Proc. IEEE, 2001

Enhancing loop buffering of media and telecommunications applications using low-overhead predication.

[BibT_eX]

[DOI]

John W. Sias

Hillery C. Hunter

Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

Modulo schedule buffers.

[BibT_eX]

[DOI]

Matthew C. Merten

Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

A Study of the Energy Saving and Capacity Improvement Potential of Power Control in Multi-Hop Wireless Networks.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual IEEE Conference on Local Computer Networks (LCN 2001), 2001

A Power Controlled Multiple Access Protocol for Wireless Packet Networks.

[BibT_eX]

[DOI]

Jeffrey P. Monks

Vaduvur Bharghavan

Proceedings of IEEE INFOCOM 2001, 2001

Code Reordering and Speculation Support for Dynamic Optimization System.

[BibT_eX]

[DOI]

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001

2000

Modular interprocedural pointer analysis using access paths: design, implementation, and evaluation.

[BibT_eX]

[DOI]

Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2000

Accurate and efficient predicate analysis with binary decision diagrams.

[BibT_eX]

[DOI]

John W. Sias

Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

Transmission Power Control for Multiple Access Wireless Packet Networks.

[BibT_eX]

[DOI]

Jeffrey P. Monks

Vaduvur Bharghavan

Proceedings of the 27th Conference on Local Computer Networks, 2000

A hardware mechanism for dynamic extraction and relayout of program hot spots.

[BibT_eX]

[DOI]

Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

Hardware Support for Dynamic Management of Compiler-Directed Computation Reuse.

[BibT_eX]

[DOI]

Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), 2000

1999

Architecture.

[BibT_eX]

[DOI]

The VLSI Handbook, 1999

Run-Time Cache Bypassing.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1999

Editors' Introduction.

[BibT_eX]

[DOI]

Mark Smotherman

Int. J. Parallel Program., 1999

Editor's Introduction.

[BibT_eX]

[DOI]

Mark Smotherman

Int. J. Parallel Program., 1999

The Partial Reverse If-Conversion Framework for Balancing Control Flow and Predication.

[BibT_eX]

[DOI]

Int. J. Parallel Program., 1999

A New Framework for Debugging Globally Optimized Code.

[BibT_eX]

[DOI]

Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 1999

Compiler-Directed Dynamic Computation Reuse: Rationale and Initial Results.

[BibT_eX]

[DOI]

Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999

An Empirical Study of Function Pointers Using SPEC Benchmarks.

[BibT_eX]

[DOI]

Proceedings of the Languages and Compilers for Parallel Computing, 1999

A Hardware-Driven Profiling Scheme for Identifying Program Hot Spots to Support Runtime Optimization.

[BibT_eX]

[DOI]

Matthew C. Merten

Andrew R. Trick

Christopher N. George

Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999

The Program Decision Logic Approach to Predicated Execution.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999

An Architecture Framework for Introducing Predicated Execution into Embedded Microprocessors.

[BibT_eX]

[DOI]

Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

1998

Combining Trace Sampling with Single Pass Methods for Efficient Cache Simulation.

[BibT_eX]

[DOI]

Mary Ann Hirsch

IEEE Trans. Computers, 1998

Optimization of Machine Descriptions for Efficient Use.

[BibT_eX]

[DOI]

B. Ramakrishna Rau

Int. J. Parallel Program., 1998

Foreword to the Special Issue.

[BibT_eX]

[DOI]

Steve Beaty

Int. J. Parallel Program., 1998

Introduction to Predicated Execution.

[BibT_eX]

[DOI]

Computer, 1998

Compiler-Directed Early Load-Address Generation.

[BibT_eX]

[DOI]

Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Retrospective: HPSm, a High Performance Restricted Data Flow Architecture Having Minimal Functionality.

[BibT_eX]

[DOI]

25 Years of the International Symposia on Computer Architecture (Selected Papers), 1998

Retrospective: IMPACT: An Architectural Framework for Multiple-Instruction Issue.

[BibT_eX]

[DOI]

25 Years of the International Symposia on Computer Architecture (Selected Papers), 1998

IMPACT: An Architectural Framework for Multiple-Instruction-Issue Processors.

[BibT_eX]

[DOI]

25 Years of the International Symposia on Computer Architecture (Selected Papers), 1998

Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998

Run-Time Adaptive Cache Management.

[BibT_eX]

[DOI]

Teresa L. Johnson

Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences, 1998

Improving Static Branch Prediction in a Compiler.

[BibT_eX]

[DOI]

Brian L. Deitrich

Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998

1997

Region-based compilation: Introduction, motivation, and initial experience.

[BibT_eX]

[DOI]

Richard E. Hank

B. Ramakrishna Rau

Int. J. Parallel Program., 1997

Optimizing NET Compilers for Improved Java Performance.

[BibT_eX]

[DOI]

Computer, 1997

Run-Time Spatial Locality Detection and Optimization.

[BibT_eX]

[DOI]

Teresa L. Johnson

Matthew C. Merten

Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997

A Framework for Balancing Control Flow and Predication.

[BibT_eX]

[DOI]

Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997

Run-Time Adaptive Cache Hierarchy Management via Reference Analysis.

[BibT_eX]

[DOI]

Teresa L. Johnson

Proceedings of the 24th International Symposium on Computer Architecture, 1997

Architectural Support for Compiler-Synthesized Dynamic Branch Prediction Strategies: Rationale and Initial Results.

[BibT_eX]

[DOI]

Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997

A study of the cache and branch performance issues with running Java on current hardware platforms.

[BibT_eX]

[DOI]

Proceedings of IEEE COMPCON 97, 1997

1996

Guest Editors' Introduction.

[BibT_eX]

Matthew K. Farrens

Int. J. Parallel Program., 1996

Modulo Scheduling of Loops in Control-intensive Non-numeric Programs.

[BibT_eX]

[DOI]

Daniel M. Lavery

Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996

Java Bytecode to Native Code Translation: The Caffeine Prototype and Preliminary Results.

[BibT_eX]

[DOI]

Cheng-Hsueh A. Hsieh

Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996

Speculative Hedge: Regulating Compile-time Speculation Against Profile Variations.

[BibT_eX]

[DOI]

Brian L. Deitrich

Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996

1995

Compiler-Based Multiple Instruction Retry.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1995

Three Architectural Models for Compiler-Controlled Speculative Execution.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1995

The Importance of Prepass Code Scheduling for Superscalar and Superpipelined Processors.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1995

Compiler-Assisted Multiple Instruction Rollback Recovery Using a Read Buffer.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1995

Compiler technology for future microprocessors.

[BibT_eX]

[DOI]

Proc. IEEE, 1995

Advances in Benchmarking Techniques: New Standards and Quantitative Metrics.

[BibT_eX]

[DOI]

Adv. Comput., 1995

Unrolling-based optimizations for modulo scheduling.

[BibT_eX]

[DOI]

Daniel M. Lavery

Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

Region-based compilation: an introduction and motivation.

[BibT_eX]

[DOI]

Richard E. Hank

B. Ramakrishna Rau

Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

A Comparison of Full and Partial Predicated Execution Support for ILP Processors.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

A study of the effects of compiler-controlled speculation on instruction and data caches.

[BibT_eX]

[DOI]

Roger A. Bringmann

Proceedings of the 28th Annual Hawaii International Conference on System Sciences (HICSS-28), 1995

1994

The Susceptibility of Programs to Context Switching.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1994

Incremental Compiler Transformations for Multiple Instruction Retry.

[BibT_eX]

[DOI]

Softw. Pract. Exp., 1994

Performance Implications of Synchronization Support for Parallel Fortran Programs.

[BibT_eX]

[DOI]

Sadun Anik

J. Parallel Distributed Comput., 1994

From the guest editors.

[BibT_eX]

[DOI]

Alex Nicolau

Int. J. Parallel Program., 1994

Profile-assisted instruction scheduling.

[BibT_eX]

[DOI]

Int. J. Parallel Program., 1994

Data relocation and prefetching for programs with large data sets.

[BibT_eX]

[DOI]

Proceedings of the 27th Annual International Symposium on Microarchitecture, San Jose, California, USA, November 30, 1994

Characterizing the impact of predicated execution on branch prediction.

[BibT_eX]

[DOI]

Proceedings of the 27th Annual International Symposium on Microarchitecture, San Jose, California, USA, November 30, 1994

An Analytical Approach to Scheduling Code for Superscalar and VLIW Architectures.

[BibT_eX]

[DOI]

Shyh-Kwei Chen

W. Kent Fuchs

Proceedings of the 1994 International Conference on Parallel Processing, 1994

Dynamic Memory Disambiguation Using the Memory Conflict Buffer.

[BibT_eX]

[DOI]

Proceedings of ASPLOS-VI, 1994

1993

Sentinel Scheduling for VLIW and Superscalar Processors.

[BibT_eX]

[DOI]

Michael S. Schlansker

ACM Trans. Comput. Syst., 1993

The superblock: An effective technique for VLIW and superscalar compilation.

[BibT_eX]

[DOI]

J. Supercomput., 1993

The Effect of Code Expanding Optimizations on Instruction Cache Design.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1993

An Execution Profiler for Window-oriented Applications.

[BibT_eX]

[DOI]

Aloke Gupta

Softw. Pract. Exp., 1993

Reverse If-Conversion.

[BibT_eX]

[DOI]

Proceedings of the ACM SIGPLAN'93 Conference on Programming Language Design and Implementation (PLDI), 1993

Superblock formation using static program analysis.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993

Speculative execution exception recovery using write-back suppression.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993

[BibT_eX]

[DOI]

Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993

Application of Compiler-Assisted Rollback Recovery to Speculative Execution Repair.

[BibT_eX]

[DOI]

W. Kent Fuchs

Neal J. Alewine

Proceedings of the Hardware and Software Architectures for Fault Tolerance, 1993

1992

Efficient Instruction Sequencing with Inline Target Insertion.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1992

Profile-guided Automatic Inline Expansion for C Programs.

[BibT_eX]

[DOI]

Softw. Pract. Exp., 1992

Xprof: Profiling the Execution of X Window Programs.

[BibT_eX]

[DOI]

Aloke Gupta

Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, 1992

Compiler Code Transformations for Superscalar-Based High Performance Systems.

[BibT_eX]

[DOI]

Proceedings of Supercomputing '92, 1992

Systematic prototyping of superscalar computer architectures.

[BibT_eX]

[DOI]

Proceedings of the Third International Workshop on Rapid System Prototyping, 1992

Using Profile Information to Assist Advanced Compiler Optimization and Scheduling.

[BibT_eX]

[DOI]

Proceedings of the Languages and Compilers for Parallel Computing, 1992

Tolerating data access latency with register preloading.

[BibT_eX]

[DOI]

Proceedings of the 6th international conference on Supercomputing, 1992

Tolerating First Level Memory Access Latency in High-Performance Systems.

[BibT_eX]

William Y. Chen

Proceedings of the 1992 International Conference on Parallel Processing, 1992

Executing Nested Parallel Loops on Shared-Memory Multiprocessors.

[BibT_eX]

Sadun Anik

Proceedings of the 1992 International Conference on Parallel Processing, 1992

Branch Recovery with Compiler-Assisted Multiple Instruction Retry.

[BibT_eX]

[DOI]

Proceedings of the Digest of Papers: FTCS-22, 1992

Sentinel Scheduling for VLIW and Superscalar Processors.

[BibT_eX]

[DOI]

Michael S. Schlansker

Proceedings of ASPLOS-V, 1992

1991

Using Profile Information to Assist Classic Code Optimizations.

[BibT_eX]

[DOI]

Softw. Pract. Exp., 1991

A brief survey of benchmark usage in the architecture community.

[BibT_eX]

[DOI]

SIGARCH Comput. Archit. News, 1991

Benchmark Characterization.

[BibT_eX]

[DOI]

Computer, 1991

Data Access Microarchitectures for Superscalar Processors with Compiler-Assisted Data Prefetching.

[BibT_eX]

[DOI]

Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture, 1991

Comparing Static and Dynamic Code Scheduling for Multiple-Instruction-Issue Processors.

[BibT_eX]

[DOI]

Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture, 1991

The Effect of Compiler Optimizations on Available Parallelism in Scalar Programs.

[BibT_eX]

Proceedings of the International Conference on Parallel Processing, 1991

1990

Snoopy cache test-and-test-and-set without excessive bus contention.

[BibT_eX]

[DOI]

Andy Glew

SIGARCH Comput. Archit. News, 1990

A software based approach to achieving optimal performance for signature control flow checking.

[BibT_eX]

[DOI]

Nancy J. Warter

Proceedings of the 20th International Symposium on Fault-Tolerant Computing, 1990

1989

A Simulation Study of Simultaneous Vector Prefetch Performance in Multiprocessor Memory Subsystems (Extended Abstract).

[BibT_eX]

Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 1989

Inline Function Expansion for Compiling C Programs.

[BibT_eX]

[DOI]

Proceedings of the ACM SIGPLAN'89 Conference on Programming Language Design and Implementation (PLDI), 1989

Forward semantic: a compiler-assisted instruction fetch method for heavily pipelined processors.

[BibT_eX]

[DOI]

P.-H. Chang

Proceedings of the 22nd Annual Workshop and Symposium on Microprogramming and Microarchitecture, 1989

Comparing Software and Hardware Schemes For Reducing the Cost of Branches.

[BibT_eX]

[DOI]

Proceedings of the 16th Annual International Symposium on Computer Architecture. Jerusalem, 1989

Achieving High Instruction Cache Performance with an Optimizing Compiler.

[BibT_eX]

[DOI]

Proceedings of the 16th Annual International Symposium on Computer Architecture. Jerusalem, 1989

Control flow optimization for supercomputer scalar processing.

[BibT_eX]

[DOI]

Proceedings of the 3rd international conference on Supercomputing, 1989

1988

Trace selection for compiling large C application programs to microcode.

[BibT_eX]

[DOI]

Proceedings of the 21st Annual Workshop and Symposium on Microprogramming and Microarchitecture, San Diego, California, USA, November 28, 1988

Exploiting Parallel Microprocessor Microarchitectures With a Compiler Code Generator.

[BibT_eX]

[DOI]

Proceedings of the 15th Annual International Symposium on Computer Architecture, 1988

1987

Checkpoint Repair for High-Performance Out-of-Order Execution Machines.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1987

On tuning the microarchitecture of an HPS implementation of the VAX.

[BibT_eX]

[DOI]

Proceedings of the 20th Annual Workshop and Symposium on Microprogramming and Microarchitecture, 1987

Exploiting horizontal and vertical concurrency via the HPSm microprocessor.

[BibT_eX]

[DOI]

Proceedings of the 20th Annual Workshop and Symposium on Microprogramming and Microarchitecture, 1987

Checkpoint Repair for Out-of-order Execution Machines.

[BibT_eX]

[DOI]

Proceedings of the 14th Annual International Symposium on Computer Architecture. Pittsburgh, 1987

1986

Run-time generation of HPS microinstructions from a VAX instruction stream.

[BibT_eX]

[DOI]

Proceedings of the 19th annual workshop on Microprogramming, 1986

HPSm, a High Performance Restricted Data Flow Architecture Having Minimal Functionality.

[BibT_eX]

[DOI]

Proceedings of the 13th Annual Symposium on Computer Architecture, Tokyo, Japan, June 1986

Experiments with HPS, a Restricted Data Flow Microarchitecture for High Performance Computers.

[BibT_eX]

Proceedings of the Spring COMPCON'86, 1986

1985

Critical issues regarding HPS, a high performance microarchitecture.

[BibT_eX]

[DOI]

Proceedings of the 18th annual workshop on Microprogramming, 1985

HPS, a new microarchitecture: rationale and introduction.

[BibT_eX]

[DOI]