Wen-Mei W. Hwu
Orcid: 0000-0003-2532-5349
Affiliations:
- University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering, Urbana-Champaign, IL, USA
According to our database, Wen-Mei W. Hwu authored at least 358 papers between 1985 and 2024.
Awards
ACM Fellow 2002, "For technical contributions and leadership in computer architecture."
IEEE Fellow 1998, "For contributions to high performance compiler and microarchitecture technologies."
Links
Online presence:
- on zbmath.org
- on orcid.org
- on id.loc.gov
- on dl.acm.org
Bibliography
2024
Optim. Lett., December, 2024
Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses.
Proc. VLDB Endow., February, 2024
CoRR, 2024
LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme.
CoRR, 2024
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level.
CoRR, 2024
Proceedings of the 38th ACM International Conference on Supercomputing, 2024
Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
2023
Integr., November, 2023
PIGEON: Optimizing CUDA Code Generator for End-to-End Training and Inference of Relational Graph Neural Networks.
CoRR, 2023
RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023
Proceedings of the 36th IEEE International System-on-Chip Conference, 2023
IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023
RECO-LFSR: Reconfigurable Low-power Cryptographic processor based on LFSR for Trusted IoT platforms.
Proceedings of the 24th International Symposium on Quality Electronic Design, 2023
Proceedings of the 52nd International Conference on Parallel Processing, 2023
Proceedings of the 33rd International Conference on Field-Programmable Logic and Applications, 2023
GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023
2022
GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture.
Dataset, October, 2022
MemXCT: Design, Optimization, Scaling, and Reproducibility of X-Ray Tomography Imaging.
IEEE Trans. Parallel Distributed Syst., 2022
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
Inf. Sci., 2022
CoRR, 2022
BaM: A Case for Enabling Fine-grain High Throughput GPU-Orchestrated Access to Storage.
CoRR, 2022
Proceedings of the 35th IEEE International System-on-Chip Conference, 2022
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022
2021
IEEE Trans. Parallel Distributed Syst., 2021
IEEE Trans. Computers, 2021
Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture.
Proc. VLDB Endow., 2021
PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses.
CoRR, 2021
CoRR, 2021
PhraseScope: An Effective and Unsupervised Framework for Mining High Quality Phrases.
Proceedings of the 2021 SIAM International Conference on Data Mining, 2021
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021
TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-aware Datatypes.
Proceedings of the HPDC '21: The 30th International Symposium on High-Performance Parallel and Distributed Computing, 2021
Extending HLS with High-Level Descriptive Language for Configurable Algorithm-Level Spatial Structure Design.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021
Proceedings of the Applied Reconfigurable Computing. Architectures, Tools, and Applications, 2021
Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles.
Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021
2020
PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-Efficient ReRAM.
IEEE Trans. Computers, 2020
Proc. VLDB Endow., 2020
CoRR, 2020
CoRR, 2020
DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs.
Proceedings of the ICPE '20: ACM/SPEC International Conference on Performance Engineering, 2020
Petascale XCT: 3D image reconstruction with hierarchical communications on multi-GPU nodes.
Proceedings of the International Conference for High Performance Computing, 2020
Proceedings of the 2020 USENIX Conference on Operational Machine Learning, 2020
SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems.
Proceedings of the Third Conference on Machine Learning and Systems, 2020
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020
Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020
Proceedings of the 27th IEEE International Conference on Electronics, Circuits and Systems, 2020
DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020
EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Alleviating Semantic-level Shift: A Semi-supervised Domain Adaptation Method for Semantic Segmentation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Vertext: An End-to-end AI Powered Conversation Management System for Multi-party Chat Platforms.
Proceedings of the Companion Publication of the 2020 ACM Conference on Computer Supported Cooperative Work and Social Computing, 2020
Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020
Proceedings of the 13th IEEE International Conference on Cloud Computing, 2020
2019
CoRR, 2019
A Retrospective Recount of Computer Architecture Research with a Data-Driven Study of Over Four Decades of ISCA Publications.
CoRR, 2019
CoRR, 2019
Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects.
Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, 2019
Analysis and Modeling of Collaborative Execution Strategies for Heterogeneous CPU-FPGA Architectures.
Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, 2019
Proceedings of the 2019 IEEE World Congress on Services, 2019
Proceedings of the International Conference for High Performance Computing, 2019
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019
Near-Memory and In-Storage FPGA Acceleration for Emerging Cognitive Computing Workloads.
Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019
Proceedings of the ACM International Conference on Supercomputing, 2019
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
NAIS: Neural Architecture and Implementation Search and its Applications in Autonomous Driving.
Proceedings of the International Conference on Computer-Aided Design, 2019
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019
FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
Automatic Generation of Warp-Level Primitives and Atomic Instructions for Fast and Portable Parallel Reduction on GPUs.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019
PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019
FlatFlash: Exploiting the Byte-Accessibility of SSDs within a Unified Memory-Storage Hierarchy.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019
Implementing neural machine translation with bi-directional GRU and attention mechanism on FPGAs using HLS.
Proceedings of the 24th Asia and South Pacific Design Automation Conference, 2019
TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function-as-a-Service.
Proceedings of the 12th IEEE International Conference on Cloud Computing, 2019
2018
J. Parallel Distributed Comput., 2018
TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments.
CoRR, 2018
A Simple Non-i.i.d. Sampling Approach for Efficient Training and Better Generalization.
CoRR, 2018
Decoupled Classification Refinement: Hard False Positive Suppression for Object Detection.
CoRR, 2018
IEEE Comput. Archit. Lett., 2018
Proceedings of the High Performance Computing, 2018
Application-Transparent Near-Memory Processing Architecture with Memory Channel Network.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
A Fast and Massively-Parallel Inverse Solver for Multiple-Scattering Tomographic Image Reconstruction.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018
Proceedings of the 2018 IEEE International Conference on Rebooting Computing, 2018
DNNBuilder: an automated tool for building high-performance DNN hardware accelerators for FPGAs.
Proceedings of the International Conference on Computer-Aided Design, 2018
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018
2017
Heterogeneous Computing Meets Near-Memory Acceleration and High-Level Synthesis in the Post-Moore Era.
IEEE Micro, 2017
Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, 2017
Proceedings of the Accelerator Programming Using Directives - 4th International Workshop, 2017
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017
Proceedings of the 2017 IEEE/ACM International Symposium on Low Power Electronics and Design, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Proceedings of the IEEE International Conference on Rebooting Computing, 2017
Proceedings of the IEEE International Conference on Rebooting Computing, 2017
Collaborative (CPU + GPU) algorithms for triangle counting and truss decomposition on the Minsky architecture: Static graph challenge: Subgraph isomorphism.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017
Revisiting Online Autotuning for Sparse-Matrix Vector Multiplication Kernels on Next-Generation Architectures.
Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017
2016
FCUDA-HB: Hierarchical and Scalable Bus Architecture Generation on FPGAs With the FCUDA Flow.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016
Common Bonds: MIPS, HPS, Two-Level Branch Prediction, and Compressed Code RISC Processor.
IEEE Micro, 2016
Platform choices and design demands for IoT platforms: cost, power, and performance tradeoffs.
IET Cyber-Phys. Syst.: Theory & Appl., 2016
Bioinform., 2016
Design of a power-efficient ARM processor with a timing-error detection and correction mechanism.
Proceedings of the 29th IEEE International System-on-Chip Conference, 2016
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing, 2016
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016
DySel: Lightweight Dynamic Selection for Kernel-based Data-parallel Programming Model.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016
2015
Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications.
IEEE Trans. Parallel Distributed Syst., 2015
Sci. Program., 2015
Proceedings of the 8th IEEE/ACM International Conference on Utility and Cloud Computing, 2015
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015
Proceedings of the 44th International Conference on Parallel Processing, 2015
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015
Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures.
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015
2014
BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads.
Bioinform., 2014
SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014
Triolet: a programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014
Proceedings of the 2nd International Workshop on Many-core Embedded Systems, 2014
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014
Proceedings of the Numerical Computations with GPUs, 2014
2013
J. Supercomput., 2013
ACM Trans. Embed. Comput. Syst., 2013
More IMPATIENT: A gridding-accelerated Toeplitz-based strategy for non-Cartesian high-resolution 3D MRI on GPUs.
J. Parallel Distributed Comput., 2013
Int. J. Imaging Syst. Technol., 2013
Proceedings of the 2013 International Conference on Embedded Computer Systems: Architectures, 2013
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
Proceedings of the 50th Annual Design Automation Conference 2013, 2013
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, 2013
2012
Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01737-7, 2012
Data Layout Transformation Exploiting Memory-Level Parallelism in Structured Grid Many-Core Applications.
Int. J. Parallel Program., 2012
Algorithm and Data Optimization Techniques for Scaling to Massively Threaded Systems.
Computer, 2012
Proceedings of the SC Conference on High Performance Computing Networking, 2012
Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012
Proceedings of the 12th IEEE International Conference on Data Mining, 2012
Design evaluation of OpenCL compiler framework for Coarse-Grained Reconfigurable Arrays.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012
2011
Comput. Sci. Eng., 2011
Proceedings of the Conference on Parallel Processing for Imaging Applications 2011, 2011
Impatient MRI: Illinois Massively Parallel Acceleration Toolkit for image reconstruction with enhanced throughput in MRI.
Proceedings of the 8th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2011
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Proceedings of the International Conference on Parallel Processing, 2011
Proceedings of the IEEE International Conference on Acoustics, 2011
Proceedings of the IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, 2011
2010
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010
Proceedings of the Computer Architecture, 2010
Proceedings of the 2010 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2010
Proceedings of the 47th Design Automation Conference, 2010
Proceedings of the CGO 2010, 2010
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010
Data layout transformation exploiting memory-level parallelism in structured grid many-core applications.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010
Raising the level of many-core programming with compiler technology: meeting a grand challenge.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010
Exploiting More Parallelism from Applications Having Generalized Reductions on GPU Architectures.
Proceedings of the 10th IEEE International Conference on Computer and Information Technology, 2010
Proceedings of the 10th IEEE International Conference on Computer and Information Technology, 2010
Morgan Kaufmann, ISBN: 978-0-12-381472-2, 2010
2009
Microprocess. Microsystems, 2009
Proceedings of the IEEE 7th Symposium on Application Specific Processors, 2009
Proceedings of the 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Boston, MA, USA, June 28, 2009
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Proceedings of the 23rd international conference on Supercomputing, 2009
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009
High performance computation and interactive display of molecular orbitals on GPUs and multi-core CPUs.
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, 2009
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, 2009
2008
J. Parallel Distributed Comput., 2008
J. Parallel Distributed Comput., 2008
Application Acceleration with the Explicitly Parallel Operations System - the EPOS Processor.
Proceedings of the IEEE Symposium on Application Specific Processors, 2008
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008
Proceedings of the Languages and Compilers for Parallel Computing, 2008
Proceedings of the Languages and Compilers for Parallel Computing, 2008
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008
Proceedings of the Fourth International Conference on e-Science, 2008
Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008
Proceedings of the 5th Conference on Computing Frontiers, 2008
2007
Trans. High Perform. Embed. Archit. Compil., 2007
Proceedings of the Languages and Compilers for Parallel Computing, 2007
Corezilla: Build and Tame the Multicore Beast?
Proceedings of the 44th Design Automation Conference, 2007
Proceedings of the 44th Design Automation Conference, 2007
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007
2006
IEEE Trans. Computers, 2006
2005
"Flea-flicker" Multipass Pipelining: An Alternative to the High-Power Out-of-Order Offense.
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005
2004
Proceedings of the Static Analysis, 11th International Symposium, 2004
Proceedings of the 2004 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis For Software Tools and Engineering, 2004
Proceedings of the Languages and Compilers for High Performance Computing, 2004
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004
2003
Energy saving and capacity improvement potential of power control in multi-hop wireless networks.
Comput. Networks, 2003
2002
Vacuum packing: extracting hardware-detected program phases for post-link optimization.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002
Proceedings of the International Conference on Compilers, 2002
2001
Proc. IEEE, 2001
Enhancing loop buffering of media and telecommunications applications using low-overhead predication.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001
A Study of the Energy Saving and Capacity Improvement Potential of Power Control in Multi-Hop Wireless Networks.
Proceedings of the 26th Annual IEEE Conference on Local Computer Networks (LCN 2001), 2001
Proceedings of IEEE INFOCOM 2001, 2001
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001
2000
Modular interprocedural pointer analysis using access paths: design, implementation, and evaluation.
Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2000
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000
Proceedings of the 27th Conference on Local Computer Networks, 2000
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000
Proceedings of ASPLOS-IX, the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, 2000
1999
The Partial Reverse If-Conversion Framework for Balancing Control Flow and Predication.
Int. J. Parallel Program., 1999
Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 1999
Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999
Proceedings of the Languages and Compilers for Parallel Computing, 1999
A Hardware-Driven Profiling Scheme for Identifying Program Hot Spots to Support Runtime Optimization.
Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999
Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999
An Architecture Framework for Introducing Predicated Execution into Embedded Microprocessors.
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999
1998
IEEE Trans. Computers, 1998
Int. J. Parallel Program., 1998
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998
Retrospective: HPSm, a High Performance Restricted Data Flow Architecture Having Minimal Functionality.
Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers)., 1998
Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers)., 1998
Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers)., 1998
Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998
Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences, 1998
Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998
1997
Int. J. Parallel Program., 1997
Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997
Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997
Proceedings of the 24th International Symposium on Computer Architecture, 1997
Architectural Support for Compiler-Synthesized Dynamic Branch Prediction Strategies: Rationale and Initial Results.
Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997
A study of the cache and branch performance issues with running Java on current hardware platforms.
Proceedings of IEEE COMPCON 97, 1997
1996
Guest Editors' Introduction.
Int. J. Parallel Program., 1996
Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996
Java Bytecode to Native Code Translation: The Caffeine Prototype and Preliminary Results.
Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996
Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996
1995
IEEE Trans. Computers, 1995
The Importance of Prepass Code Scheduling for Superscalar and Superpipelined Processors.
IEEE Trans. Computers, 1995
IEEE Trans. Computers, 1995
Adv. Comput., 1995
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995
A study of the effects of compiler-controlled speculation on instruction and data caches.
Proceedings of the 28th Annual Hawaii International Conference on System Sciences (HICSS-28), 1995
1994
Softw. Pract. Exp., 1994
J. Parallel Distributed Comput., 1994
Proceedings of the 27th Annual International Symposium on Microarchitecture, San Jose, California, USA, November 30, 1994
Proceedings of the 27th Annual International Symposium on Microarchitecture, San Jose, California, USA, November 30, 1994
Proceedings of the 1994 International Conference on Parallel Processing, 1994
Proceedings of ASPLOS-VI, 1994
1993
ACM Trans. Comput. Syst., 1993
J. Supercomput., 1993
IEEE Trans. Computers, 1993
Proceedings of the ACM SIGPLAN'93 Conference on Programming Language Design and Implementation (PLDI), 1993
Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993
Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993
Register Connection: A New Approach to Adding Registers into Instruction Set Architectures.
Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993
Proceedings of the Hardware and Software Architectures for Fault Tolerance, 1993
1992
IEEE Trans. Computers, 1992
Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, 1992
Proceedings of Supercomputing '92, 1992
Proceedings of the Third International Workshop on Rapid System Prototyping, 1992
Proceedings of the Languages and Compilers for Parallel Computing, 1992
Proceedings of the 6th international conference on Supercomputing, 1992
Tolerating First Level Memory Access Latency in High-Performance Systems.
Proceedings of the 1992 International Conference on Parallel Processing, 1992
Executing Nested Parallel Loops on Shared-Memory Multiprocessors.
Proceedings of the 1992 International Conference on Parallel Processing, 1992
Proceedings of the Digest of Papers: FTCS-22, 1992
Proceedings of ASPLOS-V, 1992
1991
Softw. Pract. Exp., 1991
SIGARCH Comput. Archit. News, 1991
Data Access Microarchitectures for Superscalar Processors with Compiler-Assisted Data Prefetching.
Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture, 1991
Comparing Static and Dynamic Code Scheduling for Multiple-Instruction-Issue Processors.
Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture, 1991
The Effect of Compiler Optimizations on Available Parallelism in Scalar Programs.
Proceedings of the International Conference on Parallel Processing, 1991
1990
SIGARCH Comput. Archit. News, 1990
A software based approach to achieving optimal performance for signature control flow checking.
Proceedings of the 20th International Symposium on Fault-Tolerant Computing, 1990
1989
A Simulation Study of Simultaneous Vector Prefetch Performance in Multiprocessor Memory Subsystems (Extended Abstract).
Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 1989
Proceedings of the ACM SIGPLAN'89 Conference on Programming Language Design and Implementation (PLDI), 1989
Forward semantic: a compiler-assisted instruction fetch method for heavily pipelined processors.
Proceedings of the 22nd Annual Workshop and Symposium on Microprogramming and Microarchitecture, 1989
Proceedings of the 16th Annual International Symposium on Computer Architecture. Jerusalem, 1989
Proceedings of the 16th Annual International Symposium on Computer Architecture. Jerusalem, 1989
Proceedings of the 3rd international conference on Supercomputing, 1989
1988
Proceedings of the 21st Annual Workshop and Symposium on Microprogramming and Microarchitecture, 1988, San Diego, California, USA, November 28, 1988
Exploiting Parallel Microprocessor Microarchitectures With a Compiler Code Generator.
Proceedings of the 15th Annual International Symposium on Computer Architecture, 1988
1987
IEEE Trans. Computers, 1987
Proceedings of the 20th Annual Workshop and Symposium on Microprogramming and Microarchitecture, 1987
Proceedings of the 20th Annual Workshop and Symposium on Microprogramming and Microarchitecture, 1987
Proceedings of the 14th Annual International Symposium on Computer Architecture. Pittsburgh, 1987
1986
Proceedings of the 19th annual workshop on Microprogramming, 1986
HPSm, a High Performance Restricted Data Flow Architecture Having Minimal Functionality.
Proceedings of the 13th Annual Symposium on Computer Architecture, Tokyo, Japan, June 1986, 1986
Experiments with HPS, a Restricted Data Flow Microarchitecture for High Performance Computers.
Proceedings of the Spring COMPCON'86, 1986
1985
Proceedings of the 18th annual workshop on Microprogramming, 1985
Proceedings of the 18th annual workshop on Microprogramming, 1985