Guang R. Gao
ORCID: 0000-0002-5265-7528
Affiliations:
- University of Delaware, Newark, USA
According to our database, Guang R. Gao authored at least 301 papers between 1983 and 2023.
Awards
ACM Fellow 2007, "For contributions to multiprocessor computers and compiler optimization techniques."
Links
Online presence:
- on orcid.org
- on id.loc.gov
- on d-nb.info
- on dl.acm.org
Bibliography
2023
Proceedings of the 2023 ACM/SPEC International Conference on Performance Engineering, 2023
Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores, 2023
Codelet Pipe: Realization of Dataflow Software Pipelining for Extended Codelet Model.
Proceedings of the 52nd International Conference on Parallel Processing Workshops, 2023
2022
Extending an asynchronous runtime system for high throughput applications: A case study.
J. Parallel Distributed Comput., 2022
A Profile-Based AI-Assisted Dynamic Scheduling Approach for Heterogeneous Architectures.
Int. J. Parallel Program., 2022
Proceedings of the ExHET@PPoPP 2022: Proceedings of the 1st International Workshop on Extreme Heterogeneity Solutions, 2022
2021
swFLOW: A large-scale distributed framework for deep learning on Sunway TaihuLight supercomputer.
Inf. Sci., 2021
Guest Editorial: Special issue on Network and Parallel Computing for Emerging Architectures and Applications.
Int. J. Parallel Program., 2021
The Promise of Dataflow Architectures in the Design of Processing Systems for Autonomous Machines.
CoRR, 2021
Proceedings of the International Conference for High Performance Computing, 2021
2020
Proceedings of the Fourth IEEE/ACM Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware, 2020
Proceedings of the Fourth IEEE/ACM Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware, 2020
PDAWL: Profile-Based Iterative Dynamic Adaptive WorkLoad Balance on Heterogeneous Architectures.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2020
Proceedings of the 27th IEEE International Conference on High Performance Computing, 2020
2019
CCF Trans. High Perform. Comput., 2019
Sequential Codelet Model of Program Execution. A Super-Codelet model based on the Hierarchical Turing Machine.
Proceedings of the IEEE/ACM Third Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware, 2019
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019
Position Paper: Extending Codelet Model for Dataflow Software Pipelining using Software-Hardware Co-Design.
Proceedings of the 43rd IEEE Annual Computer Software and Applications Conference, 2019
2018
Int. J. Parallel Program., 2018
2017
ACM Trans. Archit. Code Optim., 2017
Int. J. High Perform. Comput. Appl., 2017
Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware, 2017
Multigrain Parallelism: Bridging Coarse-Grain Parallel Programs and Fine-Grain Event-Driven Multithreading.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017
Leveraging access port positions to accelerate page table walk in DWM-based main memory.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017
Proceedings of the 54th Annual Design Automation Conference, 2017
Proceedings of the Computing Frontiers Conference, 2017
2016
The Design and Implementation of TIDeFlow: A Dataflow-Inspired Execution Model for Parallel Loops and Task Pipelining.
Int. J. Parallel Program., 2016
Proceedings of the Network and Parallel Computing, 2016
Proceedings of the Languages and Compilers for Parallel Computing, 2016
Proceedings of the Languages and Compilers for Parallel Computing, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016
Application characterization at scale: lessons learned from developing a distributed open community runtime system for high performance computing.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016
2015
Author Rebuttal to Rocha et al. "Comments on Minimizing Buffer Requirements under Rate-Optimal Schedule in Regular Dataflow Networks".
J. Signal Process. Syst., 2015
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015
Landing Containment Domains on SWARM: Toward a Robust Resiliency Solution on a Dynamic Adaptive Runtime Machine.
Proceedings of the Parallel Computing: On the Road to Exascale, 2015
Proceedings of the International Conference on Computational Science, 2015
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015
Energy efficient multi-level tiling for dense matrix multiplication on many-core architecture.
Proceedings of the Sixth International Green and Sustainable Computing Conference, 2015
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015
2014
Microprocess. Microsystems, 2014
Proceedings of the Languages and Compilers for Parallel Computing, 2014
Position Paper: Locality-Driven Scheduling of Tasks for Data-Dependent Multithreading.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
ACDT: Architected Composite Data Types trading-in unfettered data access for improved execution.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014
Proceedings of the International Conference on Computational Science, 2014
2013
J. Parallel Distributed Comput., 2013
Proceedings of the 12th IEEE International Conference on Trust, 2013
Proceedings of the Languages and Compilers for Parallel Computing, 2013
Towards Memory-Load Balanced Fast Fourier Transformations in Fine-Grain Execution Models.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
A dynamic schema to increase performance in many-core architectures through percolation operations.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013
Proceedings of the Euro-Par 2013 Parallel Processing, 2013
Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013
The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices.
Proceedings of the 2013 Euromicro Conference on Digital System Design, 2013
Proceedings of the Computing Frontiers Conference, 2013
2012
Software Pipelining for Stream Programs on Resource Constrained Multicore Architectures.
IEEE Trans. Parallel Distributed Syst., 2012
ACM Trans. Archit. Code Optim., 2012
Proceedings of the 2012 PPOPP International Workshop on Programming Models and Applications for Multicores and Manycores, 2012
Proceedings of the Network and Parallel Computing, 9th IFIP International Conference, 2012
A Discussion in Favor of Dynamic Scheduling for Regular Applications in Many-core Architectures.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
Proceedings of the Transition of HPC Towards Exascale Computing, 2012
Dynamic percolation: a case of study on the shortcomings of traditional optimization in many-core architectures.
Proceedings of the Computing Frontiers Conference, CF'12, 2012
2011
Analysis and performance results of computing betweenness centrality on IBM Cyclops64.
J. Supercomput., 2011
Comput. Sci. Res. Dev., 2011
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011
Proceedings of the Languages and Compilers for Parallel Computing, 2011
OPELL and PM: A Case Study on Porting Shared Memory Programming Models to Accelerators Architectures.
Proceedings of the Languages and Compilers for Parallel Computing, 2011
The elephant and the mice: the role of non-strict fine-grain synchronization for modern many-core architectures.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011
Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011
DEEP: an iterative fpga-based many-core emulation system for chip verification and architecture research.
Proceedings of the ACM/SIGDA 19th International Symposium on Field Programmable Gate Arrays, 2011
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011
2010
Proceedings of the 2010 Spring Simulation Multiconference, 2010
Proceedings of the Languages and Compilers for Parallel Computing, 2010
TiNy threads on BlueGene/P: Exploring many-core parallelisms beyond The traditional OS.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010
A Study of a Software Cache Implementation of the OpenMP Memory Model for Multicore and Manycore Architectures.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010
Proceedings of the CGO 2010, 2010
2009
Improving Performance of Dynamic Programming via Parallelism and Locality on Multicore Architectures.
IEEE Trans. Parallel Distributed Syst., 2009
Proceedings of the Evolving OpenMP in an Age of Extreme Parallelism, 2009
Proceedings of the 28th International Performance Computing and Communications Conference, 2009
Proceedings of the ICPP 2009, 2009
Tile Percolation: An OpenMP Tile Aware Parallelization Technique for the Cyclops-64 Multicore Processor.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009
Proceedings of the 6th Conference on Computing Frontiers, 2009
2008
ACM Trans. Program. Lang. Syst., 2008
J. Bioinform. Comput. Biol., 2008
Experience on optimizing irregular computation for memory hierarchy in manycore architecture.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008
Minimum Lock Assignment: A Method for Exploiting Concurrency among Critical Sections.
Proceedings of the Languages and Compilers for Parallel Computing, 2008
Just-In-Time Locality and Percolation for Optimizing Irregular Applications on a Manycore Architecture.
Proceedings of the Languages and Compilers for Parallel Computing, 2008
Open64 compiler infrastructure for emerging multicore/manycore architecture All Symposium Tutorial.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
2007
ACM Trans. Archit. Code Optim., 2007
Performance portability on EARTH: a case study across several parallel architectures.
Clust. Comput., 2007
Proceedings of the SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007
Implementation of the Smith-Waterman algorithm on a reconfigurable supercomputing platform.
Proceedings of the 1st international workshop on High-performance reconfigurable computing technology and applications, 2007
Optimized lock assignment and allocation: a method for exploiting concurrency among critical sections.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007
Proceedings of the Network and Parallel Computing, IFIP International Conference, 2007
Proceedings of the Languages and Compilers for Parallel Computing, 2007
Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007
On the Role of Deterministic Fine-Grain Data Synchronization for Scientific Applications: A Revisit in the Emerging Many-Core Era.
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Automatic Program Segment Similarity Detection in Targeted Program Performance Improvement.
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Exploring a Multithreaded Methodology to Implement a Network Communication Protocol on the Cyclops-64 Multithreaded Architecture.
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007
2006
User-Friendly Methodology for Automatic Exploration of Compiler Options: A Case Study on the Intel XScale Microarchitecture.
Proceedings of the International Conference on Software Engineering Research and Practice & Conference on Programming Languages and Compilers, 2006
A User-Friendly Methodology for Automatic Exploration of Compiler Options.
Proceedings of the International Conference on Software Engineering Research and Practice & Conference on Programming Languages and Compilers, 2006
Performance Characteristics of OpenMP Language Constructs on a Many-core-on-a-chip Architecture.
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2006
Exploring Financial Applications on Many-Core-on-a-Chip Architecture: A First Experiment.
Proceedings of the Frontiers of High Performance Computing and Networking, 2006
A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
Proceedings of the 20th Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2006), 2006
Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006
Landing openMP on cyclops-64: an efficient mapping of openMP to a many-core system-on-a-chip.
Proceedings of the Third Conference on Computing Frontiers, 2006
Proceedings of the Advances in Computer Systems Architecture, 11th Asia-Pacific Conference, 2006
2005
J. Embed. Comput., 2005
Quasi-consensus-based comparison of profile hidden Markov models for protein sequences.
Bioinform., 2005
An improved hidden Markov model for transmembrane protein detection and topology prediction and its applications to complete genomes.
Bioinform., 2005
Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, 2005
Sequential Consistency Revisit: The Sufficient Condition and Method to Reason the Consistency Model of a Multiprocessor-on-a-Chip Architecture.
Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, 2005
Performance Modelling and Optimization of Memory Access on Cellular Computer Architecture Cyclops64.
Proceedings of the Network and Parallel Computing, IFIP International Conference, 2005
Register Pressure in Software-Pipelined Loop Nests: Fast Computation and Impact on Architecture Design.
Proceedings of the Languages and Compilers for Parallel Computing, 2005
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005
Discriminating transmembrane proteins from signal peptides using SVM-Fisher approach.
Proceedings of the Fourth International Conference on Machine Learning and Applications, 2005
Identifying Multiply-Add Operations in Kylin Compiler.
Proceedings of The 2005 International Conference on Embedded Systems and Applications, 2005
2004
A fine-grain load-adaptive algorithm of the 2D discrete wavelet transform for multithreaded architectures.
J. Parallel Distributed Comput., 2004
Int. J. High Perform. Comput. Netw., 2004
Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2004), 2004
Proceedings of the Euro-Par 2004 Parallel Processing, 2004
Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004
2003
ACM Trans. Embed. Comput. Syst., 2003
Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures.
IEEE Trans. Computers, 2003
J. Comput. Sci. Technol., 2003
Implementation of the EARTH programming model on SMP clusters: a multi-threaded language and runtime system.
Concurr. Comput. Pract. Exp., 2003
Proceedings of the Languages and Compilers for Parallel Computing, 2003
Proceedings of the High Performance Computing, 5th International Symposium, 2003
Performance Study of a Whole Genome Comparison Tool on a Hyper-Threading Multiprocessor.
Proceedings of the High Performance Computing, 5th International Symposium, 2003
An Executable Analytical Performance Evaluation Approach for Early Performance Prediction.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003
Programming Models and System Software for Future High-End Computing Systems: Work-in-Progress.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003
Proceedings of the 17th Annual International Conference on Supercomputing, 2003
Proceedings of the 2003 IEEE International Conference on Field-Programmable Technology, 2003
Proceedings of the 2nd IEEE Computer Society Bioinformatics Conference, 2003
2002
Minimizing Buffer Requirements under Rate-Optimal Schedule in Regular Dataflow Networks.
J. VLSI Signal Process., 2002
Parallel Distributed Comput. Pract., 2002
A Theory for Co-Scheduling Hardware and Software Pipelines in ASIPs and Embedded Processors.
Des. Autom. Embed. Syst., 2002
Implementation and evaluation of a communication intensive application on the EARTH multithreaded system.
Concurr. Comput. Pract. Exp., 2002
Bioinform., 2002
Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002
Proceedings of the 2002 IEEE Symposium on Information Visualization (InfoVis 2002), 27 October, 2002
Power-Performance Trade-Offs for Energy-Efficient Architectures: A Quantitative Study.
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002
An Adaptive Meta-Clustering Approach: Combining the Information from Different Clustering Results.
Proceedings of the 1st IEEE Computer Society Bioinformatics Conference, 2002
Proceedings of the 1st IEEE Computer Society Bioinformatics Conference, 2002
Proceedings of the International Conference on Compilers, 2002
2001
Parallel Process. Lett., 2001
Exploiting Locality in Single Assignment Data Structures Updated Through Split-Phase Transactions.
Clust. Comput., 2001
A Multithreaded Parallel Implementation of a Dynamic Programming Algorithm for Sequence Comparison.
Proceedings of the 6th Pacific Symposium on Biocomputing, 2001
Proceedings of the 14th International Symposium on Systems Synthesis, 2001
Bridging the gap between ISA compilers and silicon compilers a challenge for future SoC design.
Proceedings of the 14th International Symposium on Systems Synthesis, 2001
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001
Minimum Register Instruction Sequence Problem: Revisiting Optimal Code Generation for DAGs.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001
Proceedings of the Euro-Par 2001: Parallel Processing, 2001
Proceedings of the Compiler Construction, 10th International Conference, 2001
2000
IEEE Trans. Computers, 2000
Enhanced Co-Scheduling: A Software Pipelining Method Using Modulo-Scheduled Pipeline Theory.
Int. J. Parallel Program., 2000
Proceedings of the Twelfth annual ACM Symposium on Parallel Algorithms and Architectures, 2000
Landing CG on EARTH: A Case Study of Fine-Grained Multithreading on an Evolutionary Path.
Proceedings Supercomputing 2000, 2000
Recursive and Iterative Multithreaded Algorithms for Pricing American Securities.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2000
Proceedings of the High Performance Computing, Third International Symposium, 2000
Caching Single-Assignment Structures to Build a Robust Fine-Grain Multi-Threading System.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000
Proceedings of the Parallel and Distributed Processing, 2000
Proceedings of the 14th international conference on Supercomputing, 2000
Developing a Communication Intensive Application on the EARTH Multithreaded Architecture (Distinguished Paper).
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000
Proceedings of the 12th IEEE International Conference on Application-Specific Systems, 2000
1999
J. Parallel Distributed Comput., 1999
Self-Avoiding Walks Over Adaptive Triangular Grids.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999
Minimum Register Instruction Scheduling: A New Approach for Dynamic Instruction Issue Processors.
Proceedings of the Languages and Compilers for Parallel Computing, 1999
Proceedings of the High Performance Computing, Second International Symposium, 1999
Implementing a Non-Strict Functional Programming Language on a Threaded Architecture.
Proceedings of the Parallel and Distributed Processing, 1999
Load Adaptive Algorithms and Implementations for the 2D Discrete Wavelet Transform on Fine-Grain Multithreaded Architectures.
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999
Proceedings of the Parallel and Distributed Processing, 1999
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999
Proceedings of the Compiler Construction, 8th International Conference, 1999
1998
ACM Trans. Program. Lang. Syst., 1998
A Unified Framework for Instruction Scheduling and Mapping for Function Units with Structural Hazards.
J. Parallel Distributed Comput., 1998
How "Hard" is Thread Partitioning and How "Bad" is a List Scheduling Based Partitioning Algorithm?
Proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures, 1998
Using Multithreading for the Automatic Load Balancing of Adaptive Finite Element Meshes.
Proceedings of the Solving Irregularly Structured Problems in Parallel, 1998
Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998
Proceedings of the International Conference on Parallel and Distributed Systems, 1998
Partial Sampling with Reverse State Reconstruction: A New Technique for Branch Predictor Performance Estimation.
Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998
Proceedings of the Compiler Construction, 7th International Conference, 1998
1997
Int. J. Parallel Program., 1997
Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures, 1997
Proceedings of the Sixth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1997
On the Importance of an End-To-End View of Memory Consistency in Future Computer Systems.
Proceedings of the High Performance Computing, International Symposium, 1997
Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997
Proceedings 1997 International Conference on Computer Design: VLSI in Computers & Processors, 1997
Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997
Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997
1996
IEEE Trans. Parallel Distributed Syst., 1996
Proceedings of the ACM SIGPLAN'96 Conference on Programming Language Design and Implementation (PLDI), 1996
Software Pipelining Showdown: Optimal vs. Heuristic Methods in a Production Compiler.
Proceedings of the ACM SIGPLAN'96 Conference on Programming Language Design and Implementation (PLDI), 1996
Proceedings of the MASCOTS '96, 1996
Proceedings of the Languages and Compilers for Parallel Computing, 1996
Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996
Proceedings of the Second International Symposium on High-Performance Computer Architecture, 1996
Quantitive studies of data-locality sensitivity on the EARTH multithreaded architecture: preliminary results.
Proceedings of the 3rd International Conference on High Performance Computing, 1996
Multithreading implementation of a distributed shortest path algorithm on EARTH multiprocessor.
Proceedings of the 3rd International Conference on High Performance Computing, 1996
Proceedings of the Euro-Par '96 Parallel Processing, 1996
Pipelining-Dovetailing: A Transformation to Enhance Software Pipelining for Nested Loops.
Proceedings of the Compiler Construction, 6th International Conference, 1996
Data locality sensitivity of multithreaded computations on a distributed-memory multiprocessor.
Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative Research, 1996
Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, 1996
1995
Parallel Process. Lett., 1995
Proceedings of the Seventh IEEE Symposium on Parallel and Distributed Processing, 1995
Proceedings of the Conference Record of POPL'95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1995
Proceedings of the ACM SIGPLAN'95 Conference on Programming Language Design and Implementation (PLDI), 1995
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995
Proceedings of the Languages and Compilers for Parallel Computing, 1995
The Threaded Communication Library: Preliminary Experiences on a Multiprocessor with Dual-Processor Nodes.
Proceedings of the 9th international conference on Supercomputing, 1995
Location Consistency: Stepping Beyond the Memory Coherence Barrier.
Proceedings of the 1995 International Conference on Parallel Processing, 1995
Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture (HPCA 1995), 1995
Proceedings of the 28th Annual Hawaii International Conference on System Sciences (HICSS-28), 1995
Proceedings of the Euro-Par '95 Parallel Processing, 1995
Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, 1995
Advanced topics in dataflow computing and multithreading.
IEEE, ISBN: 978-0-8186-6542-4, 1995
1994
Proceedings of the PARLE '94: Parallel Architectures and Languages Europe, 1994
Minimizing register requirements under resource-constrained rate-optimal software pipelining.
Proceedings of the 27th Annual International Symposium on Microarchitecture, San Jose, California, USA, November 30, 1994
Proceedings of the 8th International Symposium on Parallel Processing, 1994
A Comparative Study of Multiprocessor List Scheduling Heuristics.
Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994
Proceedings of the 1994 Conference of the Centre for Advanced Studies on Collaborative Research, October 31, 1994
Proceedings of the 1994 Conference of the Centre for Advanced Studies on Collaborative Research, October 31, 1994
Proceedings of the 1994 Conference of the Centre for Advanced Studies on Collaborative Research, October 31, 1994
Proceedings of the 1994 Conference of the Centre for Advanced Studies on Collaborative Research, October 31, 1994
Proceedings of the International Conference on Application Specific Array Processors, 1994
Proceedings of the Multithreaded Computer Architecture, 1994
Proceedings of the Multithreaded Computer Architecture, 1994
1993
Special Issue on DataFlow and Multithreaded Architectures - Guest Editors' Introduction.
J. Parallel Distributed Comput., 1993
J. Parallel Distributed Comput., 1993
Comput. Lang., 1993
Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, 1993
Proceedings of the Conference Record of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1993
Proceedings of the PARLE '93, 1993
Proceedings of the Languages and Compilers for Parallel Computing, 1993
Proceedings of the 7th international conference on Supercomputing, 1993
A Novel Methodology Using Genetic Algorithms for the Design of Caches and Cache Replacement Policy.
Proceedings of the 5th International Conference on Genetic Algorithms, 1993
Proceedings of the International Conference on Application-Specific Array Processors, 1993
1992
Int. J. Parallel Program., 1992
Future Gener. Comput. Syst., 1992
Minimizing Loop Storage Allocation for An Argument-Fetching Dataflow Architecture Model.
Proceedings of the PARLE '92: Parallel Architectures and Languages Europe, 1992
Proceedings of the 25th Annual International Symposium on Microarchitecture, 1992
Designing the McCAT Compiler Based on a Family of Structured Intermediate Representations.
Proceedings of the Languages and Compilers for Parallel Computing, 1992
Proceedings of the Languages and Compilers for Parallel Computing, 1992
Efficient Interprocessor Synchronization/Communication on a Dataflow Multiprocessor Architecture.
Proceedings of the 1992 International Conference on Parallel Processing, 1992
Designing programming languages for analyzability: a fresh look at pointer data structures.
Proceedings of the ICCL'92, 1992
Performance Evaluation of Latency Tolerant Architectures.
Proceedings of the Computing and Information, 1992
Proceedings of the 1992 IEEE International Conference on Acoustics, 1992
Proceedings of the Parallel Processing: CONPAR 92, 1992
Proceedings of the Compiler Construction, 1992
1991
Efficient support of concurrent threads in a hybrid dataflow/von Neumann architecture.
Proceedings of the Third IEEE Symposium on Parallel and Distributed Processing, 1991
Proceedings Supercomputing '91, 1991
Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation (PLDI), 1991
Proceedings of the PARLE '91: Parallel Architectures and Languages Europe, 1991
Proceedings of the PARLE '91: Parallel Architectures and Languages Europe, 1991
Proceedings of the Languages and Compilers for Parallel Computing, 1991
Proceedings of the 5th international conference on Supercomputing, 1991
A code mapping scheme for dataflow software pipelining.
The Kluwer international series in engineering and computer science 125, Kluwer, ISBN: 978-0-7923-9130-2, 1991
1990
Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing, 1990
Proceedings of the 4th international conference on Supercomputing, 1990
Proceedings of the CONPAR 90, 1990
1989
J. Parallel Distributed Comput., 1989
1988
Summary of the workshop on frontiers in functional programming and dataflow architecture.
SIGARCH Comput. Archit. News, 1988
Proceedings Supercomputing '88, Orlando, FL, USA, November 12-17, 1988
Design of an Efficient Dataflow Architecture without Data Flow.
Proceedings of the International Conference on Fifth Generation Computer Systems, 1988
1987
A stability classification method and its application to pipelined solution of linear recurrences.
Parallel Comput., 1987
1986
J. Parallel Distributed Comput., 1986
Int. J. Parallel Program., 1986
A Pipelined Solution Method of Tridiagonal Linear Equation Systems.
Proceedings of the International Conference on Parallel Processing, 1986
1984
1983
Maximum Pipelining of Array Operations on Static Data Flow Machine.
Proceedings of the International Conference on Parallel Processing, 1983