Mateo Valero
ORCID: 0000-0003-2917-2482
Affiliations:
- Polytechnic University of Catalonia, Barcelona, Spain
- Barcelona Supercomputing Center, Spain
According to our database, Mateo Valero authored at least 474 papers between 1982 and 2024.
Awards
ACM Fellow
ACM Fellow 2002, "For contributions to the design of vector, superscalar, and VLIW architectures, and technical leadership."
IEEE Fellow
IEEE Fellow 2001, "For contributions to the design of vector architectures and superscalar processors."
Links
Online presence:
- zbmath.org
- orcid.org
- id.loc.gov
- dl.acm.org
Bibliography
2024
QUETZAL: Vector Acceleration Framework for Modern Genome Sequence Analysis Algorithms.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
2023
Vitruvius+: An Area-Efficient RISC-V Decoupled Vector Coprocessor for High Performance Computing Applications.
ACM Trans. Archit. Code Optim., June, 2023
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023
Proceedings of the 38th Conference on Design of Circuits and Integrated Systems, 2023
2022
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
Proceedings of the 37th Conference on Design of Circuits and Integrated Systems, 2022
2021
The Ultimate DataFlow for Ultimate SuperComputers-on-a-Chip, for Scientific Computing, Geo Physics, Complex Mathematics, and Information Processing.
Proceedings of the 10th Mediterranean Conference on Embedded Computing, 2021
VIA: A Smart Scratchpad for Vector Units with Application to Sparse Matrix Computations.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021
Proceedings of the Euro-Par 2021: Parallel Processing, 2021
2020
Efficiency analysis of modern vector architectures: vector ALU sizes, core counts and clock frequencies.
J. Supercomput., 2020
Advances in the Hierarchical Emergent Behaviors (HEB) Approach to Autonomous Vehicles.
IEEE Intell. Transp. Syst. Mag., 2020
Semi-automatic validation of cycle-accurate simulation infrastructures: The case for gem5-x86.
Future Gener. Comput. Syst., 2020
Proceedings of the International Conference for High Performance Computing, 2020
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020
Improving Accuracy and Speeding Up Document Image Classification Through Parallel Systems.
Proceedings of the Computational Science - ICCS 2020, 2020
Improving Predication Efficiency through Compaction/Restoration of SIMD Instructions.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020
Proceedings of the XXXV Conference on Design of Circuits and Integrated Systems, 2020
2019
IEEE Trans. Parallel Distributed Syst., 2019
J. Parallel Distributed Comput., 2019
Guest Editorial: Special Issue on Network and Parallel Computing for Emerging Architectures and Applications.
Int. J. Parallel Program., 2019
CCF Trans. High Perform. Comput., 2019
Optimizing computation-communication overlap in asynchronous task-based programs: poster.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019
Proceedings of the 25th IEEE International Symposium on On-Line Testing and Robust System Design, 2019
Power efficient job scheduling by predicting the impact of processor manufacturing variability.
Proceedings of the ACM International Conference on Supercomputing, 2019
Proceedings of the ACM International Conference on Supercomputing, 2019
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019
2018
Vector Processing-Aware Advanced Clock-Gating Techniques for Low-Power Fused Multiply-Add.
IEEE Trans. Very Large Scale Integr. Syst., 2018
IEEE Trans. Parallel Distributed Syst., 2018
IEEE Trans. Parallel Distributed Syst., 2018
Performance and energy effects on task-based parallelized applications - User-directed versus manual vectorization.
J. Supercomput., 2018
Supercomput. Front. Innov., 2018
Proceedings of the International Conference for High Performance Computing, 2018
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018
Reducing Data Movement on Large Shared Memory Systems by Exploiting Computation Dependencies.
Proceedings of the 32nd International Conference on Supercomputing, 2018
Proceedings of the 32nd International Conference on Supercomputing, 2018
Architectural Support for Task Dependence Management with Flexible Software Scheduling.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018
2017
IEEE Trans. Parallel Distributed Syst., 2017
ACM Trans. Archit. Code Optim., 2017
Int. J. Parallel Program., 2017
Concurr. Comput. Pract. Exp., 2017
Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017
Proceedings of the 25th IEEE International Symposium on Modeling, 2017
General Purpose Task-Dependence Management Hardware for Task-Based Dataflow Programming Models.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017
Picos, A Hardware Task-Dependence Manager for Task-Based Dataflow Programming Models.
Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017
Proceedings of the Second International Conference on Fog and Mobile Edge Computing, 2017
Direct Inter-Process Communication (dIPC): Repurposing the CODOMs Architecture to Accelerate IPC.
Proceedings of the Twelfth European Conference on Computer Systems, 2017
To Distribute or Not to Distribute: The Question of Load Balancing for Performance or Energy.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017
Proceedings of the High Performance Computing - 4th Latin American Conference, 2017
2016
ACM Trans. Design Autom. Electr. Syst., 2016
IEEE Trans. Computers, 2016
ACM Trans. Archit. Code Optim., 2016
ACM Trans. Archit. Code Optim., 2016
IEEE Micro, 2016
ACM Comput. Surv., 2016
Proceedings of the International Conference for High Performance Computing, 2016
Proceedings of the International Conference for High Performance Computing, 2016
Runtime Aware Architectures.
Proceedings of the 6th International Joint Conference on Pervasive and Embedded Computing and Communication Systems (PECCS 2016), 2016
Performance analysis of a hardware accelerator of dependence management for task-based dataflow programming models.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016
A Fully Parameterizable Low Power Design of Vector Fused Multiply-Add Using Active Clock-Gating Techniques.
Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
Runtime-Guided Mitigation of Manufacturing Variability in Power-Constrained Multi-Socket NUMA Nodes.
Proceedings of the 2016 International Conference on Supercomputing, 2016
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016
Reducing Cache Coherence Traffic with Hierarchical Directory Cache and NUMA-Aware Runtime Scheduling.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016
2015
J. Supercomput., 2015
Reimagining Heterogeneous Computing: A Functional Instruction-Set Architecture Computing Model.
IEEE Micro, 2015
Int. J. Distributed Sens. Networks, 2015
Future Gener. Comput. Syst., 2015
Adaptive and application dependent runtime guided hardware prefetcher reconfiguration on the IBM POWER7.
CoRR, 2015
IEEE Comput. Archit. Lett., 2015
Proceedings of the International Conference for High Performance Computing, 2015
Performance and Energy Efficient Hardware-Based Scheduler for Symmetric/Asymmetric CMPs.
Proceedings of the 27th International Symposium on Computer Architecture and High Performance Computing, 2015
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015
Joint Circuit-System Design Space Exploration of Multiplier Unit Structure for Energy-Efficient Vector Processors.
Proceedings of the 2015 IEEE Computer Society Annual Symposium on VLSI, 2015
Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015
VSR sort: A novel vectorised sorting algorithm & architecture extensions for future microprocessors.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015
Proceedings of the Euro-Par 2015: Parallel Processing, 2015
Proceedings of the Euro-Par 2015: Parallel Processing, 2015
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015
Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015
2014
Analyzing the Efficiency of L1 Caches for Reliable Hybrid-Voltage Operation Using EDC Codes.
IEEE Trans. Very Large Scale Integr. Syst., 2014
ACM Trans. Design Autom. Electr. Syst., 2014
Microprocess. Microsystems, 2014
Int. J. Parallel Program., 2014
Proceedings of the Supercomputing - 29th International Conference, 2014
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014
Proceedings of the 2014 International Conference on ReConFigurable Computing and FPGAs, 2014
Physical vs. Physically-Aware Estimation Flow: Case Study of Design Space Exploration of Adders.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2014
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Proceedings of the International Conference on Identification, 2014
Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014
Proceedings of the International Conference on High Performance Computing & Simulation, 2014
Proceedings of the International Conference on High Performance Computing & Simulation, 2014
Proceedings of the 2014 International Conference on Field-Programmable Technology, 2014
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014
Proceedings of the Euro-Par 2014 Parallel Processing, 2014
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014
Proceedings of the Computing Frontiers Conference, CF'14, 2014
Characterizing the Communication Demands of the Graph500 Benchmark on a Commodity Cluster.
Proceedings of the 1st IEEE/ACM International Symposium on Big Data Computing, 2014
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014
Proceedings of the Reconfigurable Computing: Architectures, Tools, and Applications, 2014
2013
Thread Assignment of Multithreaded Network Applications in Multicore/Multithreaded Processors.
IEEE Trans. Parallel Distributed Syst., 2013
Profile-guided transaction coalescing - lowering transactional overheads by merging transactions.
ACM Trans. Archit. Code Optim., 2013
ACM Trans. Archit. Code Optim., 2013
Programmability and portability for exascale: Top down programming methodology and tools with StarSs.
J. Comput. Sci., 2013
Proceedings of the International Conference for High Performance Computing, 2013
Killer-mobiles - The Way Towards Energy Efficient High Performance Computers?
Proceedings of the PECCS 2013, 2013
Proceedings of the 21st Euromicro International Conference on Parallel, 2013
Proceedings of the International Symposium on Quality Electronic Design, 2013
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
Proceedings of the International Conference on High Performance Computing & Simulation, 2013
Proceedings of the 42nd International Conference on Parallel Processing, 2013
Proceedings of the International Conference on Computational Science, 2013
Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013
Proceedings of the 2013 Interconnection Network Architecture: On-Chip, Multi-Chip, 2013
The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices.
Proceedings of the 2013 Euromicro Conference on Digital System Design, 2013
Proceedings of the Design, Automation and Test in Europe, 2013
APPLE: adaptive performance-predictable low-energy caches for reliable hybrid voltage operation.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013
2012
On the simulation of large-scale architectures using multiple application abstraction levels.
ACM Trans. Archit. Code Optim., 2012
ACM Trans. Archit. Code Optim., 2012
IEEE Micro, 2012
Understanding the future of energy-performance trade-off via DVFS in HPC environments.
J. Parallel Distributed Comput., 2012
Int. J. Parallel Program., 2012
The Network Adapter: The Missing Link between MPI Applications and Network Performance.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012
Proceedings of the NORCHIP 2012, Copenhagen, Denmark, November 12-13, 2012, 2012
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
Enhancing the performance of assisted execution runtime systems through hardware/software techniques.
Proceedings of the International Conference on Supercomputing, 2012
Proceedings of the 41st International Conference on Parallel Processing, 2012
ADAM: an efficient data management mechanism for hybrid high and ultra-low voltage operation caches.
Proceedings of the Great Lakes Symposium on VLSI 2012, 2012
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012
2011
IEEE Trans. Parallel Distributed Syst., 2011
Trans. High Perform. Embed. Archit. Compil., 2011
Trans. High Perform. Embed. Archit. Compil., 2011
RMS-TM: a comprehensive benchmark suite for transactional memory systems (abstracts only).
SIGMETRICS Perform. Evaluation Rev., 2011
Exploiting intra-task slack time of load operations for DVFS in hard real-time multi-core systems.
SIGBED Rev., 2011
IEEE Micro, 2011
Int. J. Parallel Program., 2011
Int. J. High Perform. Comput. Appl., 2011
IEEE J. Emerg. Sel. Topics Circuits Syst., 2011
Concurr. Comput. Pract. Exp., 2011
Proceedings of the ICPE'11, 2011
Rapid Development of Error-Free Architectural Simulators Using Dynamic Runtime Testing.
Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing, 2011
Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011
IA^3: An Interference Aware Allocation Algorithm for Multicore Hard Real-Time Systems.
Proceedings of the 17th IEEE Real-Time and Embedded Technology and Applications Symposium, 2011
The Impact of Application's Micro-Imbalance on the Communication-Computation Overlap.
Proceedings of the 19th International Euromicro Conference on Parallel, 2011
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011
An Abstraction Methodology for the Evaluation of Multi-core Multi-threaded Architectures.
Proceedings of the MASCOTS 2011, 2011
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Proceedings of the 17th IEEE International On-Line Testing Symposium (IOLTS 2011), 2011
Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011
Proceedings of the IEEE 29th International Conference on Computer Design, 2011
Proceedings of the High Performance Embedded Architectures and Compilers, 2011
Proceedings of the 21st ACM Great Lakes Symposium on VLSI 2010, 2011
Proceedings of the IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, 2011
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011
Proceedings of the 8th Conference on Computing Frontiers, 2011
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2011
SymptomTM: Symptom-Based Error Detection and Recovery Using Hardware Transactional Memory.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011
Using a Reconfigurable L1 Data Cache for Efficient Version Management in Hardware Transactional Memory.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011
2010
IEEE Trans. Computers, 2010
Proceedings of the 18th IEEE/IFIP VLSI-SoC 2010, 2010
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010
Thread to strand binding of parallel network applications in massive multi-threaded systems.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies, 2010
Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies, 2010
Proceedings of the 24th International Conference on Supercomputing, 2010
Proceedings of the International Green Computing Conference 2010, 2010
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010
A Simulation Framework to Automatically Analyze the Communication-Computation Overlap in Scientific Applications.
Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010
Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010
Proceedings of the CISIS 2010, 2010
Proceedings of the 7th Conference on Computing Frontiers, 2010
Proceedings of the Architecture of Computing Systems, 2010
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010
2009
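Evaluación del rendimiento paralelo en el nivel macro bloque del decodificador H.264 en una arquitectura multiprocesador cc-NUMA [Parallel performance evaluation at the macroblock level of the H.264 decoder on a cc-NUMA multiprocessor architecture].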
Rev. Avances en Sistemas Informática, 2009
The International Exascale Software Project: a Call To Cooperative Action By the Global High-Performance Community.
Int. J. High Perform. Comput. Appl., 2009
IEEE Embed. Syst. Lett., 2009
Proceedings of the 21st International Symposium on Computer Architecture and High Performance Computing, 2009
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009
Turbocharging boosted transactions or: how i learnt to stop worrying and love longer transactions.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009
Taking the heat off transactions: Dynamic selection of pessimistic concurrency control.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Proceedings of the 23rd international conference on Supercomputing, 2009
Proceedings of the 23rd international conference on Supercomputing, 2009
Proceedings of the 23rd international conference on Supercomputing, 2009
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009
Dynamically Filtering Thread-Local Variables in Lazy-Lazy Hardware Transactional Memory.
Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009
Quantitative analysis of sequence alignment applications on multiprocessor architectures.
Proceedings of the 6th Conference on Computing Frontiers, 2009
Proceedings of the PACT 2009, 2009
2008
Int. J. Parallel Program., 2008
Int. J. Embed. Syst., 2008
Proceedings of the High Performance Computing for Computational Science, 2008
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008
Proceedings of the 20th International Symposium on Computer Architecture and High Performance Computing, 2008
Selection of the Register File Size and the Resource Allocation Policy on SMT Processors.
Proceedings of the 20th International Symposium on Computer Architecture and High Performance Computing, 2008
Preliminary Analysis of the Cell BE Processor Limitations for Sequence Alignment Applications.
Proceedings of the Embedded Computer Systems: Architectures, 2008
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008
Proceedings of the 9th workshop on MEmory performance, 2008
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
Proceedings of the 2008 International Conference on Parallel Processing, 2008
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008
Proceedings of the High Performance Embedded Architectures and Compilers, 2008
Proceedings of the High Performance Embedded Architectures and Compilers, 2008
Proceedings of the High Performance Embedded Architectures and Compilers, 2008
Proceedings of the Applications of Evolutionary Computing, 2008
The limits of software transactional memory (STM): dissecting Haskell STM applications on a many-core environment.
Proceedings of the 5th Conference on Computing Frontiers, 2008
Evolutionary system for prediction and optimization of hardware architecture performance.
Proceedings of the IEEE Congress on Evolutionary Computation, 2008
Proceedings of the Architecture of Computing Systems, 2008
Proceedings of the 2008 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2008
2007
SIGARCH Comput. Archit. News, 2007
unreadTVar: Extending Haskell Software Transactional Memory for Performance.
Proceedings of the Eighth Symposium on Trends in Functional Programming, 2007
Proceedings of the 2007 International Conference on Embedded Computer Systems: Architectures, 2007
Proceedings of the 2007 International Conference on Embedded Computer Systems: Architectures, 2007
Proceedings of the 2007 workshop on MEmory performance, 2007
Proceedings of the A Practical Programming Model for the Multi-Core Era, 2007
Performance Impact of Unaligned Memory Operations in SIMD Extensions for Video Codec Applications.
Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
HD-VideoBench. A Benchmark for Evaluating High Definition Digital Video Applications.
Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007
Proceedings of the Euro-Par 2007 Workshops: Parallel Processing, 2007
Proceedings of the Advances in Computer Systems Architecture, 2007
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007
2006
IEEE Trans. Computers, 2006
Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors.
IEEE Comput. Archit. Lett., 2006
Proceedings of the 2006 workshop on MEmory performance, 2006
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006
Proceedings of the Third Conference on Computing Frontiers, 2006
Proceedings of the Third Conference on Computing Frontiers, 2006
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006
2005
IEEE Trans. Computers, 2005
Dynamic memory interval test vs. interprocedural pointer analysis in multimedia applications.
ACM Trans. Archit. Code Optim., 2005
The impact of traffic aggregation on the memory performance of networking applications.
SIGARCH Comput. Archit. News, 2005
Int. J. High Perform. Comput. Netw., 2005
On the Scalability of 1- and 2-Dimensional SIMD Extensions for Multimedia Applications.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Proceedings of the High-Performance Computing - 6th International Symposium, 2005
Proceedings of the High-Performance Computing - 6th International Symposium, 2005
Proceedings of the High-Performance Computing - 6th International Symposium, 2005
Proceedings of the High-Performance Computing - 6th International Symposium, 2005
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005
Proceedings of the 19th Annual International Conference on Supercomputing, 2005
Proceedings of the International Conference on Pervasive Services 2005, 2005
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005
Proceedings of the 2005 International Conference on Compilers, 2005
Proceedings of the 2005 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2005
2004
ACM Trans. Archit. Code Optim., 2004
A case for resource-conscious out-of-order processors: towards kilo-instruction in-flight processors.
SIGARCH Comput. Archit. News, 2004
Software and Hardware Techniques to Optimize Register File Utilization in VLIW Architectures.
Int. J. Parallel Program., 2004
Int. J. High Perform. Comput. Netw., 2004
Int. J. High Perform. Comput. Netw., 2004
Int. J. High Perform. Comput. Netw., 2004
Int. J. High Perform. Comput. Netw., 2004
Proceedings of the 3rd Workshop on Memory Performance Issues, 2004
Proceedings of the Computer Systems: Architectures, 2004
Performance and Power Evaluation of Clustered VLIW Processors with Wide Functional Units.
Proceedings of the Computer Systems: Architectures, 2004
Proceedings of the Power-Aware Computer Systems, 4th International Workshop, 2004
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004
The impact of traffic aggregation on the memory performance of networking applications.
Proceedings of the 2004 workshop on MEmory performance, 2004
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004
A Low-Complexity, High-Performance Fetch Unit for Simultaneous Multithreading Processors.
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004
Proceedings of the 2004 12th European Signal Processing Conference, 2004
Proceedings of the Euro-Par 2004 Parallel Processing, 2004
Proceedings of the Euro-Par 2004 Parallel Processing, 2004
Proceedings of the 2004 Euromicro Symposium on Digital Systems Design (DSD 2004), Architectures, Methods and Tools, 31 August, 2004
Proceedings of the First Conference on Computing Frontiers, 2004
Proceedings of the First Conference on Computing Frontiers, 2004
Proceedings of the 8th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-8 2004), 2004
2003
A Cost-Effective Architecture for Vectorizable Numerical and Multimedia Applications.
Theory Comput. Syst., 2003
IEEE Comput. Archit. Lett., 2003
Design and Implementation of High-Performance Memory Systems for Future Packet Buffers.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003
Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2003
Proceedings of the High Performance Computing, 5th International Symposium, 2003
Proceedings of the High Performance Computing, 5th International Symposium, 2003
Proceedings of the High Performance Computing, 5th International Symposium, 2003
Proceedings of the High Performance Computing, 5th International Symposium, 2003
Proceedings of the High Performance Computing, 5th International Symposium, 2003
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003
Proceedings of the Global Telecommunications Conference, 2003
2002
SIGARCH Comput. Archit. News, 2002
IEEE Comput. Archit. Lett., 2002
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002
Proceedings of the High Performance Computing, 4th International Symposium, 2002
Proceedings of the High Performance Computing, 4th International Symposium, 2002
Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002
Proceedings of the 31st International Conference on Parallel Processing (ICPP 2002), 2002
Proceedings of the Euro-Par 2002, 2002
Proceedings of the International Conference on Compilers, 2002
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002
2001
Cost-Conscious Strategies to Increase Performance of Numerical Programs on Aggressive VLIW Architectures.
IEEE Trans. Computers, 2001
IEEE Trans. Computers, 2001
Parallel architecture and compilation techniques: selection of workshop papers, guests' editors introduction.
SIGARCH Comput. Archit. News, 2001
Modulo scheduling with integrated register spilling for clustered VLIW architectures.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001
Proceedings of the Languages and Compilers for Parallel Computing, 2001
Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001
Proceedings of the 15th international conference on Supercomputing, 2001
Proceedings of the 15th international conference on Supercomputing, 2001
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001
Proceedings of the Euro-Par 2001: Parallel Processing, 2001
Proceedings of the Euro-Par 2001: Parallel Processing, 2001
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001
2000
J. Instr. Level Parallelism, 2000
Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2000
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000
Proceedings of the 13th International Symposium on System Synthesis, 2000
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000
Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000
1999
Proceedings of the ACM/IEEE Conference on Supercomputing, 1999
Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999
Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999
Proceedings of the 13th international conference on Supercomputing, 1999
Proceedings of the 13th international conference on Supercomputing, 1999
Proceedings of the 13th international conference on Supercomputing, 1999
Proceedings of the International Conference on Parallel Processing 1999, 1999
Proceedings of the International Conference on Parallel Processing 1999, 1999
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999
Quantifying the Benefits of SPECint Distant Parallelism in Simultaneous Multi-Threading Architectures.
Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999
1998
Int. J. Parallel Program., 1998
Proceedings of the Vector and Parallel Processing, 1998
Proceedings of the Vector and Parallel Processing, 1998
Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing, 1998
Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing, 1998
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998
Proceedings of the 12th international conference on Supercomputing, 1998
Proceedings of the 12th international conference on Supercomputing, 1998
Proceedings of the 12th international conference on Supercomputing, 1998
Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998
Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998
1997
Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997
Increasing Memory Bandwidth with Wide Buses: Compiler, Hardware and Performance Trade-Offs.
Proceedings of the 11th international conference on Supercomputing, 1997
Proceedings of the 11th international conference on Supercomputing, 1997
Proceedings of the 11th international conference on Supercomputing, 1997
Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997
Proceedings of the Fourth International Conference on High-Performance Computing, 1997
Simultaneous multithreaded vector architecture: merging ILP and DLP for high performance.
Proceedings of the Fourth International Conference on High-Performance Computing, 1997
Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997
Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997
1996
Proceedings of the 4th Euromicro Workshop on Parallel and Distributed Processing (PDP '96), 1996
Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996
Proceedings of the Second International Symposium on High-Performance Computer Architecture, 1996
Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, 1996
1995
IEEE Trans. Computers, 1995
Int. J. Parallel Program., 1995
Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing (PDP '95), 1995
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995
Proceedings of the 9th international conference on Supercomputing, 1995
Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture (HPCA 1995), 1995
Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, 1995
1994
Parallel Process. Lett., 1994
Proceedings of the Second Euromicro Workshop on Parallel and Distributed Processing, 1994
Proceedings of the Languages and Compilers for Parallel Computing, 1994
Proceedings of the 8th international conference on Supercomputing, 1994
Proceedings of the Parallel Processing: CONPAR 94, 1994
Proceedings of the Parallel Processing: CONPAR 94, 1994
1993
Microprocess. Microprogramming, 1993
Proceedings of the 1993 Euromicro Workshop on Parallel and Distributed Processing, 1993
Proceedings of the Languages and Compilers for Parallel Computing, 1993
1992
A method for implementation of one-dimensional systolic algorithms with data contraflow using pipelined functional units.
J. VLSI Signal Process., 1992
Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992
Proceedings of the 6th international conference on Supercomputing, 1992
1991
Proceedings of the Languages and Compilers for Parallel Computing, 1991
Proceedings of the Distributed Memory Computing, 2nd European Conference, 1991
Mapping QR decomposition of a banded matrix on a 1D systolic array with data contraflow and pipelined functional units.
Proceedings of the Algorithms and Parallel VLSI Architectures II, 1991
1990
Proceedings of the Application Specific Array Processors, 1990
1989
A block algorithm and optimal fixed-size systolic array processor for the algebraic path problem.
J. VLSI Signal Process., 1989
Proceedings of the 16th Annual International Symposium on Computer Architecture. Jerusalem, 1989
1987
IEEE Trans. Computers, 1987
Partitioning: An Essential Step in Mapping Algorithms Into Systolic Array Processors.
Computer, 1987
1986
Proceedings of the 13th Annual Symposium on Computer Architecture, Tokyo, Japan, June 1986, 1986
Solving Matrix Problems with No Size Restriction on a Systolic Array Processor.
Proceedings of the International Conference on Parallel Processing, 1986
1985
Analysis and Simulation of Multiplexed Single-Bus Networks With and Without Buffering.
Proceedings of the 12th Annual Symposium on Computer Architecture, 1985
1983
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 1983
1982
IEEE Trans. Computers, 1982