Dimitrios S. Nikolopoulos

Orcid: 0000-0003-0217-8307

According to our database, Dimitrios S. Nikolopoulos authored at least 239 papers between 1998 and 2024.

Bibliography

2024
On Robust Optimal Joint Deployment and Assignment of RAN Intelligent Controllers in O-RANs.
IEEE Open J. Commun. Soc., 2024

HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments.
CoRR, 2024

FrameFeedback: A Closed-Loop Control System for Dynamic Offloading Real-Time Edge Inference.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Application-Attuned Memory Management for Containerized HPC Workflows.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

2023
Decentralised Biomedical Signal Classification using Early Exits.
Proceedings of the 21st IEEE Interregional NEWCAS Conference, 2023

2022
Efficient, Dynamic Multi-Task Execution on FPGA-Based Computing Systems.
IEEE Trans. Parallel Distributed Syst., 2022

Power Log'n'Roll: Power-Efficient Localized Rollback for MPI Applications Using Message Logging Protocols.
IEEE Trans. Parallel Distributed Syst., 2022

Mixed-Precision Kernel Recursive Least Squares.
IEEE Trans. Neural Networks Learn. Syst., 2022

gShare: A centralized GPU memory management framework to enable GPU memory sharing for containers.
Future Gener. Comput. Syst., 2022

On Realizing Efficient Deep Learning Using Serverless Computing.
Proceedings of the 22nd IEEE International Symposium on Cluster, 2022

2021
Revealing DRAM Operating GuardBands Through Workload-Aware Error Predictive Modeling.
IEEE Trans. Computers, 2021

Linear Regression Based DDoS Attack Detection.
Proceedings of the ICMLC 2021: 13th International Conference on Machine Learning and Computing, 2021

2020
ENORM: A Framework For Edge NOde Resource Management.
IEEE Trans. Serv. Comput., 2020

AIR: Iterative refinement acceleration using arbitrary dynamic precision.
Parallel Comput., 2020

DYVERSE: DYnamic VERtical Scaling in multi-tenant Edge environments.
Future Gener. Comput. Syst., 2020

Fast load balance parallel graph analytics with an automatic graph data structure selection algorithm.
Future Gener. Comput. Syst., 2020

RIANN: Real-time Incremental Learning with Approximate Nearest Neighbor on Mobile Devices.
Proceedings of the 2020 USENIX Conference on Operational Machine Learning, 2020

DStress: Automatic Synthesis of DRAM Reliability Stress Viruses using Genetic Algorithms.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

DroidLight: Lightweight Anomaly-based Intrusion Detection System for Smartphone Devices.
Proceedings of the ICDCN 2020: 21st International Conference on Distributed Computing and Networking, 2020

DEFCON: Generating and Detecting Failure-prone Instruction Sequences via Stochastic Search.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

Fast Analysis and Prediction in Large Scale Virtual Machines Resource Utilisation.
Proceedings of the 10th International Conference on Cloud Computing and Services Science, 2020

Cross Architectural Power Modelling.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020

HaRMony: Heterogeneous-Reliability Memory and QoS-Aware Energy Management on Virtualized Servers.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
Hyperqueues: Design and Implementation of Deterministic Concurrent Queues.
ACM Trans. Parallel Comput., 2019

Fast and Energy-Efficient OLAP Data Management on Hybrid Main Memory Systems.
IEEE Trans. Computers, 2019

Shimmer: Implementing a Heterogeneous-Reliability DRAM Framework on a Commodity Server.
IEEE Comput. Archit. Lett., 2019

Implementing efficient message logging protocols as MPI application extensions.
Proceedings of the 26th European MPI Users' Group Meeting, 2019

VEBO: a vertex- and edge-balanced ordering heuristic to load balance parallel graph processing.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

SAFIRE: Scalable and Accurate Fault Injection for Parallel Multithreaded Applications.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Workload-Aware DRAM Error Prediction using Machine Learning.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019

TAPAS: Train-Less Accuracy Predictor for Architecture Search.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Intra-Node Memory Safe GPU Co-Scheduling.
IEEE Trans. Parallel Distributed Syst., 2018

NanoStreams: A Microserver Architecture for Real-Time Analytics on Fast Data Streams.
IEEE Trans. Multi Scale Comput. Syst., 2018

A taxonomy of task-based parallel programming technologies for high-performance computing.
J. Supercomput., 2018

Int. J. High Perform. Comput. Appl., 2018

Energy-Efficient Iterative Refinement Using Dynamic Precision.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2018

RADS: Real-time Anomaly Detection System for Cloud Data Centres.
CoRR, 2018

Energy-efficient localised rollback after failures via data flow analysis.
CoRR, 2018

Expediting assessments of database performance for streams of respiratory parameters.
Comput. Biol. Medicine, 2018

Characterization of HPC workloads on an ARMv8 based server under relaxed DRAM refresh and thermal stress.
Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, 2018

The VINEYARD integrated framework for hardware accelerators in the cloud.
Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, 2018

Energy-efficient localised rollback via data flow analysis and frequency scaling.
Proceedings of the 25th European MPI Users' Group Meeting, 2018

Variation-Aware Pipelined Cores through Path Shaping and Dynamic Cycle Adjustment: Case Study on a Floating-Point Unit.
Proceedings of the International Symposium on Low Power Electronics and Design, 2018

Minimization of Timing Failures in Pipelined Designs via Path Shaping and Operand Truncation.
Proceedings of the 24th IEEE International Symposium on On-Line Testing And Robust System Design, 2018

DRAM Characterization under Relaxed Refresh Period Considering System Level Effects within a Commodity Server.
Proceedings of the 24th IEEE International Symposium on On-Line Testing And Robust System Design, 2018

Userspace Hypervisor Data Characterization in Virtualized Environment.
Proceedings of the 24th IEEE International Conference on Parallel and Distributed Systems, 2018

Supporting Cloud IaaS Users in Detecting Performance-Based Violation for Streaming Applications.
Proceedings of the 2018 IEEE International Conference on Autonomic Computing, 2018

Code and Data Transformations to Address Garbage Collector Performance in Big Data Processing.
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

The transprecision computing paradigm: Concept, design, and applications.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018


2017
FairGV: Fair and Fast GPU Virtualization.
IEEE Trans. Parallel Distributed Syst., 2017

ALEA: A Fine-Grained Energy Profiling Tool.
ACM Trans. Archit. Code Optim., 2017

SCALO: Scalability-Aware Parallelism Orchestration for Multi-Threaded Workloads.
ACM Trans. Archit. Code Optim., 2017

Managed acceleration for In-Memory database analytic workloads.
Int. J. Parallel Emergent Distributed Syst., 2017

On the Virtualization of CUDA Based GPU Remoting on ARM and X86 Machines in the GVirtuS Framework.
Int. J. Parallel Program., 2017

GPU Virtualization and Scheduling Methods: A Comprehensive Survey.
ACM Comput. Surv., 2017

Feasibility of Fog Computing.
CoRR, 2017

Dependency-Aware Rollback and Checkpoint-Restart for Distributed Task-Based Runtimes.
CoRR, 2017

Error-Resilient Server Ecosystems for Edge and Cloud Datacenters.
Computer, 2017

REFINE: realistic fault injection via compiler-based instrumentation for accuracy, portability and speed.
Proceedings of the International Conference for High Performance Computing, 2017

Access-aware DRAM failure-rate estimation under relaxed refresh operations.
Proceedings of the 2017 International Conference on Embedded Computer Systems: Architectures, 2017

A Taxonomy of Task-Based Technologies for High-Performance Computing.
Proceedings of the Parallel Processing and Applied Mathematics, 2017

Incremental Training of Deep Convolutional Neural Networks.
Proceedings of the International Workshop on Automatic Selection, 2017

Edge-as-a-Service: Towards Distributed Cloud Architectures.
Proceedings of the Parallel Computing is Everywhere, 2017

Power Modelling for Heterogeneous Cloud-Edge Data Centers.
Proceedings of the Parallel Computing is Everywhere, 2017

MiniSymposium on Edge Computing.
Proceedings of the Parallel Computing is Everywhere, 2017

Relaxing DRAM refresh rate through access pattern scheduling: A case study on stencil-based algorithms.
Proceedings of the 23rd IEEE International Symposium on On-Line Testing and Robust System Design, 2017

GraphGrind: addressing load imbalance of graph partitioning.
Proceedings of the International Conference on Supercomputing, 2017

Accelerating Graph Analytics by Utilising the Memory Locality of Graph Partitioning.
Proceedings of the 46th International Conference on Parallel Processing, 2017

Using Docker Swarm with a User-Centric Decision-Making Framework for Cloud Application Migration.
Proceedings of the Cloud Computing and Service Science - 7th International Conference, 2017

MyMinder: A User-centric Decision Making Framework for Intercloud Migration.
Proceedings of the CLOSER 2017, 2017

2016
Special issue on Disruptive technologies for energy efficient computing.
Sustain. Comput. Informatics Syst., 2016

Editorial of the Special issue: SI: E2SC.
Parallel Comput., 2016

Exploiting Significance of Computations for Energy-Constrained Approximate Computing.
Int. J. Parallel Program., 2016

Evaluating fault tolerance on asymmetric multicore systems-on-chip using iso-metrics.
IET Comput. Digit. Tech., 2016

Energy Optimization of Memory Intensive Parallel workloads.
CoRR, 2016

Myrmics: Scalable, Dependency-aware Task Scheduling on Heterogeneous Manycores.
CoRR, 2016

BDDT-SCC: A Task-parallel Runtime for Non Cache-Coherent Multicores.
CoRR, 2016

TwinCG: Dual Thread Redundancy with Forward Recovery for Conjugate Gradient Methods.
CoRR, 2016

Methods and metrics for fair server assessment under real-time financial workloads.
Concurr. Comput. Pract. Exp., 2016

Brief Announcement: Energy Optimization of Memory Intensive Parallel Workloads.
Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, 2016

Challenges and Opportunities in Edge Computing.
Proceedings of the 2016 IEEE International Conference on Smart Cloud, 2016

Runtime support for adaptive power capping on heterogeneous SoCs.
Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016


VarSys Introduction.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

A Scalable Runtime for the ECOSCALE Heterogeneous Exascale Hardware Platform.
Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers, 2016

Operator and Workflow Optimization for High-Performance Analytics.
Proceedings of the Workshops of the EDBT/ICDT 2016 Joint Conference, 2016

ECOSCALE: Reconfigurable computing and runtime system for future exascale systems.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

TwinPCG: Dual Thread Redundancy with Forward Recovery for Preconditioned Conjugate Gradient Methods.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

HPTA: High-performance text analytics.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

Big data availability: Selective partial checkpointing for in-memory database queries.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

A scalable and composable map-reduce system.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

Low-Cost Hardware Infrastructure for Runtime Thread Level Energy Accounting.
Proceedings of the Architecture of Computing Systems - ARCS 2016, 2016

The VINEYARD Approach: Versatile, Integrated, Accelerator-Based, Heterogeneous Data Centres.
Proceedings of the Applied Reconfigurable Computing - 12th International Symposium, 2016

A Capital Market Metaphor for Content Delivery Network Resources.
Proceedings of the 30th IEEE International Conference on Advanced Information Networking and Applications, 2016

2015
Realizing Accelerated Cost-Effective Distributed RAID.
Proceedings of the Handbook on Data Centers, 2015

TProf: An energy profiler for task-parallel programs.
Sustain. Comput. Informatics Syst., 2015

Iso-Quality of Service: Fairly Ranking Servers for Real-Time Data Analytics.
Parallel Process. Lett., 2015

Scalable black-box prediction models for multi-dimensional adaptation on NUMA multi-cores.
Int. J. Parallel Emergent Distributed Syst., 2015

On the potential of significance-driven execution for energy-aware HPC.
Comput. Sci. Res. Dev., 2015

Guest Editorial.
IET Comput. Digit. Tech., 2015

Evaluating Asymmetric Multicore Systems-on-Chip using Iso-Metrics.
CoRR, 2015

On the Energy-Efficiency of Byte-Addressable Non-Volatile Memory.
IEEE Comput. Archit. Lett., 2015

Towards automated data-driven model creation for cloud computing simulation.
Proceedings of the 8th International Conference on Simulation Tools and Techniques, 2015

A programming model and runtime system for significance-aware energy-efficient computing.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Mini-Symposium on Energy and Resilience in Parallel Programming.
Proceedings of the Parallel Computing: On the Road to Exascale, 2015

Performance and Fault Tolerance of Preconditioned Iterative Solvers on Low-Power ARM Architectures.
Proceedings of the Parallel Computing: On the Road to Exascale, 2015

HpMC: An Energy-aware Management System of Multi-level Memory Architectures.
Proceedings of the 2015 International Symposium on Memory Systems, 2015

Application-Level Energy Awareness for OpenMP.
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Power Capping: What Works, What Does Not.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

Energy-Efficient In-Memory Data Stores on Hybrid Memory Hierarchies.
Proceedings of the 11th International Workshop on Data Management on New Hardware, 2015

LS-ADT: Lightweight and Scalable Anomaly Detection for Cloud Datacentres.
Proceedings of the Cloud Computing and Services Science - 5th International Conference, 2015

A Lightweight Tool for Anomaly Detection in Cloud Data Centres.
Proceedings of the CLOSER 2015, 2015

A significance-driven programming framework for energy-constrained approximate computing.
Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

Software-managed energy-efficient hybrid DRAM/NVM main memory.
Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

ALEA: Fine-Grain Energy Profiling with Basic Block Sampling.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Energy-Efficient Hybrid DRAM/NVM Main Memory.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Hybrid address spaces: A methodology for implementing scalable high-level programming models on non-coherent many-core architectures.
J. Syst. Softw., 2014

FPGA prototyping of emerging manycore architectures for parallel programming research using Formic boards.
J. Syst. Archit., 2014

Distributed region-based memory allocation and synchronization.
Int. J. High Perform. Comput. Appl., 2014

Energy Efficiency through Significance-Based Computing.
Computer, 2014

On the viability of microservers for financial analytics.
Proceedings of the 7th Workshop on High Performance Computational Finance, 2014

Fast Dynamic Binary Rewriting for flexible thread migration on shared-ISA heterogeneous MPSoCs.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Power-capped DVFS and thread allocation with ANN models on modern NUMA systems.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

Power modelling and capping for heterogeneous ARM/FPGA SoCs.
Proceedings of the 2014 International Conference on Field-Programmable Technology, 2014

The CACTOS Vision of Context-Aware Cloud Topology Optimization and Simulation.
Proceedings of the IEEE 6th International Conference on Cloud Computing Technology and Science, 2014

2013
Strategies for Energy-Efficient Resource Management of Hybrid Programming Models.
IEEE Trans. Parallel Distributed Syst., 2013

Analysis of dependence tracking algorithms for task dataflow execution.
ACM Trans. Archit. Code Optim., 2013

Deterministic scale-free pipeline parallelism with hyperqueues.
Proceedings of the International Conference for High Performance Computing, 2013

DRASync: distributed region-based memory allocation and synchronization.
Proceedings of the 20th European MPI Users' Group Meeting, 2013

Prefetching and cache management using task lifetimes.
Proceedings of the International Conference on Supercomputing, 2013

Topic 1: Support Tools and Environments - (Introduction).
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

Fast dynamic binary rewriting to support thread migration in shared-ISA asymmetric multicores.
Proceedings of the First International Workshop on Code Optimisation for Multi and Many Cores, 2013

Inference and Declaration of Independence in Task-Parallel Programs.
Proceedings of the Advanced Parallel Processing Technologies, 2013

BDDT: Block-Level Dynamic Dependence Analysis for Task-Based Parallelism.
Proceedings of the Advanced Parallel Processing Technologies, 2013

2012
Critical path-based thread placement for NUMA systems.
SIGMETRICS Perform. Evaluation Rev., 2012

EPC: a power instrumentation controller for embedded applications.
SIGBED Rev., 2012

Cache-Integrated Network Interfaces: Flexible On-Chip Communication and Synchronization for Large-Scale CMPs.
Int. J. Parallel Program., 2012

BTL: A Framework for Measuring and Modeling Energy in Memory Hierarchies.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

BDDT: block-level dynamic dependence analysis for deterministic task-based parallelism.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

On the Use of GPUs in Realizing Cost-Effective Distributed RAID.
Proceedings of the 20th IEEE International Symposium on Modeling, 2012

The myrmics memory allocator: hierarchical, message-passing allocation for global address spaces.
Proceedings of the International Symposium on Memory Management, 2012

Model-based, memory-centric performance and power optimization on NUMA multiprocessors.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Dynamic binary rewriting and migration for shared-ISA asymmetric, multicore processors: summary.
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, 2012

Formic: Cost-efficient and Scalable Prototyping of Manycore Architectures.
Proceedings of the 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, 2012

Topic 16: GPU and Accelerators Computing.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

Inference and declaration of independence: impact on deterministic task parallelism.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
A capabilities-aware framework for using computational accelerators in data-intensive computing.
J. Parallel Distributed Comput., 2011

Task-based parallel H.264 video encoding for explicit communication architectures.
Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

A programming model for deterministic task parallelism.
Proceedings of the 2011 ACM SIGPLAN workshop on Memory Systems Performance and Correctness: held in conjunction with PLDI '11, 2011

Scalable Runtime Support for Data Intensive Applications on the Single Chip Cloud Computer.
Proceedings of the 3rd Many-core Applications Research Community (MARC) Symposium, 2011

Parallel Programming of General-Purpose Programs Using Task-Based Programming Models.
Proceedings of the 3rd USENIX Workshop on Hot Topics in Parallelism, 2011

Fine-grain OpenMP runtime support with explicit communication hardware primitives.
Proceedings of the Design, Automation and Test in Europe, 2011

Scalable memory registration for high performance networks using helper threads.
Proceedings of the 8th Conference on Computing Frontiers, 2011

A Unified Scheduler for Recursive and Task Dataflow Parallelism.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Explicit Communication and Synchronization in SARC.
IEEE Micro, 2010

Strider: Runtime Support for Optimizing Strided Data Accesses on Multi-Cores with Explicitly Managed Memories.
Proceedings of the Conference on High Performance Computing Networking, 2010

Hybrid MPI/OpenMP power-aware computing.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Power-aware MPI task aggregation prediction for high-end computing systems.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Rearchitecting MapReduce for Heterogeneous Multicore Processors with Explicitly Managed Memories.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Tagged Procedure Calls (TPC): Efficient Runtime Support for Task-Based Parallelism on the Cell Processor.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Comparing Scalability Prediction Strategies on an SMP of CMPs.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Evaluation of streaming aggregation on parallel hardware architectures.
Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems, 2010

On-chip communication and synchronization mechanisms with cache-integrated network interfaces.
Proceedings of the 7th Conference on Computing Frontiers, 2010

Designing Accelerator-Based Distributed Systems for High Performance.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

2009
Supporting MapReduce on large-scale asymmetric multi-core clusters.
ACM SIGOPS Oper. Syst. Rev., 2009

Algorithm, software, and hardware optimizations for Delaunay mesh generation on simultaneous multithreaded architectures.
J. Parallel Distributed Comput., 2009

A multigrain Delaunay mesh generation method for multicore SMT-based architectures.
J. Parallel Distributed Comput., 2009

Green Building Blocks - Software Stacks for Energy-Efficient Clusters and Data Centres.
ERCIM News, 2009

Programming Multiprocessors with Explicitly Managed Memory Hierarchies.
Computer, 2009

A comparison of programming models for multiprocessors with explicitly managed memory hierarchies.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

CellMR: A framework for supporting mapreduce on asymmetric cell-based clusters.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Scheduling dynamic parallelism on accelerators.
Proceedings of the 6th Conference on Computing Frontiers, 2009

2008
Prediction-Based Power-Performance Adaptation of Multithreaded Scientific Codes.
IEEE Trans. Parallel Distributed Syst., 2008

Set-Top Supercomputing: Scalable Software for Scientific Simulations on Game Consoles.
ERCIM News, 2008

VT-ASOS: Holistic system software customization for many cores.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Modeling Multigrain Parallelism on Heterogeneous Multi-core Processors: A Case Study of the Cell BE.
Proceedings of the High Performance Embedded Architectures and Compilers, 2008

DMA-based prefetching for i/o-intensive workloads on the cell architecture.
Proceedings of the 5th Conference on Computing Frontiers, 2008

Cell-SWat: modeling and scheduling wavefront computations on the cell broadband engine.
Proceedings of the 5th Conference on Computing Frontiers, 2008

Scheduling Asymmetric Parallelism on a PlayStation3 Cluster.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

Prediction models for multi-dimensional power-performance optimization on many cores.
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007
Exploring New Search Algorithms and Hardware for Phylogenetics: RAxML Meets the IBM Cell.
J. VLSI Signal Process., 2007

Runtime scheduling of dynamic parallelism on accelerator-based multi-core systems.
Parallel Comput., 2007

Runtime and Programming Support for Memory Adaptation in Scientific Applications via Local Disk and Remote Memory.
J. Grid Comput., 2007

Dynamic multigrain parallelization on the cell broadband engine.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

RAxML-Cell: Parallel Phylogenetic Tree Inference on the Cell Broadband Engine.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Identifying energy-efficient concurrency levels using machine learning.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

A comparison of online and offline strategies for program adaptation.
Proceedings of the 45th Annual Southeast Regional Conference, 2007

2006
PACMAN: A PerformAnce Counters MANager for Intel Hyperthreaded Processors.
Proceedings of the Third International Conference on the Quantitative Evaluation of Systems (QEST 2006), 2006

Scalable locality-conscious multithreaded memory allocation.
Proceedings of the 5th International Symposium on Memory Management, 2006

MESA: reducing cache conflicts by integrating static and run-time methods.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Facing the challenges of multicore processor technologies using autonomic system software.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Online strategies for high-performance power-aware thread execution on emerging multiprocessors.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Online power-performance adaptation of multithreaded programs using hardware event-based prediction.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Runtime Support for Memory Adaptation in Scientific Applications via Local Disk and Remote Memory.
Proceedings of the 15th IEEE International Symposium on High Performance Distributed Computing, 2006

2005
An Evaluation of OpenMP on Current and Emerging Multithreaded/Multicore Processors.
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2005

Scheduling Algorithms for Effective Thread Pairing on Hybrid Multiprocessors.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Multigrain parallel Delaunay Mesh generation: challenges and opportunities for multithreaded architectures.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Factory: An Object-Oriented Parallel Programming Substrate for Deep Multiprocessors.
Proceedings of the High Performance Computing and Communications, 2005

smt-SPRINTS: Software Precomputation with Intelligent Streaming for Resource-Constrained SMTs.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

2004
Dynamic tiling for effective use of shared caches on multithreaded processors.
Int. J. High Perform. Comput. Netw., 2004

Runtime support for integrating precomputation and thread-level parallelism on simultaneous multithreaded processors.
Proceedings of the 7th Workshop on languages, 2004

Adapting to Memory Pressure from within Scientific Applications on Multiprogrammed COWs.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Realistic Workload Scheduling Policies for Taming the Memory Bandwidth Bottleneck of SMPs.
Proceedings of the High Performance Computing, 2004

2003
Scaling non-regular shared-memory codes by reusing custom loop schedules.
Sci. Program., 2003

Quantifying contention and balancing memory load on hardware DSM multiprocessors.
J. Parallel Distributed Comput., 2003

Adaptive scheduling under memory constraints on non-dedicated computational farms.
Future Gener. Comput. Syst., 2003

Code and Data Transformations for Improving Shared Cache Performance on SMT Processors.
Proceedings of the High Performance Computing, 5th International Symposium, 2003

Malleable Memory Mapping: User-Level Control of Memory Bounds for Effective Program Adaptation.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Scheduling Algorithms with Bus Bandwidth Considerations for SMPs.
Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

2002
Scheduler-Activated Dynamic Page Migration for Multiprogrammed DSM Multiprocessors.
J. Parallel Distributed Comput., 2002

Runtime vs. Manual Data Distribution for Architecture-Agnostic Shared-Memory Programming Models.
Int. J. Parallel Program., 2002

Adaptive Scheduling under Memory Pressure on Multiprogrammed SMPs.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Quantifying and Resolving Remote Memory Access Contention on Hardware DSM Multiprocessors.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Effective Cross-Platform, Multilevel Parallelism via Dynamic Adaptive Execution.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Adaptive Scheduling under Memory Pressure on Multiprogrammed Clusters.
Proceedings of the 2nd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2002), 2002

2001
Exploiting memory affinity in OpenMP through schedule reuse.
SIGARCH Comput. Archit. News, 2001

The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors.
Int. J. Parallel Program., 2001

A Study of Implicit Data Distribution Methods for OpenMP Using the SPEC Benchmarks.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2001

Scaling irregular parallel codes with minimal programming effort.
Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

The trade-off between implicit and explicit data distribution in shared-memory programming paradigms.
Proceedings of the 15th international conference on Supercomputing, 2001

Informing Algorithms for Efficient Scheduling of Synchronizing Threads on Multiprogrammed SMPs.
Proceedings of the 2001 International Conference on Parallel Processing, 2001

Improving Java Server Performance with Interruptlets.
Proceedings of the Computational Science - ICCS 2001, 2001

A Transparent Operating System Infrastructure for Embedding Adaptability to Thread-Based Programming Models.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

2000
Is Data Distribution Necessary in OpenMP?
Proceedings of Supercomputing 2000, 2000

Efficient Dynamic Parallelism with OpenMP on Linux SMPs.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2000

UPMLIB: A Runtime System for Tuning the Memory Performance of OpenMP Programs on Scalable Shared-Memory Multiprocessors.
Proceedings of the Languages, 2000

A Tool to Schedule Parallel Applications on Multiprocessors: The NANOS CPU MANAGER.
Proceedings of the Job Scheduling Strategies for Parallel Processing, IPDPS 2000 Workshop, 2000

Leveraging Transparent Data Distribution in OpenMP via User-Level Dynamic Page Migration.
Proceedings of the High Performance Computing, Third International Symposium, 2000

Fast Synchronization on Scalable Cache-Coherent Multiprocessors using Hybrid Primitives.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

A case for user-level dynamic page migration.
Proceedings of the 14th international conference on Supercomputing, 2000

User-Level Dynamic Page Migration for Multiprogrammed Shared-Memory Multiprocessors.
Proceedings of the 2000 International Conference on Parallel Processing, 2000

1999
Fine-Grain and Multiprogramming-Conscious Nanothreading with the Solaris Operating System.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

Achieving multiprogramming scalability of parallel programs on Intel SMP platforms: Nanothreading in the Linux kernel.
Proceedings of the Parallel Computing: Fundamentals & Applications, 1999

A quantitative architectural evaluation of synchronization algorithms and disciplines on ccNUMA systems: the case of the SGI Origin2000.
Proceedings of the 13th international conference on Supercomputing, 1999

1998
Efficient Runtime Thread Management for the Nano-Threads Programming Model.
Proceedings of the Parallel and Distributed Processing, 10 IPPS/SPDP'98 Workshops Held in Conjunction with the 12th International Parallel Processing Symposium and 9th Symposium on Parallel and Distributed Processing, Orlando, Florida, USA, March 30, 1998

Kernel-level Scheduling for the Nano-threads Programming Model.
Proceedings of the 12th international conference on Supercomputing, 1998

Enhancing the Performance of Autoscheduling in Distributed Shared Memory Multiprocessors.
Proceedings of the Euro-Par '98 Parallel Processing, 1998

