2024
LeanBin: Harnessing Lifting and Recompilation to Debloat Binaries.
Dataset, August, 2024
LeanBin: Harnessing Lifting and Recompilation to Debloat Binaries.
Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2024
2020
AfterOMPT: An OMPT-Based Tool for Fine-Grained Tracing of Tasks and Loops.
Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020
2019
Low-Precision Neural Network Decoding of Polar Codes.
Proceedings of the 20th IEEE International Workshop on Signal Processing Advances in Wireless Communications, 2019
2018
Type Information Elimination from Objects on Architectures with Tagged Pointers Support.
IEEE Trans. Computers, 2018
Leveraging Data-Flow Task Parallelism for Locality-Aware Dynamic Scheduling on Heterogeneous Platforms.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018
Automated Analysis of Task-Parallel Execution Behavior Via Artificial Neural Networks.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018
2017
Fuse: Accurate Multiplexing of Hardware Performance Counters Across Executions.
ACM Trans. Archit. Code Optim., 2017
Accurate and Complete Hardware Profiling for OpenMP - Multiplexing Hardware Events Across Executions.
Proceedings of the Scaling OpenMP for Exascale Performance and Portability, 2017
MaxSim: A simulation platform for managed applications.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017
Paving the Way Towards a Highly Energy-Efficient and Highly Integrated Compute Node for the Exascale Revolution: The ExaNoDe Approach.
Proceedings of the Euromicro Conference on Digital System Design, 2017
2016
NUMA-aware scheduling and memory allocation for data-flow task-parallel applications.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016
Language-Centric Performance Analysis of OpenMP Programs with Aftermath.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016
Interactive visualization of cross-layer performance anomalies in dynamic task-parallel applications and systems.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016
Scalable Task Parallelism for NUMA: A Uniform Abstraction for Coordinated Scheduling and Memory Management.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016
2015
Effective Barrier Synchronization on Intel Xeon Phi Coprocessor.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015
2014
Compiler/Runtime Framework for Dynamic Dataflow Parallelization of Tiled Programs.
ACM Trans. Archit. Code Optim., 2014
Topology-Aware and Dependence-Aware Scheduling and Memory Allocation for Task-Parallel Languages.
ACM Trans. Archit. Code Optim., 2014
TERAFLUX: Harnessing dataflow in next generation teradevices.
Microprocess. Microsystems, 2014
Automatic Detection of Performance Anomalies in Task-Parallel Programs.
CoRR, 2014
Energy-aware parallelization flow and toolset for C code.
Proceedings of the 17th International Workshop on Software and Compilers for Embedded Systems, 2014
2013
OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs.
ACM Trans. Archit. Code Optim., 2013
OpenStream: a data-flow approach to solving the von Neumann bottlenecks.
Proceedings of the International Workshop on Software and Compilers for Embedded Systems, 2013
Correct and Efficient Bounded FIFO Queues.
Proceedings of the 25th International Symposium on Computer Architecture and High Performance Computing, 2013
Correct and efficient work-stealing for weak memory models.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013
The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices.
Proceedings of the 2013 Euromicro Conference on Digital System Design, 2013
EU FP7-288307 Pharaon Project: Parallel and Heterogeneous Architecture for Real-Time Applications.
Proceedings of the 2013 Euromicro Conference on Digital System Design, 2013
2012
Automatic Extraction of Coarse-Grained Data-Flow Threads from Imperative Programs.
IEEE Micro, 2012
2011
ACOTES Project: Advanced Compiler Technologies for Embedded Streaming.
Int. J. Parallel Program., 2011
A stream-computing extension to OpenMP.
Proceedings of the High Performance Embedded Architectures and Compilers, 2011
2010
ERBIUM: a deterministic, concurrent intermediate representation for portable and scalable performance.
Proceedings of the 7th Conference on Computing Frontiers, 2010
Erbium: a deterministic, concurrent intermediate representation to map data-flow tasks to scalable, persistent streaming processes.
Proceedings of the 2010 International Conference on Compilers, 2010