Wu-chun Feng

Orcid: 0000-0002-6015-0727

According to our database, Wu-chun Feng authored at least 273 papers between 1989 and 2024.

Bibliography

2024
SamBaS: Sampling-Based Stochastic Block Partitioning.
IEEE Trans. Netw. Sci. Eng., 2024

Optimizing and Scaling the 3D Reconstruction of Single-Particle Imaging.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

BLP: Block-Level Pipelining for GPUs.
Proceedings of the 21st ACM International Conference on Computing Frontiers, 2024

G2A2: Graph Generator with Attributes and Anomalies.
Proceedings of the 21st ACM International Conference on Computing Frontiers, 2024

2023
An Integrated Approach for Accelerating Stochastic Block Partitioning.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2023

On the Three P's of Parallel Programming for Heterogeneous Computing: Performance, Productivity, and Portability.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2023

Exact Distributed Stochastic Block Partitioning.
Proceedings of the IEEE International Conference on Cluster Computing, 2023

On the Multi-Dimensional Acceleration of Stochastic Blockmodeling for Community Detection.
Proceedings of the IEEE International Conference on Cluster Computing, 2023

GRAPPEL: A Graph-based Approach for Early Risk Assessment of Acute Hypertension in Critical Care.
Proceedings of the 14th ACM International Conference on Bioinformatics, 2023

2022
G2A2: An Automated Graph Generator with Attributes and Anomalies.
CoRR, 2022

On the Parallelization of MCMC for Community Detection.
Proceedings of the 51st International Conference on Parallel Processing, 2022

AutoPager: Auto-tuning Memory-Mapped I/O Parameters in Userspace.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2022

Optimizing Performance and Storage of Memory-Mapped Persistent Data Structures.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2022

Edge-Connected Jaccard Similarity for Graph Link Prediction on FPGA.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2022

On the Characterization of the Performance-Productivity Gap for FPGA.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2022

2021
IterML: Iterative Machine Learning for Intelligent Parameter Pruning and Tuning in Graphics Processing Units.
J. Signal Process. Syst., 2021

A Deep-Learning Framework for Improving COVID-19 CT Image Quality and Diagnostic Accuracy.
CoRR, 2021

Topology-Guided Sampling for Fast and Accurate Community Detection.
CoRR, 2021

Mitigating Catastrophic Forgetting in Deep Learning in a Streaming Setting Using Historical Summary.
Proceedings of the 2021 7th International Workshop on Data Analysis and Reduction for Big Scientific Data, 2021

Scaling Out a Combinatorial Algorithm for Discovering Carcinogenic Gene Combinations to Thousands of GPUs.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Privateer: Multi-versioned Memory-mapped Data Stores for High-Performance Data Science.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

2020
Towards insight-driven sampling for big data visualisation.
Behav. Inf. Technol., 2020

ETH: An Architecture for Exploring the Design Space of In-situ Scientific Visualization.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

MetaCL: Automated "Meta" OpenCL Code Generation for High-Level Synthesis on FPGA.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

A Feasibility Study for MPI over HDFS.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

Exploring FPGA Optimizations in OpenCL for Breadth-First Search on Sparse Graph Datasets.
Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, 2020

SparkLeBLAST: Scalable Parallelization of BLAST Sequence Alignment Using Spark.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020

Alleviating Load Imbalance in Data Processing for Large-Scale Deep Learning.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020

Approximate Pattern Matching for On-Chip Interconnect Traffic Prediction.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
GPU-Based Iterative Medical CT Image Reconstructions.
J. Signal Process. Syst., 2019

Scalable Deep Learning via I/O Analysis and Optimization.
ACM Trans. Parallel Comput., 2019

Fast Stochastic Block Partitioning via Sampling.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

C to D-Wave: A High-level C Compilation Framework for Quantum Annealers.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

On the Portability of GPU-Accelerated Applications via Automated Source-to-Source Translation.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2019

Iterative machine learning (IterML) for effective parameter pruning and tuning in accelerators.
Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

Adaptive Task Aggregation for High-Performance Sparse Solvers on GPUs.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
A Framework for the Automatic Vectorization of Parallel Sort on x86-Based Processors.
IEEE Trans. Parallel Distributed Syst., 2018

A Composable Workflow for Productive Heterogeneous Computing on FPGAs via Whole-Program Analysis and Transformation.
Proceedings of the 2018 International Conference on ReConFigurable Computing and FPGAs, 2018

Exploring FPGA-specific Optimizations for Irregular OpenCL Applications.
Proceedings of the 2018 International Conference on ReConFigurable Computing and FPGAs, 2018

Highly Efficient Compensation-Based Parallelism for Wavefront Loops on GPUs.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Making a Case for Green High-Performance Visualization Via Embedded Graphics Processors.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

A Framework for Auto-Parallelization and Code Generation: An Integrative Case Study with Legacy FORTRAN Codes.
Proceedings of the 47th International Conference on Parallel Processing, 2018

CommAnalyzer: automated estimation of communication cost and scalability on HPC clusters from sequential code.
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018

Taming irregular applications via advanced dynamic parallelism on GPUs.
Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018

GPU power prediction via ensemble machine learning for DVFS space exploration.
Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018

2017
cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on CPU+GPU.
IEEE ACM Trans. Comput. Biol. Bioinform., 2017

Parallel programming with pictures is a Snap!
J. Parallel Distributed Comput., 2017

A runtime estimation framework for ALICE.
Future Gener. Comput. Syst., 2017

Eliminating Irregularities of Protein Sequence Search on Multicore Architectures.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

PaPar: A Parallel Data Partitioning Framework for Big Data Applications.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Auto-Tuning Strategies for Parallelizing Sparse Matrix-Vector (SpMV) Multiplication on Multi- and Many-Core Processors.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Directive-Based Partitioning and Pipelining for Graphics Processing Units.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Characterizing and Modeling Power and Energy for Extreme-Scale In-Situ Visualization.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

A framework for fast and fair evaluation of automata processing hardware.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

AutoMatch: An automated framework for relative performance estimation and workload distribution on heterogeneous HPC systems.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Demystifying automata processing: GPUs, FPGAs or Micron's AP?
Proceedings of the International Conference on Supercomputing, 2017

Fast segmented sort on GPUs.
Proceedings of the International Conference on Supercomputing, 2017

Parallel I/O Optimizations for Scalable Deep Learning.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

Towards Scalable Deep Learning via I/O Analysis and Optimization.
Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

Portable Parallel Design of Weighted Multi-Dimensional Scaling for Real-Time Data Analysis.
Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

Developing Dynamic Profiling and Debugging Support in OpenCL for FPGAs.
Proceedings of the 54th Annual Design Automation Conference, 2017

An Enhanced Image Reconstruction Tool for Computed Tomography on GPUs.
Proceedings of the Computing Frontiers Conference, 2017

GPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs.
Proceedings of the Computing Frontiers Conference, 2017

Robotomata: A framework for approximate pattern matching of big data on an automata processor.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

2016
OpenDwarfs: Characterization of Dwarf-Based Benchmarks on Fixed and Reconfigurable Architectures.
J. Signal Process. Syst., 2016

MPI-ACC: Accelerator-Aware MPI for Scientific Applications.
IEEE Trans. Parallel Distributed Syst., 2016

Fast Detection of Transformed Data Leaks.
IEEE Trans. Inf. Forensics Secur., 2016

MultiCL: Enabling automatic scheduling for task-parallel workloads in OpenCL.
Parallel Comput., 2016

muBLASTP: database-indexed protein sequence search on multicore CPUs.
BMC Bioinform., 2016

MetaMorph: a library framework for interoperable kernels on multi- and many-core clusters.
Proceedings of the International Conference for High Performance Computing, 2016

Characterizing Performance and Power towards Efficient Synchronization of GPU Kernels.
Proceedings of the 24th IEEE International Symposium on Modeling, 2016

An automated framework for characterizing and subsetting GPGPU workloads.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

The Right Metric for Efficient Supercomputing: A Ten-Year Retrospective.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

AAlign: A SIMD Framework for Pairwise Sequence Alignment on x86-Based Multi-and Many-Core Processors.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Parallel Programming with Pictures in a Snap!
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Measuring and modeling on-chip interconnect power on real hardware.
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

Parallel Transposition of Sparse Data Structures.
Proceedings of the 2016 International Conference on Supercomputing, 2016

Telescoping Architectures: Evaluating Next-Generation Heterogeneous Computing.
Proceedings of the 23rd IEEE International Conference on High Performance Computing, 2016

Bridging the Performance-Programmability Gap for FPGAs via OpenCL: A Case Study with OpenDwarfs.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

Directive-Based Pipelining Extension for OpenMP.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

cuART: Fine-Grained Algebraic Reconstruction Technique for Computed Tomography Images on GPUs.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

Online Power Estimation of Graphics Processing Units.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

Multiscale Approximation with Graphical Processing Units for Multiplicative Speedup in Molecular Dynamics.
Proceedings of the 7th ACM International Conference on Bioinformatics, 2016

Bridging the FPGA programmability-portability Gap via automatic OpenCL code generation and tuning.
Proceedings of the 27th IEEE International Conference on Application-specific Systems, 2016

O3FA: A Scalable Finite Automata-based Pattern-Matching Engine for Out-of-Order Deep Packet Inspection.
Proceedings of the 2016 Symposium on Architectures for Networking and Communications Systems, 2016

2015
CoreTSAR: Core Task-Size Adapting Runtime.
IEEE Trans. Parallel Distributed Syst., 2015

Accelerating Bioinformatics Applications via Emerging Parallel Computing Systems.
IEEE ACM Trans. Comput. Biol. Bioinform., 2015

On the Energy Proportionality of Scale-Out Workloads.
CoRR, 2015

Towards Energy-Proportional Computing Using Subsystem-Level Power Management.
CoRR, 2015

Design and Evaluation of Scalable Concurrent Queues for Many-Core Architectures.
Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, Austin, TX, USA, January 31, 2015

On the Performance, Energy, and Power of Data-Access Methods in Heterogeneous Computing Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

HPPAC Introduction and Committees.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

On the Greenness of In-Situ and Post-Processing Visualization Pipelines.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Rapid and parallel content screening for detecting transformed data exposure.
Proceedings of the 2015 IEEE Conference on Computer Communications Workshops, 2015

ASPaS: A Framework for Automatic SIMDization of Parallel Sorting on x86-based Many-core Processors.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

GLAF: A Visual Programming and Auto-tuning Framework for Parallel Computing.
Proceedings of the 44th International Conference on Parallel Processing, 2015

pDindel: Accelerating indel detection on a multicore CPU architecture with SIMD.
Proceedings of the 5th IEEE International Conference on Computational Advances in Bio and Medical Sciences, 2015

Rapid Screening of Transformed Data Leaks with Efficient Algorithms and Parallel Computing.
Proceedings of the 5th ACM Conference on Data and Application Security and Privacy, 2015

Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014
SDAFT: A novel scalable data access framework for parallel BLAST.
Parallel Comput., 2014

A power-measurement methodology for large-scale, high-performance computing.
Proceedings of the ACM/SPEC International Conference on Performance Engineering, 2014

Aeromancer: A Workflow Manager for Large-Scale MapReduce-Based Scientific Workflows.
Proceedings of the 13th IEEE International Conference on Trust, 2014

CoreTSAR: Adaptive Worksharing for Heterogeneous Systems.
Proceedings of the Supercomputing - 29th International Conference, 2014

On the Energy Proportionality of Distributed NoSQL Data Stores.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014

On the performance and energy efficiency of FPGAs and GPUs for polyphase channelization.
Proceedings of the 2014 International Conference on ReConFigurable Computing and FPGAs, 2014

cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Petascale Application of a Coupled CPU-GPU Algorithm for Simulation and Analysis of Multiphase Flow Solutions in Porous Medium Systems.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Delivering Parallel Programmability to the Masses via the Intel MIC Ecosystem: A Case Study.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

SAIS-OPT: On the characterization and optimization of the SA-IS algorithm for suffix array construction.
Proceedings of the IEEE 4th International Conference on Computational Advances in Bio and Medical Sciences, 2014

SLAM: scalable locality-aware middleware for I/O in scientific analysis and visualization.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

Towards a performance-portable FFT library for heterogeneous computing.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

Enabling Efficient Power Provisioning for Enterprise Applications.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

Runtime Adaptation for Autonomic Heterogeneous Computing.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

On the characterization of OpenCL dwarfs on fixed and reconfigurable platforms.
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

Locality-aware memory association for multi-target worksharing in OpenMP.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
Characterizing the challenges and evaluating the efficacy of a CUDA-to-OpenCL translator.
Parallel Comput., 2013

GBench: benchmarking methodology for evaluating the energy efficiency of supercomputers.
Comput. Sci. Res. Dev., 2013

The Green500 list: escapades to exascale.
Comput. Sci. Res. Dev., 2013

Performance characterization of data-intensive kernels on AMD Fusion architectures.
Comput. Sci. Res. Dev., 2013

Toward More Transparent and Reproducible Omics Studies Through a Common Metadata Checklist and Data Publications.
Big Data, 2013

Towards energy-proportional computing for enterprise-class server workloads.
Proceedings of the ACM/SPEC International Conference on Performance Engineering, 2013

Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Online Performance Projection for Clusters with Heterogeneous GPUs.
Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems, 2013

On the Programmability and Performance of Heterogeneous Platforms.
Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems, 2013

On the Portability of the OpenCL Dwarfs on Fixed and Reconfigurable Parallel Platforms.
Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems, 2013

Wideband Channelization for Software-Defined Radio via Mobile Graphics Processors.
Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems, 2013

pVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments.
Proceedings of the IEEE 33rd International Conference on Distributed Computing Systems, 2013

Seamless Migration of Virtual Machines across Networks.
Proceedings of the 22nd International Conference on Computer Communication and Networks, 2013

Accelerating fast Fourier Transform for wideband channelization.
Proceedings of IEEE International Conference on Communications, 2013

On the efficacy of GPU-integrated MPI for scientific applications.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

Consolidating Applications for Energy Efficiency in Heterogeneous Computing Systems.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

Trends in energy-efficient computing: A perspective from the Green500.
Proceedings of the International Green Computing Conference, 2013

Cascaded TCP: Applying pipelining to TCP for efficient communication over wide-area networks.
Proceedings of the 2013 IEEE Global Communications Conference, 2013

Optimizing Burrows-Wheeler Transform-Based Sequence Alignment on Multicore Architectures.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012
Parallel Mining of Neuronal Spike Streams on Graphics Processing Units.
Int. J. Parallel Program., 2012

Reliable MapReduce computing on opportunistic resources.
Clust. Comput., 2012

Multi-dimensional characterization of electrostatic surface potential computation on graphics processors.
BMC Bioinform., 2012

OpenCL and the 13 dwarfs: a work in progress.
Proceedings of the Third Joint WOSP/SIPEW International Conference on Performance Engineering, 2012

Automatic NUMA characterization using Cbench.
Proceedings of the Third Joint WOSP/SIPEW International Conference on Performance Engineering, 2012

Poster: Cascaded TCP: BIG Throughput for BIG DATA Applications in Distributed HPC.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Cascaded TCP: BIG Throughput for BIG DATA Applications in Distributed HPC.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

The Green Index: A Metric for Evaluating System-Wide Energy Efficiency in HPC Systems.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Heterogeneous Task Scheduling for Accelerated OpenMP.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Efficient Intranode Communication in GPU-Accelerated Systems.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Characterizing the Performance and Energy Efficiency of Simultaneous Multithreading in Multicore Environments.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

Lost in Translation: Challenges in Automating CUDA-to-OpenCL Translation.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

DMA-Assisted, Intranode Communication in GPU Accelerated Systems.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-based Systems.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

Transparent Accelerator Migration in a Virtualized GPU Environment.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

2011
Homology to Sequence Alignment, From.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Coordinating Computation and I/O in Massively Parallel Sequence Search.
IEEE Trans. Parallel Distributed Syst., 2011

Poster: characterizing the impact of memory-access techniques on AMD fusion.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Accelerating Protein Sequence Search in a Heterogeneous Computing System.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Emerging Trends on the Evolving Green500: Year Three.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Optimizing Dynamic Programming on Graphics Processing Units via Adaptive Thread-Level Parallelism.
Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-Core Architectures.
Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

StreamMR: An Optimized MapReduce Framework for AMD GPUs.
Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

Architecture-Aware Mapping and Optimization on a 1600-Core GPU.
Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

AVS video decoder on multicore systems: Optimizations and tradeoffs.
Proceedings of the 2011 IEEE International Conference on Multimedia and Expo, 2011

Restoring End-to-End Resilience in the Presence of Middleboxes.
Proceedings of 20th International Conference on Computer Communications and Networks, 2011

Towards accelerating molecular modeling via multi-scale approximation on a GPU.
Proceedings of the IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences, 2011

High-performance biocomputing for simulating the spread of contagion over large contact networks.
Proceedings of the IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences, 2011

Energy-efficient E-puting everywhere.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

Performance Characterization and Optimization of Atomic Operations on AMD GPUs.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Bounding the effect of partition camping in GPU kernels.
Proceedings of the 8th Conference on Computing Frontiers, 2011

2010
Power saving experiments for large-scale global optimisation.
Int. J. Parallel Emergent Distributed Syst., 2010

A first look at integrated GPUs for green high-performance computing.
Comput. Sci. Res. Dev., 2010

Global-scale distributed I/O with ParaMEDIC.
Concurr. Comput. Pract. Exp., 2010

Missing genes in the annotation of prokaryotic genomes.
BMC Bioinform., 2010

Broadening accessibility to computer science for K-12 education.
Proceedings of the 15th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education, 2010

To GPU synchronize or not GPU synchronize?
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

Inter-block GPU communication via fast barrier synchronization.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

The Green500 List: Year two.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Enhancing MapReduce via Asynchronous Data Processing.
Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems, 2010

On the Goodput of TCP NewReno in Mobile Networks.
Proceedings of the 19th International Conference on Computer Communications and Networks, 2010

MOON: MapReduce On Opportunistic eNvironments.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

Understanding Power Measurement Implications in the Green500 List.
Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications, 2010

Statistical Power and Performance Modeling for Optimizing the Energy Efficiency of Scientific Computing.
Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications, 2010

Power and Performance Characterization of Computational Kernels on the GPU.
Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications, 2010

GPU-RMAP: Accelerating Short-Read Mapping on Graphics Processors.
Proceedings of the 13th IEEE International Conference on Computational Science and Engineering, 2010

Towards chip-on-chip neuroscience: fast mining of neuronal spike streams using graphics hardware.
Proceedings of the 7th Conference on Computing Frontiers, 2010

2009
Towards Chip-on-Chip Neuroscience: Fast Mining of Frequent Episodes Using Graphics Processors.
CoRR, 2009

Tools and Environments for Multicore and Many-Core Architectures.
Computer, 2009

On the energy efficiency of graphics processing units for scientific computing.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

The Green500 List: Year one.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Multi-dimensional characterization of temporal data mining on graphics processors.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

GePSeA: A General-Purpose Software Acceleration Framework for Lightweight Task Offloading.
Proceedings of the ICPP 2009, 2009

On the Robust Mapping of Dynamic Programming onto a Graphics Processing Unit.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

2008
Algorithms for Integrated Routing and Scheduling for Aggregating Data from Distributed Resources on a Lambda Grid.
IEEE Trans. Parallel Distributed Syst., 2008

Green Supercomputing Comes of Age.
IT Prof., 2008

Asymmetric interactions in symmetric multi-core systems: analysis, enhancements and evaluation.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Massively parallel genomic sequence search on the Blue Gene/P architecture.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Semantics-based distributed I/O for mpiBLAST.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Impact of Network Sharing in Multi-Core Architectures.
Proceedings of the 17th International Conference on Computer Communications and Networks, 2008

Semantic-based distributed i/o with the paramedic framework.
Proceedings of the 17th International Symposium on High-Performance Distributed Computing (HPDC-17 2008), 2008

Making a Case for Proactive Flow Control in Optical Circuit-Switched Networks.
Proceedings of the High Performance Computing, 2008

Cell-SWat: modeling and scheduling wavefront computations on the cell broadband engine.
Proceedings of the 5th Conference on Computing Frontiers, 2008

Achieving Edge-Based Fairness in a Multi-Hop Environment.
Proceedings of the 5th IEEE Consumer Communications and Networking Conference, 2008

Optimizing performance, cost, and sensitivity in pairwise sequence search on a cluster of PlayStations.
Proceedings of the 8th IEEE International Conference on Bioinformatics and Bioengineering, 2008

2007
High-performance computing using accelerators.
Parallel Comput., 2007

The Green500 List: Encouraging Sustainable Supercomputing.
Computer, 2007

Analyzing the impact of supporting out-of-order communication on in-order performance with iWARP.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Green Supercomputing in a Desktop Box.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

A Maintainable Software Architecture for Fast and Modular Bioinformatics Sequence Search.
Proceedings of the 23rd IEEE International Conference on Software Maintenance (ICSM 2007), 2007

CPU MISER: A Performance-Directed, Run-Time System for Power-Aware Clusters.
Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

An Analysis of 10-Gigabit Ethernet Protocol Stacks in Multicore Environments.
Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects, 2007

Parallel genomic sequence-search on a massively parallel system.
Proceedings of the 4th Conference on Computing Frontiers, 2007

2006
Bridging the Ethernet-Ethernot Performance Gap.
IEEE Micro, 2006

Grid applications - Parallel genomic sequence-searching on an ad-hoc grid: experiences, lessons learned, and implications.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Grid networks and portals - End-system aware, rate-adaptive protocol for network transport in LambdaGrid environments.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Making a case for a Green500 list.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

RAPID: an end-system aware protocol for intelligent data transfer over lambda grids.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

When Optical Networking Meets Grid Computing?
Proceedings of the 15th International Conference On Computer Communications and Networks, 2006

Exploring I/O Strategies for Parallel Sequence-Search Tools with S3aSim.
Proceedings of the 15th IEEE International Symposium on High Performance Distributed Computing, 2006

A Feedback Mechanism for Network Scheduling in LambdaGrids.
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

2005
FAST TCP: from theory to experiments.
IEEE Netw., 2005

Anatomy of UDP and M-VIA for cluster communication.
J. Parallel Distributed Comput., 2005

Analyzing MPI performance over 10-Gigabit Ethernet.
J. Parallel Distributed Comput., 2005

A Power-Aware Run-Time System for High-Performance Computing.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Towards Efficient Supercomputing: A Quest for the Right Metric.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Q-Composer and CpR: a probabilistic synthesizer and regulator of traffic (a probabilistic control of buffer occupancy).
Proceedings of the INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies, 2005

Performance Characterization of a 10-Gigabit Ethernet TOE.
Proceedings of the 13th Annual IEEE Symposium on High Performance Interconnects (HOTIC 2005), 2005

A Feasibility Analysis of Power Awareness in Commodity-Based High-Performance Clusters.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

Head-to-TOE Evaluation of High-Performance Sockets over Protocol Offload Engines.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

2004
End-to-End Performance of 10-Gigabit Ethernet on Commodity Systems.
IEEE Micro, 2004

User-space auto-tuning for TCP flow control in computational grids.
Comput. Commun., 2004

Effective Dynamic Voltage Scaling Through CPU-Boundedness Detection.
Proceedings of the Power-Aware Computer Systems, 4th International Workshop, 2004

Re-Architecting Flow Control Adaptation for Grid Environments.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

A Systematic Approach for Providing End-to-End Probabilistic QoS Guarantees.
Proceedings of the International Conference On Computer Communications and Networks (ICCCN 2004), 2004

A Multimodal Interface for the Immediate Transcription of Radiology Dictation.
Proceedings of the 17th IEEE Symposium on Computer-Based Medical Systems (CBMS 2004), 2004

2003
Making a Case for Efficient Supercomputing.
ACM Queue, 2003

Scheduling and Transport for File Transfers on High-Speed Optical Circuits.
J. Grid Comput., 2003

Automatic Flow-Control Adaptation for Enhancing Network Performance in Computational Grids.
J. Grid Comput., 2003

Optimizing 10-Gigabit Ethernet for Networks of Workstations, Clusters, and Grids: A Case Study.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

Enabling Compatibility Between TCP Reno and TCP Vegas.
Proceedings of the 2003 Symposium on Applications and the Internet (SAINT 2003), 27-31 January 2003, 2003

Green Destiny + mpiBLAST = Bioinfomagic.
Proceedings of the Parallel Computing: Software Technology, 2003

MUSE: A Software Oscilloscope for Clusters and Grids.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Optimizing GridFTP through Dynamic Right-Sizing.
Proceedings of the 12th International Symposium on High-Performance Distributed Computing (HPDC-12 2003), 2003

Initial end-to-end performance evaluation of 10-Gigabit Ethernet.
Proceedings of the 11th Annual IEEE Symposium on High Performance Interconnects, 2003

An Integrated Multimedia Environment for Speech Recognition Using Handwriting and Written Gestures.
Proceedings of the 36th Hawaii International Conference on System Sciences (HICSS-36 2003), 2003

MAGNET: A Tool for Debugging, Analyzing and Adapting Computing Systems.
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

2002
Packet Spacing: An Enabling Mechanism for Delivering Multimedia Content in Computational Grids.
J. Supercomput., 2002

The MAGNeT Toolkit: Design, Implementation and Evaluation.
J. Supercomput., 2002

The Quadrics Network: High-Performance Clustering Technology.
IEEE Micro, 2002

High-density computing: a 240-processor Beowulf in one cubic meter.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Dynamic Right-Sizing: An Automated, Lightweight, and Scalable Technique for Enhancing Grid Performance.
Proceedings of the Protocols for High Speed Networks, 2002

Honey, I Shrunk the Beowulf!
Proceedings of the 31st International Conference on Parallel Processing (ICPP 2002), 2002

On the transient behavior of TCP Vegas.
Proceedings of the 11th International Conference on Computer Communications and Networks, 2002

Using Steady-State TCP Behavior for Proactive Queue Management.
Proceedings of the International Conference on Internet Computing, 2002

A Comparison of TCP Automatic Tuning Techniques for Distributed Computing.
Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11 2002), 2002

Dynamic Right-Sizing in FTP (drsFTP): Enhancing Grid Performance in User-Space.
Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11 2002), 2002

GREEN: proactive queue management over a best-effort network.
Proceedings of the Global Telecommunications Conference, 2002

The Bladed Beowulf: A Cost-Effective Alternative to Traditional Beowulf.
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002

2001
Improved resource utilization with buffered coscheduling.
Parallel Algorithms Appl., 2001

Performance Evaluation of the Quadrics Interconnection Network.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Gang Scheduling with Lightweight User-Level Communication.
Proceedings of the 30th International Workshops on Parallel Processing (ICPP 2001 Workshops), 2001

The Effects of Inter-packet Spacing on the Delivery of Multimedia Content.
Proceedings of the 21st International Conference on Distributed Computing Systems (ICDCS 2001), 2001

Dynamic right-sizing: a simulation study.
Proceedings of the 10th International Conference on Computer Communications and Networks, 2001

MAGNeT: monitor for application-generated network traffic.
Proceedings of the 10th International Conference on Computer Communications and Networks, 2001

A Case for TCP Vegas in High-Performance Computational Grids.
Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing (HPDC-10 2001), 2001

The Quadrics network (QsNet): high-performance clustering technology.
Proceedings of the Ninth Symposium on High Performance Interconnects, 2001

Capturing Network Traffic with a MAGNeT.
Proceedings of the 5th Annual Linux Showcase & Conference 2001, 2001

2000
The Failure of TCP in High-Performance Computational Grids.
Proceedings of Supercomputing 2000, 2000

Time-Sharing Parallel Jobs in the Presence of Multiple Resource Requirements.
Proceedings of the Job Scheduling Strategies for Parallel Processing, IPDPS 2000 Workshop, 2000

Buffered Coscheduling: A New Methodology for Multitasking Parallel Jobs on Distributed Systems.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

The Adverse Impact of the TCP Congestion-Control Mechanism in Heterogeneous Computing Systems.
Proceedings of the 2000 International Conference on Parallel Processing, 2000

On the Burstiness of the TCP Congestion-Control Mechanism in a Distributed Computing System.
Proceedings of the 20th International Conference on Distributed Computing Systems, 2000

Scheduling with Global Information in Distributed Systems.
Proceedings of the 20th International Conference on Distributed Computing Systems, 2000

1999
The Design of an Open Real-Time System Using CORBA.
Proceedings of the 1999 International Conference on Parallel Processing Workshops, 1999

Dynamic Client-Side Scheduling in a Real-Time CORBA System.
Proceedings of the 23rd International Computer Software and Applications Conference (COMPSAC '99), 1999

1997
Algorithms for Scheduling Real-Time Tasks with Input Error and End-to-End Deadlines.
IEEE Trans. Software Eng., 1997

1989
Map Data Processing in Geographic Information Systems.
Computer, 1989

