Chengyong Wu

According to our database, Chengyong Wu authored at least 33 papers between 2003 and 2017.

Bibliography

2017
Two-Level Task Scheduling for Irregular Applications on GPU Platform.
Int. J. Parallel Program., 2017

2016
Rethinking Memory Management in Modern Operating System: Horizontal, Vertical or Random?
IEEE Trans. Computers, 2016

Pragma Directed Shared Memory Centric Optimizations on GPUs.
J. Comput. Sci. Technol., 2016

2015
A Small-Footprint Accelerator for Large-Scale Neural Networks.
ACM Trans. Comput. Syst., 2015

Leveraging the Error Resilience of Neural Networks for Designing Highly Energy Efficient Accelerators.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

Practical Iterative Optimization for the Data Center.
ACM Trans. Archit. Code Optim., 2015

A High-Throughput Neural Network Accelerator.
IEEE Micro, 2015

Neuromorphic accelerators: a comparison between neuroscience and machine-learning approaches.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Retraining-based timing error mitigation for hardware neural networks.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

2014
BPM/BPM+: Software-based dynamic memory partitioning mechanisms for mitigating DRAM bank-/channel-level interferences in multicore systems.
ACM Trans. Archit. Code Optim., 2014

Performance Portability Across Heterogeneous SoCs Using a Generalized Library-Based Approach.
ACM Trans. Archit. Code Optim., 2014

Finding representative sets of optimizations for adaptive multiversioning applications.
CoRR, 2014

Going vertical in memory management: Handling multiplicity by multi-policy.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

A low-cost memory interface for high-throughput accelerators.
Proceedings of the 2014 International Conference on Compilers, 2014

DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

2013
Elastic CGRAs.
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

2012
Deconstructing iterative optimization.
ACM Trans. Archit. Code Optim., 2012

SWAP: Parallelization through Algorithm Substitution.
IEEE Micro, 2012

Iterative optimization for the data center.
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

A software memory partition approach for eliminating bank-level interference in multicore systems.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

An Improved Clipping Scheme Based on TR for PAPR Reduction in OFDM Systems.
Proceedings of the 12th IEEE International Conference on Computer and Information Technology, 2012

Novel Iterative Channel Estimation Method for DTMB System.
Proceedings of the 12th IEEE International Conference on Computer and Information Technology, 2012

2011
How sensitive is processor customization to the workload's input datasets?
Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011

2010
Evaluating iterative optimization across 1000 datasets.
Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010

2009
Modeling Nondeterministic Feature with Petri Net for Network Protocol in Interoperability Testing.
Proceedings of the CSIE 2009, 2009 WRI World Congress on Computer Science and Information Engineering, March 31, 2009

2008
Global Tiling for Communication Minimal Parallelization on Distributed Memory Systems.
Proceedings of the Euro-Par 2008, 2008

Diva: A dataflow programming model and its runtime support in Java virtual machine.
Proceedings of the 13th Asia-Pacific Computer Systems Architecture Conference, 2008

2005
Optimizing Packet Accesses for a Domain Specific Language on Network Processors.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

A Register Allocation Framework for Banked Register Files with Access Constraints.
Proceedings of the Advances in Computer Systems Architecture, 10th Asia-Pacific Conference, 2005

2004
Efficient Modeling of Itanium Architecture during Instruction Scheduling using Extended Finite State Automata.
J. Instr. Level Parallelism, 2004

An Overview of the Open Research Compiler.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

2003
Efficient Resource Management during Instruction Scheduling for the EPIC Architecture.
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003

