Wei-Chung Hsu

ORCID: 0000-0002-0833-7981

According to our database, Wei-Chung Hsu authored at least 110 papers between 1986 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
An Automatic Facial Analysis System for The Detection of Pediatric Obstructive Sleep Apnea.
Proceedings of the International Conference on Consumer Electronics - Taiwan, 2023

MultiFuse: Efficient Cross Layer Fusion for DNN Accelerators with Multi-level Memory Hierarchy.
Proceedings of the 41st IEEE International Conference on Computer Design, 2023

2022
Accelerating Video Captioning on Heterogeneous System Architectures.
ACM Trans. Archit. Code Optim., 2022

Accelerating Convolutional Neural Networks via Inter-operator Scheduling.
Proceedings of the 28th IEEE International Conference on Parallel and Distributed Systems, 2022

2021
Efficient Video Captioning on Heterogeneous System Architectures.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Intra- and Inter-Layer Transformation to Reduce Memory Traffic for CNN Computation.
Proceedings of the ICPP Workshops 2021: 50th International Conference on Parallel Processing, 2021

2019
Exploiting SIMD Asymmetry in ARM-to-x86 Dynamic Binary Translation.
ACM Trans. Archit. Code Optim., 2019

Processor-Tracing Guided Region Formation in Dynamic Binary Translation.
ACM Trans. Archit. Code Optim., 2019

Optimizing data permutations in structured loads/stores translation and SIMD register mapping for a cross-ISA dynamic binary translator.
J. Syst. Archit., 2019

Efficient Dynamic Device Placement for Deep Neural Network Training on Heterogeneous Systems.
Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2019

Exploiting Vector Processing in Dynamic Binary Translation.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Translating Traditional SIMD Instructions to Vector Length Agnostic Architectures.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

2018
Exploring hidden coherency of Ray-Tracing for heterogeneous systems using online feedback methodology.
Vis. Comput., 2018

Improving SIMD Parallelism via Dynamic Binary Translation.
ACM Trans. Embed. Comput. Syst., 2018

Efficient and retargetable SIMD translation in a dynamic binary translator.
Softw. Pract. Exp., 2018

Efficient synthetic light field generation using adaptive multi-level rendering.
Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems, 2018

Dynamic tuning of applications using restricted transactional memory.
Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems, 2018

Automatically Migrating Sequential Applications to Heterogeneous System Architecture.
Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

Exploiting SIMD capability in an ARMv7-to-ARMv8 dynamic binary translator.
Proceedings of the International Conference on Compilers, 2018

2017
A Pipeline-Based Ray-Tracing Runtime System for HSA-Compliant Frameworks.
IEEE Trans. Multim., 2017

On Static Binary Translation of ARM/Thumb Mixed ISA Binaries.
ACM Trans. Embed. Comput. Syst., 2017

ReRanz: A Light-Weight Virtual Machine to Mitigate Memory Disclosure Attacks.
Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2017

Adaptive runtime exploiting sparsity in tensor of deep learning neural network on heterogeneous systems.
Proceedings of the 2017 International Conference on Embedded Computer Systems: Architectures, 2017

Efficient Synthetic Light Field Rendering on Heterogeneous Systems Using a Pipeline-Based Runtime Design.
Proceedings of the International Conference on Research in Adaptive and Convergent Systems, 2017

Dynamic translation of structured Loads/Stores and register mapping for architectures with SIMD extensions.
Proceedings of the 18th ACM SIGPLAN/SIGBED Conference on Languages, 2017

Exploiting Asymmetric SIMD Register Configurations in ARM-to-x86 Dynamic Binary Translation.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Optimizing Control Transfer and Memory Virtualization in Full System Emulators.
ACM Trans. Archit. Code Optim., 2016

Building a KVM-based Hypervisor for a Heterogeneous System Architecture Compliant System.
Proceedings of the 12th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2016

HSAemu 2.0: Full System Emulation for HSA platforms with Soft-MMU.
Proceedings of the International Conference on Research in Adaptive and Convergent Systems, 2016

Exploiting Longer SIMD Lanes in Dynamic Binary Translation.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

A pipeline-based runtime technique for improving Ray-Tracing on HSA-compliant systems.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

2015
An Adaptive Heterogeneous Runtime Framework for Irregular Applications.
J. Signal Process. Syst., 2015

A dynamic binary translation system in a client/server environment.
J. Syst. Archit., 2015

Automatic validation for binary translation.
Comput. Lang. Syst. Struct., 2015

HSPT: Practical Implementation and Efficient Management of Embedded Shadow Page Tables for Cross-ISA System Virtual Machines.
Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2015

SIMD Code Translation in an Enhanced HQEMU.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

Runtime techniques for efficient Ray-Tracing on heterogeneous systems.
Proceedings of the 2015 IEEE International Conference on Digital Signal Processing, 2015

Improving SIMD code generation in QEMU.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

2014
Efficient and Retargetable Dynamic Binary Translation on Multicores.
IEEE Trans. Parallel Distributed Syst., 2014

Extended Instruction Exploration for Multiple-Issue Architectures.
ACM Trans. Embed. Comput. Syst., 2014

A Retargetable Static Binary Translator for the ARM Architecture.
ACM Trans. Archit. Code Optim., 2014

DBILL: an efficient and retargetable dynamic binary instrumentation framework using llvm backend.
Proceedings of the 10th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2014

Efficient memory virtualization for Cross-ISA system mode emulation.
Proceedings of the 10th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2014

An Adaptive Heterogeneous Runtime for Irregular Applications in the Case of Ray-Tracing (Extended Abstract).
Proceedings of the Network and Parallel Computing, 2014

Author retrospective for code scheduling and register allocation in large basic blocks.
Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

HSAemu - A full system emulator for HSA platforms.
Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis, 2014

Dynamic and Adaptive Calling Context Encoding.
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

2013
The design and implementation of heterogeneous multicore systems for energy-efficient speculative thread execution.
ACM Trans. Archit. Code Optim., 2013

Improving dynamic binary optimization through early-exit guided code region formation.
Proceedings of the ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (co-located with ASPLOS 2013), 2013

Effective code discovery for ARM/Thumb mixed ISA binaries in a static binary translator.
Proceedings of the International Conference on Compilers, 2013

2012
Effectiveness of Compiler-Directed Prefetching on Data Mining Benchmarks.
J. Circuits Syst. Comput., 2012

Design of communication interface and control system for intelligent humanoid robot.
Comput. Appl. Eng. Educ., 2012

An LLVM-based hybrid binary translation system.
Proceedings of the 7th IEEE International Symposium on Industrial Embedded Systems, 2012

HQEMU: a multi-threaded and retargetable dynamic binary translator on multicores.
Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

LLBT: an LLVM-based static binary translator.
Proceedings of the 15th International Conference on Compilers, 2012

A hybrid just-in-time compiler for android: comparing JIT types and the result of cooperation.
Proceedings of the 15th International Conference on Compilers, 2012

2011
Efficient and effective misaligned data access handling in a dynamic binary translation system.
ACM Trans. Archit. Code Optim., 2011

LnQ: Building High Performance Dynamic Binary Translators with Existing Compiler Backends.
Proceedings of the International Conference on Parallel Processing, 2011

PQEMU: A Parallel System Emulator Based on QEMU.
Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

Dynamic register promotion of stack variables.
Proceedings of the CGO 2011, 2011

A method-based ahead-of-time compiler for android applications.
Proceedings of the 14th International Conference on Compilers, 2011

2010
Local-loop based robot action control module using independent microprocessors.
Comput. Appl. Eng. Educ., 2010

Energy efficient speculative threads: dynamic thread allocation in Same-ISA heterogeneous multicore systems.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Exploring speculative parallelism in SPEC2006.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

Dynamic performance tuning for speculative threads.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Reducing Code Size by Graph Coloring Register Allocation and Assignment Algorithm for Mixed-Width ISA Processor.
Proceedings of the 12th IEEE International Conference on Computational Science and Engineering, 2009

An Evaluation of Misaligned Data Access Handling Mechanisms in Dynamic Binary Translation Systems.
Proceedings of the CGO 2009, 2009

2008
Sufficient sunlight supply for home care using local closed-loop shutter control system.
Proceedings of the IEEE International Conference on Systems, 2008

08441 Final Report - Emerging Uses and Paradigms for Dynamic Binary Translation.
Proceedings of the Emerging Uses and Paradigms for Dynamic Binary Translation, 26.10., 2008

2007
CIM: A Reliable Metric for Evaluating Program Phase Classifications.
IEEE Comput. Archit. Lett., 2007

Analysis of Statistical Sampling in Microarchitecture Simulation: Metric, Methodology and Program Characterization.
Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007

An Architecture for the Interoperability of Multimedia Messaging Services between GPRS and PHS Cellular Networks.
Proceedings of the 3rd International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2007), 2007

COBRA: An Adaptive Runtime Binary Optimization Framework for Multithreaded Applications.
Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

Entropy-Based Profile Characterization and Classification for Automatic Profile Management.
Proceedings of the Advances in Computer Systems Architecture, 2007

2006
Recovery code generation for general speculative optimizations.
ACM Trans. Archit. Code Optim., 2006

Supporting Speculative Multithreading on Simultaneous Multithreaded Processors.
Proceedings of the High Performance Computing, 2006

Region Monitoring for Local Phase Detection in Dynamic Optimization Systems.
Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006

A Study of the Performance Potential for Dynamic Instruction Hints Selection.
Proceedings of the Advances in Computer Systems Architecture, 11th Asia-Pacific Conference, 2006

Issues and Support for Dynamic Register Allocation.
Proceedings of the Advances in Computer Systems Architecture, 11th Asia-Pacific Conference, 2006

2005
Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor.
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

Dynamic Code Region (DCR) Based Program Phase Tracking and Prediction for Dynamic Optimizations.
Proceedings of the High Performance Embedded Architectures and Compilers, 2005

Performance of Runtime Optimization on BLAST.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

A General Compiler Framework for Speculative Optimizations Using Data Speculative Code Motion.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

2004
A compiler framework for speculative optimizations.
ACM Trans. Archit. Code Optim., 2004

Design and Implementation of a Lightweight Dynamic Optimization System.
J. Instr. Level Parallelism, 2004

Data Dependence Profiling for Speculative Optimizations.
Proceedings of the Compiler Construction, 13th International Conference, 2004

Continuous Adaptive Object-Code Re-optimization Framework.
Proceedings of the Advances in Computer Systems Architecture, 9th Asia-Pacific Conference, 2004

A Compiler Framework for Recovery Code Generation in General Speculative Optimizations.
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004

2003
A compiler framework for speculative analysis and optimizations.
Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation 2003, 2003

The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Speculative Register Promotion Using Advanced Load Address Table (ALAT).
Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003

Dynamic Trace Selection Using Performance Monitoring Hardware Sampling.
Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003

2002
An Empirical Study on the Granularity of Pointer Analysis in C Programs.
Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002

On the Impact of Naming Methods for Heap-Oriented Pointers in C Programs.
Proceedings of the International Symposium on Parallel Architectures, 2002

On the Predictability of Program Behavior Using Different Input Data Sets.
Proceedings of the 6th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-6 2002), 2002

1998
A Performance Study of Instruction Cache Prefetching Methods.
IEEE Trans. Computers, 1998

1997
Data Prefetching on the HP PA-8000.
Proceedings of the 24th International Symposium on Computer Architecture, 1997

1996
Instruction Scheduling for the HP PA-8000.
Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996

1993
Toward Effective Scalar Hardware for Highly Vectorizable Applications.
J. Parallel Distributed Comput., 1993

Performance of Cached DRAM Organizations in Vector Supercomputers.
Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993

1992
Prefetching in Supercomputer Instruction Caches.
Proceedings of Supercomputing '92, 1992

On the instruction-level characteristics of scalar code in highly-vectorized scientific applications.
Proceedings of the 25th Annual International Symposium on Microarchitecture, 1992

1991
An Empirical Study of the CRAY Y-MP Processor Using the Perfect Club Benchmarks.
Proceedings of the 18th Annual International Symposium on Computer Architecture. Toronto, 1991

1990
The use of intermediate memories for low-latency memory access in supercomputer scalar units.
J. Supercomput., 1990

Future general purpose supercomputer architectures.
Proceedings of Supercomputing '90, New York, NY, USA, November 12-16, 1990

Exploitation of operation-level parallelism in a processor of the CRAY X-MP.
Proceedings of the 1990 IEEE International Conference on Computer Design: VLSI in Computers and Processors, 1990

1989
On the Minimization of Loads/Stores in Local Register Allocation.
IEEE Trans. Software Eng., 1989

1988
Code scheduling and register allocation in large basic blocks.
Proceedings of the 2nd international conference on Supercomputing, 1988

1987
WISQ: A Restartable Architecture Using Queues.
Proceedings of the 14th Annual International Symposium on Computer Architecture. Pittsburgh, 1987

1986
On the Use of Registers vs. Cache to Minimize Memory Traffic.
Proceedings of the 13th Annual Symposium on Computer Architecture, Tokyo, Japan, June 1986, 1986

