Shih-Hao Hung

Orcid: 0000-0003-2043-2663

According to our database, Shih-Hao Hung authored at least 94 papers between 1994 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.






Toward cost-effective quantum circuit simulation with performance tuning techniques.
Connect. Sci., December, 2024

QOPS: A Compiler Framework for Quantum Circuit Simulation Acceleration with Profile Guided Optimizations.
CoRR, 2024

Queen: A quick, scalable, and comprehensive quantum circuit simulation for supercomputing.
CoRR, 2024

Towards Optimizations of Quantum Circuit Simulation for Solving Max-Cut Problems with QAOA.
Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, 2024

Maximizing QAOA Potential: Efficient Max-Cut Solutions through Classical Parallel Searching for Time-Sensitive Applications.
Proceedings of the International Conference on Consumer Electronics - Taiwan, 2024

Oil and Vinegar: Modern Parameters and Implementations.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2023

Self-Supervised Multi-LiDAR Object View Generation Using Single LiDAR.
Proceedings of the 29th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2023

Towards Scalable Quantum Circuit Simulation via RDMA.
Proceedings of the 2023 International Conference on Research in Adaptive and Convergent Systems, 2023

A Heterogeneous Computing Framework for Accelerating Fully Homomorphic Encryption.
Proceedings of the Mobile Internet Security - 7th International Conference, 2023

A Profiling Guided Optimization Scheme for Accelerating Quantum Simulations.
Proceedings of the International Conference on Consumer Electronics - Taiwan, 2023

An Efficient CKKS-FHEW/TFHE Hybrid Encrypted Inference Framework.
Proceedings of the Computer Security. ESORICS 2023 International Workshops, 2023

Accelerating Simulated Quantum Annealing with GPU and Tensor Cores.
Proceedings of the High Performance Computing - 37th International Conference, 2022

cuPSO: GPU parallelization for particle swarm optimization algorithms.
Proceedings of the SAC '22: The 37th ACM/SIGAPP Symposium on Applied Computing, Virtual Event, April 25, 2022

Performance Acceleration of Secure Machine Learning Computations for Edge Applications.
Proceedings of the 28th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2022

TensorHE: a homomorphic encryption transformer for privacy-preserving deep learning.
Proceedings of the Conference on Research in Adaptive and Convergent Systems, 2022

hKVS: a framework for designing a high throughput heterogeneous key-value store with SmartNIC and RDMA.
Proceedings of the Conference on Research in Adaptive and Convergent Systems, 2022

FEZ: Flexible and Efficient Zoom-In for Ultra-Large Image Classification.
Proceedings of the IEEE International Conference on Big Data, 2022

End-to-End Performance Optimization for Training Streaming Convolutional Neural Networks using Billion-Pixel Whole-Slide Images.
Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), 2021

ResPerfNet: Deep Residual Learning for Regressional Performance Modeling of Deep Neural Networks.
CoRR, 2020

Toward Accurate Platform-Aware Performance Modeling for Deep Neural Networks.
CoRR, 2020

Accelerating Variant Calling with Parallelized DeepVariant.
Proceedings of the RACS '20: International Conference on Research in Adaptive and Convergent Systems, 2020

PerfNet: Platform-Aware Performance Modeling for Deep Neural Networks.
Proceedings of the RACS '20: International Conference on Research in Adaptive and Convergent Systems, 2020

Performance Evaluation of a GPU-based Monte Carlo Simulation Package for Water Radiolysis with sub-MeV Electrons.
Proceedings of the RACS '20: International Conference on Research in Adaptive and Convergent Systems, 2020

Toward Fast Platform-Aware Neural Architecture Search for FPGA-Accelerated Edge AI Applications.
Proceedings of the RACS '20: International Conference on Research in Adaptive and Convergent Systems, 2020

Performance Analysis and Optimization for Federated Learning Applications with PySyft-based Secure Aggregation.
Proceedings of the International Computer Symposium, 2020

PerfNetRT: Platform-Aware Performance Modeling for Optimized Deep Neural Networks.
Proceedings of the International Computer Symposium, 2020

Rapid Hybrid Simulation Methods for Exploring the Design Space of Signal Processors with Dynamic and Scalable Timing Models.
J. Signal Process. Syst., 2019

More Exploration to Composable Infrastructure: The Application and Analysis of Composable Memory.
Proceedings of the 2019 Spring Simulation Conference, 2019

Modeling Interprocessor Communication and Performance Scalability for Distributed Deep Learning Systems.
Proceedings of the 17th International Conference on High Performance Computing & Simulation, 2019

Phase-Based Profiling and Performance Prediction with Timing Approximate Simulators.
Proceedings of the 24th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2018

Acceleration of Monte-Carlo simulation on high performance computing platforms.
Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems, 2018

Hardware-accelerated cache simulation for multicore by FPGA.
Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems, 2018

A Fast and Scalable Cluster Simulator for Network Performance Projection of HPC Applications.
Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

Fast profiling framework and race detection for heterogeneous system.
J. Syst. Archit., 2017

GPU acceleration for Kernel Samepage Merging.
Proceedings of the 23rd IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2017

Program Analysis with a Loop-Function-based Tracing Tool on Virtual Platforms.
Proceedings of the International Conference on Research in Adaptive and Convergent Systems, 2017

Availability Is Not Enough: Minimizing Joint Response Time in Peer-Assisted Cloud Storage Systems.
IEEE Syst. J., 2016

Flattened Data in Convolutional Neural Networks: Using Malware Detection as Case Study.
Proceedings of the International Conference on Research in Adaptive and Convergent Systems, 2016

HSAemu 2.0: Full System Emulation for HSA platforms with Soft-MMU.
Proceedings of the International Conference on Research in Adaptive and Convergent Systems, 2016

Virtual Hadoop: MapReduce over Docker Containers with an Auto-Scaling Mechanism for Heterogeneous Environments.
Proceedings of the International Conference on Research in Adaptive and Convergent Systems, 2016

A Platform-Oblivious Approach for Heterogeneous Computing: A Case Study with Monte Carlo-based Simulation for Medical Applications.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

A code offloading scheme for big-data processing in android applications.
Softw. Pract. Exp., 2015

Real-time and intelligent private data protection for the Android platform.
Pervasive Mob. Comput., 2015

Migratom.js: a JavaScript migration framework for distributed web computing and mobile devices.
Proceedings of the 30th Annual ACM Symposium on Applied Computing, 2015

Load balancing for hybrid NoSQL database management systems.
Proceedings of the 2015 Conference on research in adaptive and convergent systems, 2015

Rapid analysis of interprocessor communications on heterogeneous system architectures via parallel cache emulation.
Proceedings of the 2015 Conference on research in adaptive and convergent systems, 2015

Message-Passing Programming for Embedded Multicore Signal-Processing Platforms.
J. Signal Process. Syst., 2014

Performance and power profiling for emulated Android systems.
ACM Trans. Design Autom. Electr. Syst., 2014

MobileFBP: Designing portable reconfigurable applications for heterogeneous systems.
J. Syst. Archit., 2014

A framework of cloud-based virtual phones for secure intelligent information management.
Int. J. Inf. Manag., 2014

Developing a problem-solving learning system to assess the effects of different materials on learning performance and attitudes.
Comput. Educ., 2014

The acceleration of pipeline workloads under the FPGA area and bandwidth constraints.
Proceedings of the 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications, 2014

DroidDolphin: a dynamic Android malware detection framework using big data and machine learning.
Proceedings of the 2014 Conference on Research in Adaptive and Convergent Systems, 2014

Exploring the Design Space for Android Smartphones.
Proceedings of the Eighth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, 2014

PasDroid: Real-Time Security Enhancement for Android.
Proceedings of the Eighth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, 2014

HSAemu - A full system emulator for HSA platforms.
Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis, 2014

Hardware acceleration for proton beam Monte Carlo simulation.
Proceedings of the Research in Adaptive and Convergent Systems, 2013

Performance and Power Estimation for Mobile-Cloud Applications on Virtualized Platforms.
Proceedings of the Seventh International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, 2013

Creating Pervasive, Dynamic, Scalable Android Applications.
Proceedings of the Seventh International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, 2013

MCEmu: A Framework for Software Development and Performance Analysis of Multicore Systems.
ACM Trans. Design Autom. Electr. Syst., 2012

An adaptive file-system-oriented FTL mechanism for flash-memory storage systems.
ACM Trans. Embed. Comput. Syst., 2012

Executing mobile applications on the cloud: Framework and issues.
Comput. Math. Appl., 2012

Performance evaluation of machine-to-machine (M2M) systems with virtual machines.
Proceedings of the 15th International Symposium on Wireless Personal Multimedia Communications, 2012

A VM-aware fairness scheduler on heterogenous multi-core platforms.
Proceedings of the Research in Applied Computation Symposium, 2012

On the Portability and Performance of Message-Passing Programs on Embedded Multicore Platforms.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

A real-time, energy-efficient system software suite for heterogeneous multicore platforms.
Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis, 2012

System-wide profiling and optimization with virtual machines.
Proceedings of the 17th Asia and South Pacific Design Automation Conference, 2012

A portable, efficient inter-core communication scheme for embedded multicore platforms.
J. Syst. Archit., 2011

Developing Collaborative Applications with Mobile Cloud - A Case Study of Speech Recognition.
J. Internet Serv. Inf. Secur., 2011

Data Transmission with the Battery Utilization Maximization.
J. Comput. Sci. Technol., 2011

Migrating Android Applications to the Cloud.
Int. J. Grid High Perform. Comput., 2011

Virtualizing Smartphone Applications to the Cloud.
Comput. Informatics, 2011

Building a scalable and portable message-passing library for embedded multicore systems.
Proceedings of the Research in Applied Computation Symposium, 2011

User behavior augmented software testing for user-centered GUI.
Proceedings of the Research in Applied Computation Symposium, 2011

An Online Migration Environment for Executing Mobile Applications on the Cloud.
Proceedings of the Fifth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, 2011

CSR: A Cloud-Assisted Speech Recognition Service for Personal Mobile Device.
Proceedings of the International Conference on Parallel Processing, 2011

Pipeline schedule synthesis for real-time streaming tasks with inter/intra-instance precedence constraints.
Proceedings of the Design, Automation and Test in Europe, 2011

Task Scheduling for Context Minimization in Dynamically Reconfigurable Platforms.
J. Signal Process. Syst., 2010

Energy-efficient real-time scheduling of multimedia tasks on multi-core processors.
Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), 2010

Designing and Implementing a Portable, Efficient Inter-core Communication Scheme for Embedded Multicore Platforms.
Proceedings of the 16th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2010

V2X: An Automated Tool for Building SystemC-Based Simulation Environments in Designing Multicore Systems-on-Chips.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2010

Integer Number Crunching on the Cell Processor.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Trace-based performance analysis framework for heterogeneous multicore systems.
Proceedings of the 15th Asia South Pacific Design Automation Conference, 2010

A Virtual Timing Device for Program Performance Analysis.
Proceedings of the 10th IEEE International Conference on Computer and Information Technology, 2010

Zero-Buffer Inter-core Process Communication Protocol for Heterogeneous Multi-core Platforms.
Proceedings of the 15th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2009

An Automatic Compiler Optimizations Selection Framework for Embedded Applications.
Proceedings of the International Conference on Embedded Software and Systems, 2009

New Tracing and Performance Analysis Techniques for Embedded Applications.
Proceedings of the Fourteenth IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2008

Optimizing the Embedded Caching and Prefetching Software on a Network-Attached Storage System.
Proceedings of the 2008 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC 2008), 2008

Task Scheduling for Context Minimization in Dynamically Reconfigurable Platforms.
Proceedings of the Embedded and Ubiquitous Computing, International Conference, 2007

Scalable Lossless High Definition Image Coding on Multicore Platforms.
Proceedings of the Embedded and Ubiquitous Computing, International Conference, 2007

A Low Complexity Rate-Distortion Source Modeling Framework.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Cache Leakage Management for Multi-programming Workloads.
Proceedings of the Advances in Computer Systems Architecture, 10th Asia-Pacific Conference, 2005

Optimizing parallel applications.
PhD thesis, 1998

A Hierarchical Approach to Modeling and Improving the Performance of Scientific Applications on the KSR1.
Proceedings of the 1994 International Conference on Parallel Processing, 1994
