Mei Wen
ORCID: 0000-0002-5875-3297
According to our database, Mei Wen authored at least 110 papers between 2004 and 2024.
Bibliography
2024
IEEE Trans. Very Large Scale Integr. Syst., September, 2024
Enhancing the PE Utilization for Multi-Precision Systolic Array via Optimizing Computation Latency.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2024
BitShare: An Efficient Precision-Scalable Accelerator with Combining-Like-Terms GEMM.
Proceedings of the 35th IEEE International Conference on Application-specific Systems, 2024
2023
Proceedings of the 41st IEEE International Conference on Computer Design, 2023
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
2022
Proceedings of the International IEEE Symposium on Performance Analysis of Systems and Software, 2022
S-SIM: A Simulator for Systolic Array-based DNN Accelerators with Tile Access Awareness.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2022
Proceedings of the 51st International Conference on Parallel Processing, 2022
Proceedings of the IEEE 40th International Conference on Computer Design, 2022
Light: A Component Enhances Faster and More Accurate Traffic Measurement.
Proceedings of the IEEE International Conference on Communications, 2022
Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022
Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022
Exploring ILP for VLIW Architecture by Quantified Modeling and Dynamic Programming-Based Instruction Scheduling.
Proceedings of the 27th Asia and South Pacific Design Automation Conference, 2022
2021
Sustaining Consumer Trust and Continuance Intention by Institutional Mechanisms: An Empirical Survey of DiDi in China.
IEEE Access, 2021
Automatic mapping and code optimization for OpenCL kernels on FT-matrix architecture (WIP paper).
Proceedings of the LCTES '21: 22nd ACM SIGPLAN/SIGBED International Conference on Languages, 2021
sRouting: Towards a Better Flow Size Estimation Performance through Routing and Sketch Configuration.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021
SAI: Self-Adjusting Incremental Quantile Estimation for Sparse Training of Neural Networks on Hardware Accelerators.
Proceedings of the 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, 2021
Embrace the Conflicts: Exploring the Integration of Single Port Memory in Systolic Array-based Accelerators.
Proceedings of the 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, 2021
2020
Deep Learning Research and Development Platform: Characterizing and Scheduling with QoS Guarantees on GPU Clusters.
IEEE Trans. Parallel Distributed Syst., 2020
Toward an Efficient Deep Pipelined Template-Based Architecture for Accelerating the Entire 2-D and 3-D CNNs on FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020
KSII Trans. Internet Inf. Syst., 2020
IEEE Access, 2020
Incremental Deployment of Programmable Switches for Sketch-based Network Measurement.
Proceedings of the IEEE Symposium on Computers and Communications, 2020
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020
Proceedings of the 2020 IEEE International Conference on Communications, 2020
Proceedings of the Algorithms and Architectures for Parallel Processing, 2020
Towards a Deep-Pipelined Architecture for Accelerating Deep GCN on a Multi-FPGA Platform.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2020
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020
Towards Memory-Efficient Streaming Processing with Counter-Cascading Sketching on FPGA.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
2019
Interleaved Sketch: Toward Consistent Network Telemetry for Commodity Programmable Switches.
IEEE Access, 2019
Proceedings of the 2019 IEEE Symposium on Computers and Communications, 2019
Towards a Uniform Architecture for the Efficient Implementation of 2D and 3D Deconvolutional Neural Networks on FPGAs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2019
Poster Abstract: Deep Learning Workloads Scheduling with Reinforcement Learning on GPU Clusters.
Proceedings of the IEEE INFOCOM 2019, 2019
Poster Abstract: A Template-based Framework for Generating Network Processor in FPGA.
Proceedings of the IEEE INFOCOM 2019, 2019
An Efficient Design Flow for Accelerating Complicated-connected CNNs on a Multi-FPGA Platform.
Proceedings of the 48th International Conference on Parallel Processing, 2019
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019
SACC: Configuring Application-Level Cache Intelligently for In-Memory Database Based on Long Short-Term Memory.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019
Proceedings of the 3rd International Conference on High Performance Compilation, 2019
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019
Scale-out Acceleration for 3D CNN-based Lung Nodule Segmentation on a Multi-FPGA System.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
2018
Sci. Program., 2018
IEICE Electron. Express, 2018
Towards a Multi-array Architecture for Accelerating Large-scale Matrix Multiplication on FPGAs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018
Multiple CNN-based Tasks Scheduling across Shared GPU Platform in Research and Development Scenarios.
Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications; 16th IEEE International Conference on Smart City; 4th IEEE International Conference on Data Science and Systems, 2018
Proceedings of the 2nd International Conference on High Performance Compilation, 2018
Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA.
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018
2017
Sci. Program., 2017
Frontiers Inf. Technol. Electron. Eng., 2017
Applying Detection Proposals to Visual Tracking for Scale and Aspect Ratio Adaptability.
Int. J. Comput. Vis., 2017
FPGA-accelerated deep convolutional neural networks for high throughput and energy efficiency.
Concurr. Comput. Pract. Exp., 2017
Comput. Intell. Neurosci., 2017
Proceedings of the Network and Parallel Computing, 2017
Proceedings of the 28th IEEE International Conference on Application-specific Systems, 2017
2016
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016
2015
An analytical GPU performance model for 3D stencil computations from the angle of data traffic.
J. Supercomput., 2015
Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations.
Frontiers Inf. Technol. Electron. Eng., 2015
Int. J. High Perform. Comput. Appl., 2015
IEICE Trans. Inf. Syst., 2015
Concurr. Comput. Pract. Exp., 2015
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015
Enable Scale and Aspect Ratio Adaptability in Visual Tracking with Detection Proposals.
Proceedings of the British Machine Vision Conference 2015, 2015
2014
Clust. Comput., 2014
Proceedings of the Algorithms and Architectures for Parallel Processing, 2014
Automated Transformation of GPU-Specific OpenCL Kernels Targeting Performance Portability on Multi-Core/Many-Core CPUs.
Proceedings of the Euro-Par 2014 Parallel Processing, 2014
2013
Accelerating thread-intensive and explicit memory management programs with dynamic partial reconfiguration.
J. Supercomput., 2013
Resource-efficient utilization of CPU/GPU-based heterogeneous supercomputers for Bayesian phylogenetic inference.
J. Supercomput., 2013
IEICE Trans. Inf. Syst., 2013
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013
Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems, 2013
Proceedings of the International Conference on Computational Science, 2013
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013
Automatic Mapping Single-Device OpenCL Program to Heterogeneous Multi-device Platform.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013
Proceedings of the Advanced Parallel Processing Technologies, 2013
2012
Proceedings of the 13th International Conference on Parallel and Distributed Computing, 2012
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012
Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, 2012
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012
The masala machine: accelerating thread-intensive and explicit memory management programs with dynamically reconfigurable FPGAs (abstract only).
Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012
2011
Trans. High Perform. Embed. Archit. Compil., 2011
Int. J. Inf. Technol. Decis. Mak., 2011
Proceedings of the 34th International Conference on Telecommunications and Signal Processing (TSP 2011), 2011
High-efficient software parallel CAVLC encoder based on programmable stream processor.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011
Proceedings of the Sixth International Conference on Image and Graphics, 2011
2010
A Parallel Streaming Motion Estimation for Real-Time HD H.264 Encoding on Programmable Processors.
Proceedings of the Fifth International Conference on Frontier of Computer Science and Technology, 2010
Software Managed Instruction Scratchpad Memory Optimization in Stream Architecture Based on Hot Code Analysis of Kernels.
Proceedings of the 13th Euromicro Conference on Digital System Design, 2010
Proceedings of the 10th IEEE International Conference on Computer and Information Technology, 2010
2009
Proceedings of the 17th International Conference on Multimedia 2009, 2009
Proceedings of the 16th International Conference on High Performance Computing, 2009
Proceedings of the 7th IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia, 2009
2008
On-Chip Memory System Optimization Design for the FT64 Scientific Stream Accelerator.
IEEE Micro, 2008
Proceedings of the 13th Asia South Pacific Design Automation Conference, 2008
FPGA-based Equivalent Simulation Technology (FEST) for clustered stream architecture.
Proceedings of the 13th Asia-Pacific Computer Systems Architecture Conference, 2008
2007
Proceedings of the High Performance Computing, 2007
A Stream System-on-Chip Architecture for High Speed Target Recognition Based on Biologic Vision.
Proceedings of the Advances in Computer Systems Architecture, 2007
2006
Proceedings of the Advances in Computer Systems Architecture, 11th Asia-Pacific Conference, 2006
Proceedings of the Advances in Computer Systems Architecture, 11th Asia-Pacific Conference, 2006
Analysis and Performance Results of a fluid dynamics Application on MASA Stream Processor.
Proceedings of the 5th Annual IEEE/ACIS International Conference on Computer and Information Science (ICIS 2006) and 1st IEEE/ACIS International Workshop on Component-Based Software Engineering, 2006
2005
Proceedings of the Image Analysis and Recognition, Second International Conference, 2005
Proceedings of the Advances in Computer Systems Architecture, 10th Asia-Pacific Conference, 2005
2004
Proceedings of the Parallel and Distributed Processing and Applications, 2004
Proceedings of the Advances in Computer Systems Architecture, 9th Asia-Pacific Conference, 2004