Mingyu Chen

ORCID: 0000-0003-4469-1037

Affiliations:
  • Chinese Academy of Sciences, Institute of Computing Technology, Beijing, China


According to our database, Mingyu Chen authored at least 135 papers between 2004 and 2024.

Bibliography

2024
Asynchronous Memory Access Unit: Exploiting Massive Parallelism for Far Memory Access.
ACM Trans. Archit. Code Optim., September, 2024

Suppressing the Interference Within a Datacenter: Theorems, Metric and Strategy.
IEEE Trans. Parallel Distributed Syst., May, 2024

DFabric: Scaling Out Data Parallel Applications with CXL-Ethernet Hybrid Interconnects.
CoRR, 2024

XUNI: Virtual Machine Abstraction for Self-contained and Multi-tenant Cloud FPGAs.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

Planaria: Pattern Directed Cross-page Composite Prefetcher.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

iEDA: An Open-source infrastructure of EDA.
Proceedings of the 29th Asia and South Pacific Design Automation Conference, 2024

2023
DASICS: Enhancing Memory Protection with Dynamic Compartmentalization.
CoRR, 2023

iEDA: An Open-Source Intelligent Physical Implementation Toolkit and Library.
CoRR, 2023

A Data-Driven Framework for TCP to Achieve Flexible QoS Control in Mobile Data Networks.
Proceedings of the 31st IEEE/ACM International Symposium on Quality of Service, 2023

Morpheus: An Adaptive DRAM Cache with Online Granularity Adjustment for Disaggregated Memory.
Proceedings of the 41st IEEE International Conference on Computer Design, 2023

REMU: Enabling Cost-Effective Checkpointing and Deterministic Replay in FPGA-based Emulation.
Proceedings of the 41st IEEE International Conference on Computer Design, 2023

Ah-Q: Quantifying and Handling the Interference within a Datacenter from a System Perspective.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

HoPP: Hardware-Software Co-Designed Page Prefetching for Disaggregated Memory.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

MARB: Bridge the Semantic Gap between Operating System and Application Memory Access Behavior.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

Rethinking Design Paradigm of Graph Processing System with a CXL-like Memory Semantic Fabric.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

2022
High fusion computers: The IoTs, edges, data centers, and humans-in-the-loop as a computer.
CoRR, 2022

QStack: Re-architecting User-space Network Stack to Optimize CPU Efficiency and Service Quality.
CoRR, 2022

HCMonitor: An accurate measurement system for high concurrent network services.
Concurr. Comput. Pract. Exp., 2022

GraFF: A Multi-FPGA System with Memory Semantic Fabric for Scalable Graph Processing.
Proceedings of the International Conference on Field-Programmable Technology, 2022

FPL Demo: SERVE: Agile Hardware Development Platform with Cloud IDE and Cloud FPGAs.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

Increasing Flexibility of Cloud FPGA Virtualization.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

MCCBench: A C10M Benchmark Oriented to Interactive Network Services.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2022

2021
Asynchronous Memory Access Unit for General Purpose Processors.
CoRR, 2021

Teaching Computer System Courses with an Online Large-Scale Method.
Proceedings of the 2021 IEEE International Conference on Engineering, 2021

EdUCAS: An In-house CI/CD Platform with Cloud FPGAs for Agilely Conducting Computer Systems Course Projects.
Proceedings of the 26th ACM Conference on Innovation and Technology in Computer Science Education V.2 (ITiCSE '21), Virtual Event, Germany, June 26, 2021

LSP: Collective Cross-Page Prefetching for NVM.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

2020
Optimizing TCP Loss Recovery Performance Over Mobile Data Networks.
IEEE Trans. Mob. Comput., 2020

IMPULP: A Hardware Approach for In-Process Memory Protection via User-Level Partitioning.
J. Comput. Sci. Technol., 2020

Labeled Network Stack: A High-Concurrency and Low-Tail Latency Cloud Server Framework for Massive IoT Devices.
J. Comput. Sci. Technol., 2020

System measurement of Intel AEP Optane DIMM.
CoRR, 2020

Freeway: An Order-less User-space Framework for Non-real-time Applications.
Proceedings of the 22nd IEEE International Conference on High Performance Computing and Communications; 18th IEEE International Conference on Smart City; 6th IEEE International Conference on Data Science and Systems, 2020

2019
RAGuard: An Efficient and User-Transparent Hardware Mechanism against ROP Attacks.
ACM Trans. Archit. Code Optim., 2019

Gene-Patterns: Should Architecture be Customized for Each Application?
CoRR, 2019

Computer Organization and Design Course with FPGA Cloud.
Proceedings of the 50th ACM Technical Symposium on Computer Science Education, 2019

HCMA: Supporting High Concurrency of Memory Accesses with Scratchpad Memory in FPGAs.
Proceedings of the 2019 IEEE International Conference on Networking, 2019

Make Page Coloring more Efficient on Slice-Based Three-Level Cache.
Proceedings of the 25th IEEE International Conference on Parallel and Distributed Systems, 2019

Characterizations and Architectural Implications of NVM's External DRAM Cache.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

Engaging Heterogeneous FPGAs in the Cloud.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

ZyCube: An In-House Mini-Cluster for Agilely Developing and Conducting Computer Systems Course Projects.
Proceedings of the ACM Conference on Global Computing Education, 2019

MCC: A Predictable and Scalable Massive Client Load Generator.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2019

2018
PTAT: An Efficient and Precise Tool for Tracing and Profiling Detailed TLB Misses.
ACM Trans. Embed. Comput. Syst., 2018

PULP: Inner-process Isolation based on the Program Counter and Data Memory Address.
CoRR, 2018

ZyForce: An FPGA-based Cloud Platform for Experimental Curriculum of Computer System in University of Chinese Academy of Sciences (Abstract Only).
Proceedings of the 49th ACM Technical Symposium on Computer Science Education, 2018

Labeled Network Stack: A Co-designed Stack for Low Tail-Latency and High Concurrency in Datacenter Services.
Proceedings of the Network and Parallel Computing, 2018

Stateful Forward-Edge CFI Enforcement with Intel MPX.
Proceedings of the Advanced Computer Architecture - 12th Conference, 2018

2017
HAP: Hybrid-Memory-Aware Partition in Shared Last-Level Cache.
ACM Trans. Archit. Code Optim., 2017

Joint Upload-Download TCP Acceleration over Mobile Data Networks.
Proceedings of the 14th Annual IEEE International Conference on Sensing, 2017

Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Efficient Regional Congestion Awareness (ERCA) for Load Balance with Aggregated Congestion Information.
Proceedings of the 25th Euromicro International Conference on Parallel, 2017

Stem: A Table-Based Congestion Control Framework for Virtualized Data Center Networks.
Proceedings of the Network and Parallel Computing, 2017

PTAT: An efficient and precise tool for collecting detailed TLB miss traces.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Fine-Grained Data Committing for Persistent Memory.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

TDV Cache: Organizing Off-Chip DRAM Cache of NVMM from a Fusion Perspective.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

SMEFF: A scalable memory extension fabric for FPGA.
Proceedings of the International Conference on Field Programmable Technology, 2017

2016
Titian2: a scalable system-level emulator with all programmability for datacenter servers in cloud computing.
Proceedings of the 9th International Conference on Utility and Cloud Computing, 2016

Co-DIMM: Inter-Socket Data Sharing via a Common DIMM Channel.
Proceedings of the Second International Symposium on Memory Systems, 2016

Twin-Load: Bridging the Gap between Conventional Direct-Attached and Buffer-on-Board Memory Systems.
Proceedings of the Second International Symposium on Memory Systems, 2016

Adaptive rate control over mobile data networks with heuristic rate compensations.
Proceedings of the 24th IEEE/ACM International Symposium on Quality of Service, 2016

Isolating bandwidth guarantees from work conservation in the cloud.
Proceedings of the IEEE Symposium on Computers and Communication, 2016

Extending On-chip Interconnects for rack-level remote resource access.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

Congestion-Aware Adaptive Routing with Quantitative Congestion Information.
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

Intra-host Rate Control with Centralized Approach.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

A novel approach for all-to-all routing in all-optical hypersquare torus network.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Guarantee-aware cost effective virtual machine placement algorithm for the cloud.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

sAXI: A High-Efficient Hardware Inter-Node Link in ARM Server for Remote Memory Access.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

2015
Quantitative Analysis of Flow-setup Cost in OpenFlow Network.
Computer Science (计算机科学), 2015

Detection of soft errors in LU decomposition with partial pivoting using algorithm-based fault tolerance.
Int. J. High Perform. Comput. Appl., 2015

Cracking Intel Sandy Bridge's Cache Hash Function.
CoRR, 2015

Twin-Load: Building a Scalable Memory System over the Non-Scalable Interface.
CoRR, 2015

Optimizing TCP loss recovery performance over mobile data networks.
Proceedings of the 12th Annual IEEE International Conference on Sensing, 2015

An Effective Correlation-Aware VM Placement Scheme for SLA Violation Reduction in Data Centers.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

AMTCP: an adaptive multi-path transmission control protocol.
Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

A Reliable Distributed Convolutional Neural Network for Biology Image Segmentation.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Improving Memory Access Performance of In-Memory Key-Value Store Using Data Prefetching Techniques.
Proceedings of the Advanced Parallel Processing Technologies, 2015

Exploiting Program Semantics to Place Data in Hybrid Memory.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
BPM/BPM+: Software-based dynamic memory partitioning mechanisms for mitigating DRAM bank-/channel-level interferences in multicore systems.
ACM Trans. Archit. Code Optim., 2014

HMTT: A hybrid hardware/software tracing system for bridging the DRAM access trace's semantic gap.
ACM Trans. Archit. Code Optim., 2014

MIMS: Towards a Message Interface Based Memory System.
J. Comput. Sci. Technol., 2014

A High-Performance and Cost-Efficient Interconnection Network for High-Density Servers.
J. Comput. Sci. Technol., 2014

Exploring Opportunities for Non-volatile Memories in Big Data Applications.
Proceedings of the Big Data Benchmarks, Performance Optimization, and Emerging Hardware, 2014

CMD: classification-based memory deduplication through page access characteristics.
Proceedings of the 10th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2014

Moby: A mobile benchmark suite for architectural simulators.
Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

Intelligent frame refresh for energy-aware display subsystems in mobile devices.
Proceedings of the International Symposium on Low Power Electronics and Design, 2014

Going vertical in memory management: Handling multiplicity by multi-policy.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Pipelined Compaction for the LSM-Tree.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

DWC: dynamic write consolidation for phase change memory systems.
Proceedings of the 2014 International Conference on Supercomputing, 2014

DTail: a flexible approach to DRAM refresh management.
Proceedings of the 2014 International Conference on Supercomputing, 2014

Dandelion: A locally-high-performance and globally-high-scalability hierarchical data center network.
Proceedings of the 23rd International Conference on Computer Communication and Networks, 2014

Achieving efficient packet-based memory system by exploiting correlation of memory requests.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

A Swap-based Cache Set Index Scheme to Leverage both Superpage and Page Coloring Optimizations.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

Reducing Communication in Parallel Breadth-First Search on Distributed Memory Systems.
Proceedings of the 17th IEEE International Conference on Computational Science and Engineering, 2014

2013
Understanding parallelism in graph traversal on multi-core clusters.
Comput. Sci. Res. Dev., 2013

MIMS: Towards a Message Interface based Memory System.
CoRR, 2013

SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013

Scattered superpage: A case for bridging the gap between superpage and page coloring.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

ParaInsight: An Assistant for Quantitatively Analyzing Multi-granularity Parallel Region.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

DIFTSAS: A DIstributed Full Text Search and Analysis System for Big Data.
Proceedings of the 16th IEEE International Conference on Computational Science and Engineering, 2013

2012
SMAT: An Input Adaptive Sparse Matrix-Vector Multiplication Auto-Tuner.
CoRR, 2012

Compression and Sieve: Reducing Communication in Parallel Breadth First Search on Distributed Memory Systems.
CoRR, 2012

Trace-driven simulation of memory system scheduling in multithread application.
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness: held in conjunction with PLDI '12, 2012

A lightweight hybrid hardware/software approach for object-relative memory profiling.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

A Case Study of Designing Efficient Algorithm-based Fault Tolerant Application for Exascale Parallelism.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Micro-architectural characterization of desktop cloud workloads.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs.
Proceedings of the International Conference on Supercomputing, 2012

Supporting User-directed Fault Tolerance over Standard MPI.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

Evaluation and Optimization of Breadth-First Search on NUMA Cluster.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

A software memory partition approach for eliminating bank-level interference in multicore systems.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

HaLock: hardware-assisted lock contention detection in multithreaded applications.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
What Hill-Marty model learn from and break through Amdahl's law?
Inf. Process. Lett., 2011

A New and Efficient Algorithm-Based Fault Tolerance Scheme for A Million Way Parallelism.
CoRR, 2011

HMTT: A Hybrid Hardware/Software Tracing System for Bridging Memory Trace's Semantic Gap.
CoRR, 2011

On the random access performance of Cell Broadband Engine with graph analysis application.
CoRR, 2011

Poster: revisiting virtual channel memory for performance and fairness on multi-core architecture.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Experience of parallelizing cryo-EM 3D reconstruction on a CPU-GPU heterogeneous system.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

Building algorithmically nonstop fault tolerant MPI programs.
Proceedings of the 18th International Conference on High Performance Computing, 2011

A fine-grained component-level power measurement method.
Proceedings of the 2011 International Green Computing Conference and Workshops, 2011

2010
Achieving Flow-Level Controllability in Network Intrusion Detection System.
Proceedings of the 11th ACIS International Conference on Software Engineering, 2010

P-GAS: Parallelizing a Cycle-Accurate Event-Driven Many-Core Processor Simulator Using Parallel Discrete Event Simulation.
Proceedings of the 24th ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation, 2010

Robust TCP Reassembly with a Hardware-Based Solution for Backbone Traffic.
Proceedings of the Fifth International Conference on Networking, Architecture, and Storage, 2010

QTL: An efficient scheduling policy for 10Gbps network intrusion detection system.
Proceedings of the 15th IEEE Symposium on Computers and Communications, 2010

GenerOS: An asymmetric operating system kernel for multi-core systems.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Automatically Tuned Dynamic Programming with an Algorithm-by-Blocks.
Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems, 2010

DMA cache: Using on-chip storage to architecturally separate I/O data from CPU data for improving I/O performance.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

2009
Extending Amdahl's law in the multicore era.
SIGMETRICS Perform. Evaluation Rev., 2009

SimK: A Large-Scale Parallel Simulation Engine.
J. Comput. Sci. Technol., 2009

HPPNetSim: a parallel simulation of large-scale interconnection networks.
Proceedings of the 2009 Spring Simulation Multiconference, SpringSim 2009, 2009

A Scalability Analysis of the Symmetric Multiprocessing Architecture in Multi-Core System.
Proceedings of the International Conference on Networking, Architecture, and Storage, 2009

Single-particle 3d reconstruction from cryo-electron microscopy images on GPU.
Proceedings of the 23rd International Conference on Supercomputing, 2009

2008
HMTT: a platform independent full-system memory trace monitoring system.
Proceedings of the 2008 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2008

A Network Memory Architecture Model and Performance Analysis.
Proceedings of the 2008 IEEE International Conference on Networking, 2008

Design and Evaluation of Optical Bus in High Performance Computer.
Proceedings of the 9th International Conference for Young Computer Scientists, 2008

Grid Memory Service Architecture for High Performance Computing.
Proceedings of the Seventh International Conference on Grid and Cooperative Computing, 2008

2005
A Reconfigurable Optical Interconnect System for DSAG.
Proceedings of the Sixth International Conference on Parallel and Distributed Computing, 2005

Parallel Optimization Technology for Backbone Network Intrusion Detection System.
Proceedings of the Computational Intelligence and Security, International Conference, 2005

2004
HPL Performance Prevision to Intending System Improvement.
Proceedings of the Parallel and Distributed Processing and Applications, 2004

