Guoping Long

According to our database, Guoping Long authored at least 43 papers between 2004 and 2022.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Timeline

[Figure: publication timeline. Legend: Book, In proceedings, Article, PhD thesis, Dataset, Other.]

Bibliography

2022
Efficient Pipeline Planning for Expedited Distributed DNN Training.
Proceedings of the IEEE INFOCOM 2022, 2022

AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
DAPPLE: a pipelined data parallel approach for training large models.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

2020
Online Bayesian max-margin subspace learning for multi-view classification and regression.
Mach. Learn., 2020

FusionStitching: Boosting Memory Intensive Computations for Deep Learning Workloads.
CoRR, 2020

Auto-MAP: A DQN Framework for Exploring Distributed Execution Plans for DNN Workloads.
CoRR, 2020

Fast Training of Deep Learning Models over Multiple GPUs.
Proceedings of the Middleware '20: 21st International Middleware Conference, 2020

Optimizing distributed training deployment in heterogeneous GPU clusters.
Proceedings of the CoNEXT '20: The 16th International Conference on emerging Networking EXperiments and Technologies, 2020

2019
FusionStitching: Boosting Execution Efficiency of Memory Intensive Computations for DL Workloads.
CoRR, 2019

Efficient and Adaptive Kernelization for Nonlinear Max-margin Multi-view Learning.
CoRR, 2019

Characterizing Deep Learning Training Workloads on Alibaba-PAI.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019

2018
FusionStitching: Deep Fusion and Code Generation for Tensorflow Computations on GPUs.
CoRR, 2018

2017
Nonlinear Maximum Margin Multi-View Learning with Adaptive Kernel.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

A Conditional Variational Framework for Dialog Generation.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016
Highly Optimized Code Generation for Stencil Codes with Computation Reuse for GPUs.
J. Comput. Sci. Technol., 2016

Bridging Semantic Gap Between App Names: Collective Matrix Factorization for Similar Mobile App Recommendation.
Proceedings of the Web Information Systems Engineering - WISE 2016, 2016

Online Bayesian Multiple Kernel Bipartite Ranking.
Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, 2016

Learning Beyond Predefined Label Space via Bayesian Nonparametric Topic Modelling.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2016

Efficient Bayesian Maximum Margin Multiple Kernel Learning.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2016

Bayesian Group Feature Selection for Support Vector Learning Machines.
Proceedings of the Advances in Knowledge Discovery and Data Mining, 2016

GPU-FV: Realtime Fisher Vector and Its Applications in Video Monitoring.
Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016

Online variational Bayesian Support Vector Regression.
Proceedings of the 2016 International Joint Conference on Neural Networks, 2016

Online Bayesian Max-Margin Subspace Multi-View Learning.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

2015
Listwise Approach for Rank Aggregation in Crowdsourcing.
Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 2015

Detect Similar Mobile Applications with Transfer Learning.
Proceedings of the 2015 IEEE International Conference on Smart City/SocialCom/SustainCom/DataCom/SC2 2015, 2015

PE-TLD: Parallel Extended Tracking-Learning-Detection for Multi-target Tracking.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

2014
High performance two-dimensional phase unwrapping on GPUs.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

2013
MPFFT: An Auto-Tuning FFT Library for OpenCL GPUs.
J. Comput. Sci. Technol., 2013

StreamScan: fast scan algorithms for GPUs without global barrier synchronization.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

CLSIFT: An Optimization Study of the Scale Invariance Feature Transform on GPUs.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

Memristors for neural branch prediction: a case study in strict latency and write endurance challenges.
Proceedings of the Computing Frontiers Conference, 2013

2012
An Insightful Program Performance Tuning Chain for GPU Computing.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2012

GPURoofline: A Model for Guiding Performance Optimizations on GPUs.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011
Automatic FFT Performance Tuning on OpenCL GPUs.
Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

CRSD: Application Specific Auto-tuning of SpMV for Diagonal Sparse Matrices.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

2010
Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

2009
Godson-T: An Efficient Many-Core Architecture for Parallel Program Executions.
J. Comput. Sci. Technol., 2009

Architectural support for cilk computations on many-core architectures.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Characterizing and Understanding the Bandwidth Behavior of Workloads on Multi-core Processors.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

2008
Location Consistency Model Revisited: Problem, Solution and Prospects.
Proceedings of the Ninth International Conference on Parallel and Distributed Computing, 2008

A Performance Model of Dense Matrix Operations on Many-Core Architectures.
Proceedings of the Euro-Par 2008, 2008

2007
Design and Implementation of Floating Point Stack on General RISC Architecture.
Proceedings of the 15th Euromicro International Conference on Parallel, 2007

2004
A Resource Organizing Protocol for Grid Based on Bounded Two-Level Broadcasting Technique.
Proceedings of the Grid and Cooperative Computing, 2004

