Fan Yang

Orcid: 0000-0002-0378-060X

Affiliations:
  • Microsoft Research Asia, Beijing, China
  • Nanjing University, Department of Computer Science, State Key Lab for Novel Software Technology, China (former)

According to our database, Fan Yang authored at least 81 papers between 2003 and 2024.

Bibliography

2024
Efficient Schedule Construction for Distributed Execution of Large DNN Models.
IEEE Trans. Parallel Distributed Syst., December, 2024

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval.
CoRR, 2024

Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers.
CoRR, 2024

LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration.
CoRR, 2024

CFBench: A Comprehensive Constraints-Following Benchmark for LLMs.
CoRR, 2024

PAS: Data-Efficient Plug-and-Play Prompt Augmentation System.
CoRR, 2024

Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge.
CoRR, 2024

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone.
CoRR, 2024

OneSparse: A Unified System for Multi-index Vector Search.
Companion Proceedings of the ACM on Web Conference 2024, 2024

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Parrot: Efficient Serving of LLM-based Applications with Semantic Variable.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Understanding the Weakness of Large Language Model Agents within a Complex Android Environment.
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

Aceso: Efficient Parallel DNN Training through Iterative Bottleneck Alleviation.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

Fewer is More: Boosting Math Reasoning with Reinforced Context Pruning.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Amanda: Unified Instrumentation Framework for Deep Neural Networks.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
BitNet: Scaling 1-bit Transformers for Large Language Models.
CoRR, 2023

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.
CoRR, 2023

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation.
CoRR, 2023

IRGen: Generative Modeling for Image Retrieval.
CoRR, 2023

SparDA: Accelerating Dynamic Sparse Deep Neural Networks via Sparse-Dense Transformation.
CoRR, 2023

SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction.
CoRR, 2023

PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

SPFresh: Incremental In-Place Update for Billion-Scale Vector Search.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Cocktailer: Analyzing and Optimizing Dynamic Control Flow in Deep Learning.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Optimizing Dynamic Neural Networks with Brainstorm.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Welder: Scheduling Deep Learning Memory Access via Tile-graph.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

On Modular Learning of Distributed Systems for Predicting End-to-End Latency.
Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023

Model-enhanced Vector Index.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

Tutel: Adaptive Mixture-of-Experts at Scale.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Learning 3D Photography Videos via Self-supervised Diffusion on Single Images.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

SiloD: A Co-design of Caching and Scheduling for Deep Learning Clusters.
Proceedings of the Eighteenth European Conference on Computer Systems, 2023

Adam Accumulation to Reduce Memory Footprints of Both Activations and Gradients for Large-Scale DNN Training.
Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023

ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Tutel: Adaptive Mixture-of-Experts at Scale.
CoRR, 2022

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings.
CoRR, 2022

PilotFish: Harvesting Free Cycles of Cloud Gaming with Deep Learning Training.
Proceedings of the 2022 USENIX Annual Technical Conference, 2022

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

ROLLER: Fast and Efficient Tensor Compilation for Deep Learning.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions.
CoRR, 2021

2020
XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation.
CoRR, 2020

HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

Retiarii: A Deep Learning Exploratory-Training Framework.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

InvisibleFL: Federated Learning over Non-Informative Intermediate Updates against Multimedia Privacy Leakages.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Capuchin: Tensor-based GPU Memory Management for Deep Learning.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads.
Proceedings of the 2019 USENIX Annual Technical Conference, 2019

2018
Gandiva: Introspective Cluster Scheduling for Deep Learning.
Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

Scheduling CPU for GPU-based Deep Learning Jobs.
Proceedings of the ACM Symposium on Cloud Computing, 2018

2015
ImmortalGraph: A System for Storage and Analysis of Temporal Graphs.
ACM Trans. Storage, 2015

GraM: scaling graph computation to the trillions.
Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015

2014
Chronos: a graph engine for temporal graph analysis.
Proceedings of the Ninth EuroSys Conference 2014, 2014

2012
Kineograph: taking the pulse of a fast-changing and connected world.
Proceedings of the European Conference on Computer Systems, 2012

2007
Modeling path capacity in multi-hop IEEE 802.11 networks for QoS services.
IEEE Trans. Wirel. Commun., 2007

Distributed Cooperative Rate Adaptation for Energy Efficiency in IEEE 802.11-Based Multihop Networks.
IEEE Trans. Veh. Technol., 2007

Cooperative and opportunistic transmission for wireless ad hoc networks.
IEEE Netw., 2007

2006
LION: Layered Overlay Multicast With Network Coding.
IEEE Trans. Multim., 2006

Distributed Channel Assignment and Routing in Multiradio Multichannel Multihop Wireless Networks.
IEEE J. Sel. Areas Commun., 2006

Distributed cooperative rate adaptation for energy efficiency in IEEE 802.11-based multi-hop networks.
Proceedings of the 3rd International ICST Conference on Quality of Service in Heterogeneous Wired/Wireless Networks, 2006

Modeling Path Capacity in Multi-hop IEEE 802.11 Networks for QoS Services.
Proceedings of the IEEE 3rd International Conference on Mobile Adhoc and Sensor Systems, 2006

Impact of Power and Rate Selection on the Throughput of Ad Hoc Networks.
Proceedings of IEEE International Conference on Communications, 2006

On Improving the Throughput of Media Delivery Applications in Heterogeneous Overlay Network.
Proceedings of the Global Telecommunications Conference, 2006. GLOBECOM '06, San Francisco, CA, USA, 27 November, 2006

2005
Cross-layer QoS Support for Multimedia Delivery over Wireless Internet.
EURASIP J. Adv. Signal Process., 2005

AMTP: a multipath multimedia streaming protocol for mobile ad hoc networks.
Proceedings of IEEE International Conference on Communications, 2005

2004
End-to-end TCP-friendly streaming protocol and bit allocation for scalable video over wireless Internet.
IEEE J. Sel. Areas Commun., 2004

Streaming and Bit Allocation for Scalable Video over Mobile Wireless Internet.
Proceedings of IEEE INFOCOM 2004, 2004

2003
An end-to-end TCP-friendly streaming protocol for multimedia over wireless Internet.
Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, 2003