Haipeng Luo

Orcid: 0000-0001-8056-6271

According to our database, Haipeng Luo authored at least 117 papers between 1999 and 2024.

Bibliography

2024
Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena.
CoRR, 2024

Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms.
CoRR, 2024

No-Regret Learning for Fair Multi-Agent Social Welfare Optimization.
CoRR, 2024

Provably Efficient Interactive-Grounded Learning with Personalized Reward.
CoRR, 2024

Optimal Multiclass U-Calibration Error and Beyond.
CoRR, 2024

Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback.
CoRR, 2024

Tractable Local Equilibria in Non-Concave Games.
CoRR, 2024

Contextual Multinomial Logit Bandits with General Value Functions.
CoRR, 2024

Efficient Contextual Bandits with Uninformed Feedback Graphs.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Online Learning in Contextual Second-Price Pay-Per-Click Auctions.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023
Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games.
CoRR, 2023

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct.
CoRR, 2023

Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games.
CoRR, 2023

Average-Constrained Policy Optimization.
CoRR, 2023

Posterior sampling-based online learning for the stochastic shortest path model.
Proceedings of the Uncertainty in Artificial Intelligence, 2023

Practical Contextual Bandits with Feedback Graphs.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Regret Matching+: (In)Stability and Fast Convergence in Games.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2nd Workshop on Multi-Armed Bandits and Reinforcement Learning: Advancing Decision Making in E-Commerce and Beyond.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

Refined Regret for Adversarial MDPs with Linear Function Approximation.
Proceedings of the International Conference on Machine Learning, 2023

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs.
Proceedings of the International Conference on Algorithmic Learning Theory, 2023

No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022
Clairvoyant Regret Minimization: Equivalence with Nemirovski's Conceptual Prox Method and Extension to General Convex Games.
CoRR, 2022

Near-Optimal No-Regret Learning for General Convex Games.
CoRR, 2022

Uncoupled Learning Dynamics with O(log T) Swap Regret in Multiplayer Games.
CoRR, 2022

Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints.
CoRR, 2022

Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Near-Optimal No-Regret Learning Dynamics for General Convex Games.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Uncoupled Learning Dynamics with O(log T) Swap Regret in Multiplayer Games.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

No-Regret Learning in Time-Varying Zero-Sum Games.
Proceedings of the International Conference on Machine Learning, 2022

Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games.
Proceedings of the International Conference on Machine Learning, 2022

Learning Infinite-horizon Average-reward Markov Decision Process with Constraints.
Proceedings of the International Conference on Machine Learning, 2022

Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP.
Proceedings of the International Conference on Machine Learning, 2022

Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits.
Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

Adaptive Bandit Convex Optimization with Heterogeneous Curvature.
Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

Policy Optimization for Stochastic Shortest Path.
Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

2021
Online Learning for Stochastic Shortest Path Model via Posterior Sampling.
CoRR, 2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Last-iterate Convergence in Extensive-Form Games.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Multi-Armed Bandits and Reinforcement Learning: Advancing Decision Making in E-Commerce and Beyond.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously.
Proceedings of the 38th International Conference on Machine Learning, 2021

Finding the Stochastic Shortest Path with Low Regret: the Adversarial Cost and Unknown Transition Case.
Proceedings of the 38th International Conference on Machine Learning, 2021

Linear Last-iterate Convergence in Constrained Saddle-point Optimization.
Proceedings of the 9th International Conference on Learning Representations, 2021

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games.
Proceedings of the Conference on Learning Theory, 2021

Non-stationary Reinforcement Learning without Prior Knowledge: an Optimal Black-box Approach.
Proceedings of the Conference on Learning Theory, 2021

Impossible Tuning Made Possible: A New Expert Algorithm and Its Applications.
Proceedings of the Conference on Learning Theory, 2021

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition.
Proceedings of the Conference on Learning Theory, 2021

Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds.
Proceedings of the Algorithmic Learning Theory, 2021

Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Active Online Learning with Hidden Shifting Domains.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020
Oracle-efficient Online Learning and Auction Design.
J. ACM, 2020

Active Online Domain Adaptation.
CoRR, 2020

Linear Last-iterate Convergence for Matrix Games and Stochastic Games.
CoRR, 2020

Fair Contextual Multi-Armed Bandits: Theory and Experiments.
Proceedings of the Thirty-Sixth Conference on Uncertainty in Artificial Intelligence, 2020

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Comparator-Adaptive Convex Bandits.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes.
Proceedings of the 37th International Conference on Machine Learning, 2020

Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition.
Proceedings of the 37th International Conference on Machine Learning, 2020

Taking a hint: How to leverage loss predictors in contextual bandits?
Proceedings of the Conference on Learning Theory, 2020

A Closer Look at Small-loss Bounds for Bandits with Graph Feedback.
Proceedings of the Conference on Learning Theory, 2020

Open Problem: Model Selection for Contextual Bandits.
Proceedings of the Conference on Learning Theory, 2020

The Fair Contextual Multi-Armed Bandit.
Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

2019
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition.
CoRR, 2019

Equipping Experts/Bandits with Long-term Memory.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Model Selection for Contextual Bandits.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Hypothesis Set Stability and Generalization.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously.
Proceedings of the 36th International Conference on Machine Learning, 2019

A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal and Parameter-free.
Proceedings of the Conference on Learning Theory, 2019

Improved Path-length Regret Bounds for Bandits.
Proceedings of the Conference on Learning Theory, 2019

Achieving Optimal Dynamic Regret for Non-stationary Bandits without Prior Information.
Proceedings of the Conference on Learning Theory, 2019

2018
Efficient Online Portfolio with Logarithmic Regret.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Practical Contextual Bandits with Regression Oracles.
Proceedings of the 35th International Conference on Machine Learning, 2018

More Adaptive Algorithms for Adversarial Bandits.
Proceedings of the Conference On Learning Theory, 2018

Efficient Contextual Bandits in Non-stationary Worlds.
Proceedings of the Conference On Learning Theory, 2018

Logistic Regression: The Importance of Being Improper.
Proceedings of the Conference On Learning Theory, 2018

2017
Efficient Contextual Bandits in Non-stationary Worlds.
CoRR, 2017

Corralling a Band of Bandit Algorithms.
Proceedings of the 30th Conference on Learning Theory, 2017

Open Problem: First-Order Regret Bounds for Contextual Bandits.
Proceedings of the 30th Conference on Learning Theory, 2017

2016
Optimal and Adaptive Online Learning.
PhD thesis, 2016

Three-Dimensional Surface Displacement Field Associated with the 25 April 2015 Gorkha, Nepal, Earthquake: Solution from Integrated InSAR and GPS Measurements with an Extended SISTEM Approach.
Remote. Sens., 2016

Efficient Second Order Online Learning via Sketching.
CoRR, 2016

Oracle-Efficient Learning and Auction Design.
CoRR, 2016

Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Efficient Second Order Online Learning by Sketching.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Variance-Reduced and Projection-Free Stochastic Optimization.
Proceedings of the 33rd International Conference on Machine Learning, 2016

2015
Achieving All with No Parameters: Adaptive NormalHedge.
CoRR, 2015

Fast Convergence of Regularized Learning in Games.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Online Gradient Boosting.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Optimal and Adaptive Algorithms for Online Boosting.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Achieving All with No Parameters: AdaNormalHedge.
Proceedings of The 28th Conference on Learning Theory, 2015

2014
Automatic Scaling of Internet Applications for Cloud Computing Services.
IEEE Trans. Computers, 2014

Adaptive Resource Provisioning for the Cloud Using Online Bin Packing.
IEEE Trans. Computers, 2014

Accelerated Parallel Optimization Methods for Large Scale Machine Learning.
CoRR, 2014

A Drifting-Games Analysis for Online Learning and Applications to Boosting.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Towards Minimax Online Learning with Unknown Time Horizon.
Proceedings of the 31st International Conference on Machine Learning, 2014

2013
Online Learning with Unknown Time Horizon.
CoRR, 2013

2010
Upper and lower bounds for F_v(4,4;5).
Electron. J. Comb., 2010

A Generalization of Generalized Paley Graphs and New Lower Bounds for R(3,q).
Electron. J. Comb., 2010

The existence of balanced (υ, {3, 6}, 1) difference families.
Sci. China Inf. Sci., 2010

2009
New lower bounds for seven classical Ramsey numbers R(3, q).
Appl. Math. Lett., 2009

2003
Edge colorings of the complete graph K149 and the lower bounds of three Ramsey numbers.
Discret. Appl. Math., 2003

2002
Lower bounds of Ramsey numbers based on cubic residues.
Discret. Math., 2002

The properties of self-complementary graphs and new lower bounds for diagonal Ramsey numbers.
Australas. J Comb., 2002

2001
New lower bounds of ten classical Ramsey numbers.
Australas. J Comb., 2001

1999
New lower bounds of fifteen classical Ramsey numbers.
Australas. J Comb., 1999

