Siwei Wang

ORCID: 0000-0003-0764-5592

Affiliations:
  • Tsinghua University, Beijing, China


According to our database, Siwei Wang authored at least 25 papers between 2018 and 2024.

Bibliography

2024
Mechanism Design for LLM Fine-tuning with Multiple Reward Models.
CoRR, 2024

ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models.
CoRR, 2024

Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Balanced and Incentivized Learning with Limited Shared Information in Multi-agent Multi-armed Bandit.
Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 2024

2023
Taming the Exponential Action Set: Sublinear Regret and Fast Convergence to Nash Equilibrium in Online Congestion Games.
CoRR, 2023

Contextual Combinatorial Bandits with Probabilistically Triggered Arms.
CoRR, 2023

Contextual Combinatorial Bandits with Probabilistically Triggered Arms.
Proceedings of the International Conference on Machine Learning, 2023

Provably Efficient Risk-Sensitive Reinforcement Learning: Iterated CVaR and Worst Path.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
The pure exploration problem with general reward functions depending on full distributions.
Mach. Learn., 2022

Regret Analysis for Hierarchical Experts Bandit Problem.
CoRR, 2022

Risk-Sensitive Reinforcement Learning: Iterated CVaR and the Worst Path.
CoRR, 2022

Matching in Multi-arm Bandit with Collision.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Combinatorial Bandits with Linear Constraints: Beyond Knapsacks and Fairness.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Thompson Sampling for (Combinatorial) Pure Exploration.
Proceedings of the International Conference on Machine Learning, 2022

2021
Pure Exploration Bandit Problem with General Reward Functions Depending on Full Distributions.
CoRR, 2021

Continuous Mean-Covariance Bandits.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

A One-Size-Fits-All Solution to Conservative Bandit Problems.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Dueling Bandits: From Two-dueling to Multi-dueling.
Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

2018
Multi-armed Bandits with Compensation.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Thompson Sampling for Combinatorial Semi-Bandits.
Proceedings of the 35th International Conference on Machine Learning, 2018
