2025
An Empirical Study of Deep Reinforcement Learning in Continuing Tasks.
CoRR, January, 2025

ExpDrug: An explainable drug recommendation model based on space feature mapping.
Neurocomputing, 2025

2024
Learning to bid and rank together in recommendation systems.
Mach. Learn., May, 2024

Correction to: Learning to bid and rank together in recommendation systems.
Mach. Learn., 2024

Epinet for Content Cold Start.
CoRR, 2024

Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank.
CoRR, 2024

Uncertainty of Joint Neural Contextual Bandit.
CoRR, 2024

Neural Collapse To Multiple Centers For Imbalanced Data.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Offline Reinforcement Learning for Optimizing Production Bidding Policies.
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

2023
Pearl: A Production-ready Reinforcement Learning Agent.
CoRR, 2023

Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling.
CoRR, 2023

Two-tiered Online Optimization of Region-wide Datacenter Resource Allocation via Deep Reinforcement Learning.
CoRR, 2023

Optimism Based Exploration in Large-Scale Recommender Systems.
CoRR, 2023

Deep Exploration for Recommendation Systems.
Proceedings of the 17th ACM Conference on Recommender Systems, 2023

Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning.
Proceedings of the 17th ACM Conference on Recommender Systems, 2023

Scalable Neural Contextual Bandit for Recommender Systems.
Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

2020
Multi-Agent Safe Planning with Gaussian Processes.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2020

2017
A Modified Levy Jump-Diffusion Model Based on Market Sentiment Memory for Online Jump Prediction.
CoRR, 2017