Shie Mannor

ORCID: 0000-0003-4439-7647

Affiliations:
  • Technion - Israel Institute of Technology, Department of Electrical Engineering, Haifa, Israel (PhD 2002)
  • Nvidia Research, Tel Aviv-Yafo, Israel


According to our database, Shie Mannor authored at least 437 papers between 2000 and 2024.

Collaborative distances:
  • Dijkstra number of three.
  • Erdős number of two.

Bibliography

2024
Dual Pricing to Prioritize Renewable Energy and Consumer Preferences in Electricity Markets.
CoRR, 2024

Efficient Fairness-Performance Pareto Front Computation.
CoRR, 2024

From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis.
CoRR, 2024

PlaMo: Plan and Move in Rich 3D Physical Environments.
CoRR, 2024

RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation.
CoRR, 2024

On Bits and Bandits: Quantifying the Regret-Information Trade-off.
CoRR, 2024

On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes.
CoRR, 2024

SQT - std Q-target.
CoRR, 2024

Prospective Side Information for Latent MDPs.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Efficient Value Iteration for s-rectangular Robust Markov Decision Processes.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Sobolev Space Regularised Pre Density Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Improving Token-Based World Models with Parallel Observation Prediction.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Tree Search-Based Policy Optimization under Stochastic Execution Delay.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Policy Gradient for Reinforcement Learning with General Utilities.
Proceedings of the Second Tiny Papers Track at ICLR 2024, 2024

Learning the Uncertainty Set in Robust Markov Decision Process.
Proceedings of the Second Tiny Papers Track at ICLR 2024, 2024

Towards Faster Global Convergence of Robust Policy Gradient Methods.
Proceedings of the Second Tiny Papers Track at ICLR 2024, 2024

Policy Gradient with Tree Search (PGTS) in Reinforcement Learning Evades Local Maxima.
Proceedings of the Second Tiny Papers Track at ICLR 2024, 2024

Solving Non-rectangular Reward-Robust MDPs via Frequency Regularization.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Continuous-Time Fitted Value Iteration for Robust Policies.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2023

Implicitly Normalized Explicitly Regularized Density Estimation.
CoRR, 2023

Robust Reinforcement Learning via Adversarial Kernel Approximation.
CoRR, 2023

Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization.
CoRR, 2023

An Efficient Solution to s-Rectangular Robust Markov Decision Processes.
CoRR, 2023

Policy Gradient for s-Rectangular Robust Markov Decision Processes.
CoRR, 2023

SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search.
CoRR, 2023

Towards Deployable RL - What's Broken with RL Research and a Potential Fix.
CoRR, 2023

CALM: Conditional Adversarial Latent Models for Directable Virtual Characters.
Proceedings of the ACM SIGGRAPH 2023 Conference Proceedings, 2023

Policy Gradient for Rectangular Robust Markov Decision Processes.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Optimization or Architecture: How to Hack Kalman Filtering.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Train Hard, Fight Easy: Robust Meta Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Individualized Dosing Dynamics via Neural Eigen Decomposition.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

AutoScratch: ML-Optimized Cache Management for Inference-Oriented GPUs.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

PPG Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient.
Proceedings of the International Conference on Machine Learning, 2023

Learning Hidden Markov Models When the Locations of Missing Observations are Unknown.
Proceedings of the International Conference on Machine Learning, 2023

Representation-Driven Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2023

Reward-Mixing MDPs with Few Latent Contexts are Learnable.
Proceedings of the International Conference on Machine Learning, 2023

Learning to Initiate and Reason in Event-Driven Cascading Processes.
Proceedings of the International Conference on Machine Learning, 2023

Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

Never Worse, Mostly Better: Stable Policy Improvement in Deep Reinforcement Learning.
Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023

Planning and Learning with Adaptive Lookahead.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Reinforcement Learning for Datacenter Congestion Control.
SIGMETRICS Perform. Evaluation Rev., 2022

Policy Gradient for Reinforcement Learning with General Utilities.
CoRR, 2022

SoftTreeMax: Policy Gradient with Tree Search.
CoRR, 2022

Actor-Critic based Improper Reinforcement Learning.
CoRR, 2022

Efficient Policy Iteration for Robust Markov Decision Processes via Regularization.
CoRR, 2022

What's Missing? Learning Hidden Markov Models When the Locations of Missing Observations are Unknown.
CoRR, 2022

Learning to reason about and to act on physical cascading events.
CoRR, 2022

Continuous Forecasting via Neural Eigen Decomposition of Stochastic Dynamics.
CoRR, 2022

Reinforcement Learning with a Terminator.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Uncertainty Estimation Using Riemannian Model Dynamics for Offline Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Tractable Optimality in Episodic Latent MABs.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Finite Sample Analysis Of Dynamic Regression Parameter Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Efficient Risk-Averse Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Actor-Critic based Improper Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2022

The Geometry of Robust Value Functions.
Proceedings of the International Conference on Machine Learning, 2022

Analysis of Stochastic Processes through Replay Buffers.
Proceedings of the International Conference on Machine Learning, 2022

Optimizing Tensor Network Contraction Using Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2022

Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms.
Proceedings of the International Conference on Machine Learning, 2022

On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Reinforcement Learning for Extended Intelligence.
Proceedings of the 19th International Conference on Informatics in Control, 2022

DiffStack: A Differentiable and Modular Control Stack for Autonomous Vehicles.
Proceedings of the Conference on Robot Learning, 2022

Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Online Apprenticeship Learning.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Inverse reinforcement learning in contextual MDPs.
Mach. Learn., 2021

Dare not to Ask: Problem-Dependent Guarantees for Budgeted Bandits.
CoRR, 2021

Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling.
CoRR, 2021

Using Kalman Filter The Right Way: Noise Estimation Is Not Optimal.
CoRR, 2021

Maximum Entropy Reinforcement Learning with Mixture Policies.
CoRR, 2021

GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning.
CoRR, 2021

Improper Learning with Gradient-based Policy Optimization.
CoRR, 2021

Dimension Free Generalization Bounds for Non Linear Metric Learning.
CoRR, 2021

Bandits with partially observable confounded data.
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021

Action redundancy in reinforcement learning.
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021

Known unknowns: Learning novel concepts using reasoning-by-elimination.
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021

Robust Value Iteration for Continuous Control Tasks.
Proceedings of the Robotics: Science and Systems XVII, Virtual Event, July 12-16, 2021

Sim and Real: Better Together.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

RL for Latent MDPs: Regret Guarantees and a Lower Bound.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Reinforcement Learning in Reward-Mixing MDPs.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Twice regularized MDPs and the equivalence between robustness and regularization.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Online Limited Memory Neural-Linear Bandits with Likelihood Matching.
Proceedings of the 38th International Conference on Machine Learning, 2021

Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks.
Proceedings of the 38th International Conference on Machine Learning, 2021

Value Iteration in Continuous Actions, States and Time.
Proceedings of the 38th International Conference on Machine Learning, 2021

Detecting Rewards Deterioration in Episodic Reinforcement Learning.
Proceedings of the 38th International Conference on Machine Learning, 2021

Confidence-Budget Matching for Sequential Budgeted Learning.
Proceedings of the 38th International Conference on Machine Learning, 2021

Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning.
Proceedings of the 9th International Conference on Learning Representations, 2021

Acting in Delayed Environments with Non-Stationary Markov Policies.
Proceedings of the 9th International Conference on Learning Representations, 2021

Over-the-Air Adversarial Flickering Attacks Against Video Recognition Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

On the Volatility of Optimal Control Policies of a Class of Linear Quadratic Regulators.
Proceedings of the 2021 American Control Conference, 2021

Lenient Regret for Multi-Armed Bandits.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Reinforcement Learning with Trajectory Feedback.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems.
CoRR, 2020

Drift Detection in Episodic Data: Detect When Your Agent Starts Faltering.
CoRR, 2020

How to Stop Epidemics: Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks.
CoRR, 2020

The Pendulum Arrangement: Maximizing the Escape Time of Heterogeneous Random Walks.
CoRR, 2020

Bandits with Partially Observable Offline Data.
CoRR, 2020

Distributional Robustness and Regularization in Reinforcement Learning.
CoRR, 2020

Exploration-Exploitation in Constrained MDPs.
CoRR, 2020

Stealing Black-Box Functionality Using The Deep Neural Tree Architecture.
CoRR, 2020

Kalman meets Bellman: Improving Policy Evaluation through Value Tracking.
CoRR, 2020

Price Volatility in Electricity Markets: A Stochastic Control Perspective.
CoRR, 2020

Patternless Adversarial Attacks on Video Recognition Networks.
CoRR, 2020

Maximizing the Total Reward via Reward Tweaking.
CoRR, 2020

Scalable Detection of Offensive and Non-compliant Content / Logo in Product Images.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

Online Planning with Lookahead Policies.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Optimistic Policy Optimization with Bandit Feedback.
Proceedings of the 37th International Conference on Machine Learning, 2020

Topic Modeling via Full Dependence Mixtures.
Proceedings of the 37th International Conference on Machine Learning, 2020

Tight Lower Bounds for Combinatorial Multi-Armed Bandits.
Proceedings of the Conference on Learning Theory, 2020

An adaptive stochastic optimization algorithm for resource allocation.
Proceedings of the Algorithmic Learning Theory, 2020

Off-Policy Evaluation in Partially Observable Environments.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Multi-User Communication Networks: A Coordinated Multi-Armed Bandit Approach.
IEEE/ACM Trans. Netw., 2019

Natural Language State Representation for Reinforcement Learning.
CoRR, 2019

Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients.
CoRR, 2019

Multi-Step Greedy and Approximate Real Time Dynamic Programming.
CoRR, 2019

Practical Risk Measures in Reinforcement Learning.
CoRR, 2019

Variance Estimation For Online Regression via Spectrum Thresholding.
CoRR, 2019

Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces.
CoRR, 2019

Image Matters: Detecting Offensive and Non-Compliant Content / Logo in Product Images.
CoRR, 2019

A Problem-Adaptive Algorithm for Resource Allocation.
CoRR, 2019

Deep Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching.
CoRR, 2019

Trust Region Value Optimization using Kalman Filtering.
CoRR, 2019

A Bayesian Approach to Robust Reinforcement Learning.
Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, 2019

Distributional Policy Optimization: An Alternative Approach for Continuous Control.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Action Robust Reinforcement Learning and Applications in Continuous Control.
Proceedings of the 36th International Conference on Machine Learning, 2019

The Natural Language of Actions.
Proceedings of the 36th International Conference on Machine Learning, 2019

Exploration Conscious Reinforcement Learning Revisited.
Proceedings of the 36th International Conference on Machine Learning, 2019

Nonlinear Distributional Gradient Temporal-Difference Learning.
Proceedings of the 36th International Conference on Machine Learning, 2019

Reward Constrained Policy Optimization.
Proceedings of the 7th International Conference on Learning Representations, 2019

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem.
Proceedings of the Conference on Learning Theory, 2019

On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

How to Combine Tree-Search Methods in Reinforcement Learning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Detecting Cascades from Weak Signatures.
IEEE Trans. Netw. Sci. Eng., 2018

Source Estimation in Time Series and the Surprising Resilience of HMMs.
IEEE Trans. Inf. Theory, 2018

Multi Instance Learning For Unbalanced Data.
CoRR, 2018

Revisiting Exploration-Conscious Reinforcement Learning.
CoRR, 2018

Inspiration Learning through Preferences.
CoRR, 2018

Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning.
CoRR, 2018

Interdependent Gibbs Samplers.
CoRR, 2018

Deep Learning Reconstruction of Ultra-Short Pulses.
CoRR, 2018

Train on Validation: Squeezing the Data Lemon.
CoRR, 2018

Chance-Constrained Outage Scheduling using a Machine Learning Proxy.
CoRR, 2018

Soft-Robust Actor-Critic Policy-Gradient.
Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, 2018

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

PAC Bandits with Risk Constraints.
Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2018

Beyond the One-Step Greedy Approach in Reinforcement Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018

Ensemble Robustness and Generalization of Stochastic Deep Learning Algorithms.
Proceedings of the 6th International Conference on Learning Representations, 2018

Learning How Not to Act in Text-based Games.
Proceedings of the 6th International Conference on Learning Representations, 2018

Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning.
Proceedings of the Conference On Learning Theory, 2018

A General Approach to Multi-Armed Bandits Under Risk Criteria.
Proceedings of the Conference On Learning Theory, 2018

Is a Picture Worth a Thousand Words? A Deep Multi-Modal Architecture for Product Classification in E-Commerce.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Learning Robust Options.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Finite Sample Analyses for TD(0) With Function Approximation.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
k-Armed Bandit.
Proceedings of the Encyclopedia of Machine Learning and Data Mining, 2017

Sequential Decision Making With Coherent Risk.
IEEE Trans. Autom. Control., 2017

Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback.
SIAM J. Comput., 2017

Learn on Source, Refine on Target: A Model Transfer Learning Framework with Random Forests.
IEEE Trans. Pattern Anal. Mach. Intell., 2017

Strategic Formation of Heterogeneous Networks.
IEEE J. Sel. Areas Commun., 2017

The Stochastic Firefighter Problem.
CoRR, 2017

Situationally Aware Options.
CoRR, 2017

Deep Robust Kalman Filter.
CoRR, 2017

Outlier Robust Online Learning.
CoRR, 2017

Finite Sample Analysis for TD(0) with Linear Function Approximation.
CoRR, 2017

Concentration Bounds for Two Timescale Stochastic Approximation with Applications to Reinforcement Learning.
CoRR, 2017

Online Learning with Many Experts.
CoRR, 2017

A Nonparametric Sequential Test for Online Randomized Experiments.
Proceedings of the 26th International Conference on World Wide Web Companion, 2017

Non-parametric Online AUC Maximization.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2017

Shallow Updates for Deep Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Rotting Bandits.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Supervised learning for optimal power flow as a real-time proxy.
Proceedings of the IEEE Power & Energy Society Innovative Smart Grid Technologies Conference, 2017

Approximate Value Iteration with Temporally Extended Actions (Extended Abstract).
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Consistent On-Line Off-Policy Evaluation.
Proceedings of the 34th International Conference on Machine Learning, 2017

Multi-objective Bandits: Optimizing the Generalized Gini Index.
Proceedings of the 34th International Conference on Machine Learning, 2017

End-to-End Differentiable Adversarial Imitation Learning.
Proceedings of the 34th International Conference on Machine Learning, 2017

Ignoring Is a Bliss: Learning with Large Noise Through Reweighting-Minimization.
Proceedings of the 30th Conference on Learning Theory, 2017

Proxy Voting for Better Outcomes.
Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, 2017

A Deep Hierarchical Approach to Lifelong Learning in Minecraft.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Robust MDPs with k-Rectangular Uncertainty.
Math. Oper. Res., 2016

Reinforcement Learning in Robust Markov Decision Processes.
Math. Oper. Res., 2016

Learning the Variance of the Reward-To-Go.
J. Mach. Learn. Res., 2016

Regularized Policy Iteration with Nonparametric Function Spaces.
J. Mach. Learn. Res., 2016

Visualizing Dynamics: from t-SNE to SEMI-MDPs.
CoRR, 2016

Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce.
CoRR, 2016

How to Allocate Resources For Features Acquisition?
CoRR, 2016

Bending the Curve: Improving the ROC Curve Through Error Redistribution.
CoRR, 2016

Adaptive Lambda Least-Squares Temporal Difference Learning.
CoRR, 2016

Situational Awareness by Risk-Conscious Skills.
CoRR, 2016

Iterative Hierarchical Optimization for Misspecified Problems (IHOMP).
CoRR, 2016

Clustering Time Series and the Surprising Robustness of HMMs.
CoRR, 2016

A Reinforcement Learning System to Encourage Physical Activity in Diabetes Patients.
CoRR, 2016

Ensemble Robustness of Deep Learning Algorithms.
CoRR, 2016

Unit Commitment using Nearest Neighbor as a Short-Term Proxy.
CoRR, 2016

Deep Reinforcement Learning Discovers Internal Models.
CoRR, 2016

Model-based Adversarial Imitation Learning.
CoRR, 2016

Distributed scenario-based optimization for asset management in a hierarchical decision making environment.
Proceedings of the Power Systems Computation Conference, 2016


Adaptive Skills Adaptive Partitions (ASAP).
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Multi-user lax communications: A multi-armed bandit approach.
Proceedings of the 35th Annual IEEE International Conference on Computer Communications, 2016

Graying the black box: Understanding DQNs.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Hierarchical Decision Making In Electricity Grid Management.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Heteroscedastic Sequences: Beyond Gaussianity.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Distinguishing Infections on Different Graph Topologies.
IEEE Trans. Inf. Theory, 2015

The Perturbed Variation.
IEEE Trans. Pattern Anal. Mach. Intell., 2015

Approximate Value Iteration with Temporally Extended Actions.
J. Artif. Intell. Res., 2015

Oracle-Based Robust Optimization via Online Learning.
Oper. Res., 2015

Bayesian Reinforcement Learning: A Survey.
Found. Trends Mach. Learn., 2015

Bootstrapping Skills.
CoRR, 2015

Actively Learning to Attract Followers on Twitter.
CoRR, 2015

Overlapping Community Detection by Online Cluster Aggregation.
CoRR, 2015

Overlapping Communities Detection via Measure Space Embedding.
CoRR, 2015

Emphatic TD Bellman Operator is a Contraction.
CoRR, 2015

Off-policy evaluation for MDPs with unknown structure.
CoRR, 2015

Contextual Markov Decision Processes.
CoRR, 2015

Reinforcement Learning for the Unit Commitment Problem.
CoRR, 2015

Learning to coordinate without communication in multi-user multi-armed bandit problems.
CoRR, 2015

Localized Epidemic Detection in Networks with Overwhelming Noise.
Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2015

Policy Gradient for Coherent Risk Measures.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Community Detection via Measure Space Embedding.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Online Learning for Adversaries with Memory: Price of Past Mistakes.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Semantic locality and context-based prefetching using reinforcement learning.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Local detection of infections in heterogeneous networks.
Proceedings of the 2015 IEEE Conference on Computer Communications, 2015

Formation games of reliable networks.
Proceedings of the 2015 IEEE Conference on Computer Communications, 2015

Dynamic Sensing: Better Classification under Acquisition Constraints.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Off-policy Model-based Learning under Unknown Factored Dynamics.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Thompson Sampling for Learning Parameterized Markov Decision Processes.
Proceedings of The 28th Conference on Learning Theory, 2015

Sensor Selection for Crowdsensing Dynamical Systems.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

Optimizing the CVaR via Sampling.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

Learning When to Switch between Skills in a High Dimensional Domain.
Proceedings of the Learning for General Competency in Video Games, 2015

2014
High-Throughput Energy-Efficient LDPC Decoders Using Differential Binary Message Passing.
IEEE Trans. Signal Process., 2014

Opportunistic Approachability and Generalized No-Regret Problems.
Math. Oper. Res., 2014

Set-valued approachability and online learning with partial monitoring.
J. Mach. Learn. Res., 2014

Implicit Temporal Differences.
CoRR, 2014

Policy Gradients Beyond Expectations: Conditional Value-at-Risk.
CoRR, 2014

Thompson Sampling for Learning Parameterized MDPs.
CoRR, 2014

Distributed Robust Learning.
CoRR, 2014

Network formation games with heterogeneous players and the internet structure.
Proceedings of the ACM Conference on Economics and Computation, 2014

Heterogeneous Stream Processing and Crowdsourcing for Traffic Monitoring: Highlights.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2014

Sub-sampling for Multi-armed Bandits.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2014

Concurrent Bandits and Cognitive Radio Networks.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2014

"How hard is my MDP?" The distribution-norm to the rescue.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Robust Logistic Regression and Classification.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Loop-Aware Memory Prefetching Using Code Block Working Sets.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Scaling Up Robust MDPs using Function Approximation.
Proceedings of the 31st International Conference on Machine Learning, 2014

Time-Regularized Interrupting Options (TRIO).
Proceedings of the 31st International Conference on Machine Learning, 2014

Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations.
Proceedings of the 31st International Conference on Machine Learning, 2014

Latent Bandits.
Proceedings of the 31st International Conference on Machine Learning, 2014

Concept Drift Detection Through Resampling.
Proceedings of the 31st International Conference on Machine Learning, 2014

Thompson Sampling for Complex Online Problems.
Proceedings of the 31st International Conference on Machine Learning, 2014

Combining a Gauss-Markov model and Gaussian process for traffic prediction in Dublin city center.
Proceedings of the Workshops of the EDBT/ICDT 2014 Joint Conference (EDBT/ICDT 2014), 2014

Heterogeneous Stream Processing and Crowdsourcing for Urban Traffic Management.
Proceedings of the 17th International Conference on Extending Database Technology, 2014

Approachability in unknown games: Online learning meets multi-objective optimization.
Proceedings of The 27th Conference on Learning Theory, 2014

Energy-efficient gear-shift LDPC decoders.
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

2013
Outlier-Robust PCA: The High-Dimensional Case.
IEEE Trans. Inf. Theory, 2013

Stochastic Decoding of LDPC Codes over GF(q).
IEEE Trans. Commun., 2013

Relaxed Half-Stochastic Belief Propagation.
IEEE Trans. Commun., 2013

On information propagation in mobile call networks.
Soc. Netw. Anal. Min., 2013

Generating storylines from sensor data.
Pervasive Mob. Comput., 2013

Time Series Analysis Using Geometric Template Matching.
IEEE Trans. Pattern Anal. Mach. Intell., 2013

A State Action Frequency Approach to Throughput Maximization over Uncertain Wireless Channels.
Internet Math., 2013

Dynamics in tree formation games.
Games Econ. Behav., 2013

Algorithmic aspects of mean-variance optimization in Markov decision processes.
Eur. J. Oper. Res., 2013

A Primal Condition for Approachability with Partial Monitoring.
CoRR, 2013

Online Learning for Loss Functions with Memory and Applications to Statistical Arbitrage.
CoRR, 2013

Robust High Dimensional Sparse Regression and Matching Pursuit.
CoRR, 2013

Policy Evaluation with Variance Related Risk Criteria in Markov Decision Processes.
CoRR, 2013

Scaling Up Robust MDPs by Reinforcement Learning.
CoRR, 2013

Variance Adjusted Actor Critic Algorithms.
CoRR, 2013

Formation Games and the Internet Structure.
CoRR, 2013

Thompson Sampling for Complex Bandit Problems.
CoRR, 2013

Learning Multiple Models via Regularized Weighting.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Online PCA for Contaminated Data.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Detecting epidemics using highly noisy data.
Proceedings of the Fourteenth ACM International Symposium on Mobile Ad Hoc Networking and Computing, 2013

Model selection in markovian processes.
Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013

Temporal Difference Methods for the Variance of the Reward To Go.
Proceedings of the 30th International Conference on Machine Learning, 2013

Robust Sparse Regression under Adversarial Corruption.
Proceedings of the 30th International Conference on Machine Learning, 2013

Approachability, fast and slow.
Proceedings of the COLT 2013, 2013

Opportunistic Strategies for Generalized No-Regret Problems.
Proceedings of the COLT 2013, 2013

Online Learning for Time Series Prediction.
Proceedings of the COLT 2013, 2013

2012
Dithered Belief Propagation Decoding.
IEEE Trans. Commun., 2012

Sparse Algorithms Are Not Stable: A No-Free-Lunch Theorem.
IEEE Trans. Pattern Anal. Mach. Intell., 2012

Distributionally Robust Markov Decision Processes.
Math. Oper. Res., 2012

A Distributional Interpretation of Robust Optimization.
Math. Oper. Res., 2012

Robustness and generalization.
Mach. Learn., 2012

Statistical Optimization in High Dimensions.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Preface.
Proceedings of the COLT 2012, 2012

More Is Better: Large Scale Partially-supervised Sentiment Classification.
Proceedings of the 4th Asian Conference on Machine Learning, 2012

Optimization Under Probabilistic Envelope Constraints.
Oper. Res., 2012

More Is Better: Large Scale Partially-supervised Sentiment Classification - Appendix.
CoRR, 2012

How to sample if you must: on optimal functional sampling.
CoRR, 2012

Clustered Bandits.
CoRR, 2012

Approximately optimal bidding policies for repeated first-price auctions.
Ann. Oper. Res., 2012

Joint Stochastic Decoding of LDPC Codes and Partial-Response Channels.
Proceedings of the 2012 IEEE Workshop on Signal Processing Systems, 2012

Network forensics: random infection vs spreading epidemic.
Proceedings of the ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, 2012

Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty.
Proceedings of the 29th International Conference on Machine Learning, 2012

Policy Gradients with Variance Related Risk Criteria.
Proceedings of the 29th International Conference on Machine Learning, 2012

Decoupling Exploration and Exploitation in Multi-Armed Bandits.
Proceedings of the 29th International Conference on Machine Learning, 2012

Large scale real-time bidding in the smart grid: A mean field framework.
Proceedings of the 51st IEEE Conference on Decision and Control, 2012

Duality of ancillary services and intermittent suppliers.
Proceedings of the 51st IEEE Conference on Decision and Control, 2012

On identifying the causative network of an epidemic.
Proceedings of the 50th Annual Allerton Conference on Communication, 2012

Bayesian Reinforcement Learning.
Proceedings of the Reinforcement Learning, 2012

2011
Tracking Forecast Memories for Stochastic Decoding.
J. Signal Process. Syst., 2011

Delayed Stochastic Decoding of LDPC Codes.
IEEE Trans. Signal Process., 2011

Efficient Bidding in Dynamic Grid Markets.
IEEE Trans. Parallel Distributed Syst., 2011

A Robust Learning Approach to Repeated Auctions With Monitoring and Entry Fees.
IEEE Trans. Comput. Intell. AI Games, 2011

The Sample Complexity of Dictionary Learning.
Proceedings of the COLT 2011, 2011

Robust approachability and regret minimization in games with partial monitoring.
Proceedings of the COLT 2011, 2011

Does an Efficient Calibrated Forecasting Strategy Exist?
Proceedings of the COLT 2011, 2011

Regulation, Volatility and Efficiency in Continuous-Time Markets.
CoRR, 2011

Bandits with an Edge.
CoRR, 2011

Activity Recognition with Mobile Phones.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2011

From Bandits to Experts: On the Value of Side-Observations.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Committing Bandits.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Probabilistic Goal Markov Decision Processes.
Proceedings of the IJCAI 2011, 2011

Unimodal Bandits.
Proceedings of the 28th International Conference on Machine Learning, 2011

Bundle Selling by Online Estimation of Valuation Functions.
Proceedings of the 28th International Conference on Machine Learning, 2011

Mean-Variance Optimization in Markov Decision Processes.
Proceedings of the 28th International Conference on Machine Learning, 2011

Learning from Multiple Outlooks.
Proceedings of the 28th International Conference on Machine Learning, 2011

Regulation and double price mechanisms in markets with friction.
Proceedings of the 50th IEEE Conference on Decision and Control and European Control Conference, 2011

Stochastic bandits with pathwise constraints.
Proceedings of the 50th IEEE Conference on Decision and Control and European Control Conference, 2011

Activity Recognition with Time-Delay Embeddings.
Proceedings of the Computational Physiology, 2011

2010
k-Armed Bandit.
Proceedings of the Encyclopedia of Machine Learning, 2010

Relaxation dynamics in stochastic iterative decoders.
IEEE Trans. Signal Process., 2010

Majority-based tracking forecast memories for stochastic LDPC decoding.
IEEE Trans. Signal Process., 2010

Robust regression and Lasso.
IEEE Trans. Inf. Theory, 2010

A Min-Sum Iterative Decoder Based on Pulsewidth Message Encoding.
IEEE Trans. Circuits Syst. II Express Briefs, 2010

A Geometric Proof of Calibration.
Math. Oper. Res., 2010

Percentile Optimization for Markov Decision Processes with Parameter Uncertainty.
Oper. Res., 2010

Stochastic Chase Decoding of Reed-Solomon Codes.
IEEE Commun. Lett., 2010

Adaptive Bases for Reinforcement Learning.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2010

Online Classification with Specificity Constraints.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Generative models for rapid information propagation.
Proceedings of the First Workshop on Social Media Analytics, 2010

Resource Allocation with Supply Adjustment in Distributed Computing Systems.
Proceedings of the 2010 International Conference on Distributed Computing Systems, 2010

A novel similarity measure for time series data with applications to gait and activity recognition.
Proceedings of the UbiComp 2010: Ubiquitous Computing, 12th International Conference, 2010

Lowering Error Floors Using Dithered Belief Propagation.
Proceedings of the Global Communications Conference, 2010

Principal Component Analysis with Contaminated Data: The High Dimensional Case.
Proceedings of the COLT 2010, 2010

Learning with Global Cost in Stochastic Environments.
Proceedings of the COLT 2010, 2010

Regulation and efficiency in markets with friction.
Proceedings of the 49th IEEE Conference on Decision and Control, 2010

Adaptive bases for Q-learning.
Proceedings of the 49th IEEE Conference on Decision and Control, 2010

Relaxed half-stochastic decoding of LDPC codes over GF(q).
Proceedings of the 48th Annual Allerton Conference on Communication, 2010

Volatility and efficiency in markets with friction.
Proceedings of the 48th Annual Allerton Conference on Communication, 2010

Tutor learning using linear constraints in approximate dynamic programming.
Proceedings of the 48th Annual Allerton Conference on Communication, 2010

Activity and Gait Recognition with Time-Delay Embeddings.
Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010

2009
A Kalman Filter Design Based on the Performance/Robustness Tradeoff.
IEEE Trans. Autom. Control., 2009

Network Formation: Bilateral Contracting and Myopic Dynamics.
IEEE Trans. Autom. Control., 2009

Markov Decision Processes with Arbitrary Reward Processes.
Math. Oper. Res., 2009

Robustness and Regularization of Support Vector Machines.
J. Mach. Learn. Res., 2009

Online Learning with Sample Path Constraints.
J. Mach. Learn. Res., 2009

Approachability in repeated games: Computational aspects and a Stackelberg variant.
Games Econ. Behav., 2009

Bidirectional interleavers for LDPC decoders using transmission gates.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2009

High dimensional Principal Component Analysis with contaminated data.
Proceedings of the 2009 IEEE Information Theory Workshop, 2009

Piecewise-stationary bandit problems with side observations.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Stochastic Decoding of LDPC Codes over GF(q).
Proceedings of IEEE International Conference on Communications, 2009

Tracking Forecast Memories in stochastic decoders.
Proceedings of the IEEE International Conference on Acoustics, 2009

A Relaxed Half-Stochastic Iterative Decoder for LDPC Codes.
Proceedings of the Global Communications Conference, 2009. GLOBECOM 2009, Honolulu, Hawaii, USA, 30 November, 2009

Online learning in Markov decision processes with arbitrarily changing rewards and transitions.
Proceedings of the 1st International Conference on Game Theory for Networks, 2009

Bidding efficiently in repeated auctions with entry and observation costs.
Proceedings of the 1st International Conference on Game Theory for Networks, 2009

Online Learning for Global Cost Functions.
Proceedings of the COLT 2009, 2009

Arbitrarily modulated Markov decision processes.
Proceedings of the 48th IEEE Conference on Decision and Control, 2009

Parametric regret in uncertain Markov decision processes.
Proceedings of the 48th IEEE Conference on Decision and Control, 2009

Risk sensitive robust support vector machines.
Proceedings of the 48th IEEE Conference on Decision and Control, 2009

Regularized Fitted Q-Iteration for planning in continuous-space Markovian decision problems.
Proceedings of the American Control Conference, 2009

2008
Fully Parallel Stochastic LDPC Decoders.
IEEE Trans. Signal Process., 2008

Strategies for Prediction Under Imperfect Monitoring.
Math. Oper. Res., 2008

Regret minimization in repeated matrix games with variable stage duration.
Games Econ. Behav., 2008

Robustness, Risk, and Regularization in Support Vector Machines.
CoRR, 2008

Local Two-Stage Myopic Dynamics for Network Formation Games.
Proceedings of the Internet and Network Economics, 4th International Workshop, 2008

Efficient reinforcement learning in parameterized models: discrete parameters.
Proceedings of the 3rd International ICST Conference on Performance Evaluation Methodologies and Tools, 2008

Regularized Policy Iteration.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

A Lazy Approach to Online Learning with Constraints.
Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2008

Reinforcement learning in the presence of rare events.
Proceedings of the Machine Learning, 2008

Regularized Fitted Q-Iteration: Application to Planning.
Proceedings of the Recent Advances in Reinforcement Learning, 8th European Workshop, 2008

Efficient Reinforcement Learning in Parameterized Models: Discrete Parameter Case.
Proceedings of the Recent Advances in Reinforcement Learning, 8th European Workshop, 2008

Learning in the Limit with Adversarial Disturbances.
Proceedings of the 21st Annual Conference on Learning Theory, 2008

Robust dimensionality reduction for high-dimension data.
Proceedings of the 46th Annual Allerton Conference on Communication, 2008

Local dynamics for network formation games.
Proceedings of the 46th Annual Allerton Conference on Communication, 2008

Online Learning with Expert Advice and Finite-Horizon Constraints.
Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 2008

2007
An Inequality for Nearly Log-Concave Distributions With Applications to Learning.
IEEE Trans. Inf. Theory, 2007

Online calibrated forecasts: Memory efficiency versus universality for learning in games.
Mach. Learn., 2007

Bias and Variance Approximation in Value Function Estimates.
Manag. Sci., 2007

Efficiency of Market-Based Resource Allocation among Many Participants.
IEEE J. Sel. Areas Commun., 2007

Multi-agent learning for engineers.
Artif. Intell., 2007

An Area-Efficient FPGA-Based Architecture for Fully-Parallel Stochastic LDPC Decoding.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2007

Reinforcement Learning-Based Load Shared Sequential Routing.
Proceedings of the NETWORKING 2007. Ad Hoc and Sensor Networks, 2007

Survey of Stochastic Computation on Factor Graphs.
Proceedings of the 37th International Symposium on Multiple-Valued Logic, 2007

Percentile optimization in uncertain Markov decision processes with application to efficient exploration.
Proceedings of the Machine Learning, 2007

Non-Cooperative Design of Translucent Networks.
Proceedings of the Global Communications Conference, 2007

Dynamics and stability in network formation games with bilateral contracts.
Proceedings of the 46th IEEE Conference on Decision and Control, 2007

User Model and Utility Based Power Management.
Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, 2007

Adaptive Timeout Policies for Fast Fine-Grained Power Management.
Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, 2007

2006
Design of ℓ₁-optimal controllers with flexible disturbance rejection level.
IEEE Trans. Autom. Control., 2006

Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems.
J. Mach. Learn. Res., 2006

Stochastic decoding of LDPC codes.
IEEE Commun. Lett., 2006

A contract-based model for directed network formation.
Games Econ. Behav., 2006

The Robustness-Performance Tradeoff in Markov Decision Processes.
Proceedings of the Advances in Neural Information Processing Systems 19, 2006

Asymptotics of Efficiency Loss in Competitive Market Mechanisms.
Proceedings of the INFOCOM 2006. 25th IEEE International Conference on Computer Communications, 2006

Automatic basis function construction for approximate dynamic programming and reinforcement learning.
Proceedings of the Machine Learning, 2006

Online Learning with Constraints.
Proceedings of the Learning Theory, 19th Annual Conference on Learning Theory, 2006

Online Learning with Variable Stage Duration.
Proceedings of the Learning Theory, 19th Annual Conference on Learning Theory, 2006

Design of l1-Optimal Controllers with Flexible Disturbance Rejection Level.
Proceedings of the American Control Conference, 2006

2005
Efficiency loss in a network resource allocation game: the case of elastic supply.
IEEE Trans. Autom. Control., 2005

On the Empirical State-Action Frequencies in Markov Decision Processes Under General Policies.
Math. Oper. Res., 2005

Basis Function Adaptation in Temporal Difference Reinforcement Learning.
Ann. Oper. Res., 2005

A Tutorial on the Cross-Entropy Method.
Ann. Oper. Res., 2005

The Workshop Program at the Nineteenth National Conference on Artificial Intelligence.
AI Mag., 2005

The cross entropy method for classification.
Proceedings of the Machine Learning, 2005

Reinforcement learning with Gaussian processes.
Proceedings of the Machine Learning, 2005

2004
The kernel recursive least-squares algorithm.
IEEE Trans. Signal Process., 2004

The Sample Complexity of Exploration in the Multi-Armed Bandit Problem.
J. Mach. Learn. Res., 2004

A Geometric Approach to Multi-Criterion Reinforcement Learning.
J. Mach. Learn. Res., 2004

Bias and variance in value function estimation.
Proceedings of the Machine Learning, 2004

Dynamic abstraction in reinforcement learning via clustering.
Proceedings of the Machine Learning, 2004

Reinforcement Learning for Average Reward Zero-Sum Games.
Proceedings of the Learning Theory, 17th Annual Conference on Learning Theory, 2004

Efficiency loss in a resource allocation game: A single link in elastic supply.
Proceedings of the 43rd IEEE Conference on Decision and Control, 2004

2003
The Empirical Bayes Envelope and Regret Minimization in Competitive Markov Decision Processes.
Math. Oper. Res., 2003

Greedy Algorithms for Classification -- Consistency, Convergence Rates, and Adaptivity.
J. Mach. Learn. Res., 2003

The Cross Entropy Method for Fast Policy Search.
Proceedings of the Machine Learning, 2003

Action Elimination and Stopping Conditions for Reinforcement Learning.
Proceedings of the Machine Learning, 2003

Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning.
Proceedings of the Machine Learning, 2003

Lower Bounds on the Sample Complexity of Exploration in the Multi-armed Bandit Problem.
Proceedings of the Computational Learning Theory and Kernel Machines, 2003

On-Line Learning with Imperfect Monitoring.
Proceedings of the Computational Learning Theory and Kernel Machines, 2003

2002
On the Existence of Linear Weak Learners and Applications to Boosting.
Mach. Learn., 2002

Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning.
Proceedings of the Machine Learning: ECML 2002, 2002

Sparse Online Greedy Support Vector Regression.
Proceedings of the Machine Learning: ECML 2002, 2002

The Consistency of Greedy Algorithms for Classification.
Proceedings of the Computational Learning Theory, 2002

PAC Bounds for Multi-armed Bandit and Markov Decision Processes.
Proceedings of the Computational Learning Theory, 2002

2001
The Steering Approach for Multi-Criteria Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 14 [Neural Information Processing Systems: Natural and Synthetic], 2001

Learning Embedded Maps of Markov Processes.
Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), Williams College, Williamstown, MA, USA, June 28, 2001

Adaptive Strategies and Regret Minimization in Arbitrarily Varying Markov Environments.
Proceedings of the Computational Learning Theory, 2001

Geometric Bounds for Generalization in Boosting.
Proceedings of the Computational Learning Theory, 2001

2000
Weak Learners and Improved Rates of Convergence in Boosting.
Proceedings of the Advances in Neural Information Processing Systems 13, 2000

