Shalabh Bhatnagar

Orcid: 0000-0001-7644-3914

According to our database, Shalabh Bhatnagar authored at least 219 papers between 1995 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Variance-Reduced Deep Actor-Critic With an Optimally Subsampled Actor Recursion.
IEEE Trans. Artif. Intell., July, 2024

Energy Management in a Cooperative Energy Harvesting Wireless Sensor Network.
IEEE Commun. Lett., January, 2024

Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks.
CoRR, 2024

Critic-Actor for Average Reward MDPs with Function Approximation: A Finite-Time Analysis.
CoRR, 2024

Truncated Cauchy random perturbations for smoothed functional-based stochastic optimization.
Autom., 2024

Segmentation of 3D Gaussians using Masked Gradients.
Proceedings of the SIGGRAPH Asia 2024 Posters, SA 2024, Tokyo, Japan, December 3-6, 2024, 2024

Dynamic Energy Management in Competing Microgrids using Reinforcement Learning.
Proceedings of the IEEE Power & Energy Society Innovative Smart Grid Technologies Conference, 2024

Learning Dynamic Representations in Large Language Models for Evolving Data Streams.
Proceedings of the Pattern Recognition - 27th International Conference, 2024

A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023
Actor-Critic or Critic-Actor? A Tale of Two Time Scales.
IEEE Control. Syst. Lett., 2023

Approximate Linear Programming and Decentralized Policy Improvement in Cooperative Multi-agent Markov Decision Processes.
CoRR, 2023

Finite Time Analysis of Constrained Actor Critic and Constrained Natural Actor Critic Algorithms.
CoRR, 2023

The Reinforce Policy Gradient Algorithm Revisited.
CoRR, 2023

A Framework for Provably Stable and Consistent Training of Deep Feedforward Networks.
CoRR, 2023

n-Step Temporal Difference Learning with Optimal n.
CoRR, 2023

Autonomous UAV Navigation in Complex Environments using Human Feedback.
Proceedings of the 32nd IEEE International Conference on Robot and Human Interactive Communication, 2023

Off-Policy Average Reward Actor-Critic with Deterministic Policy Search.
Proceedings of the International Conference on Machine Learning, 2023

Generalized Simultaneous Perturbation Stochastic Approximation with Reduced Estimator Bias.
Proceedings of the 57th Annual Conference on Information Sciences and Systems, 2023

A Policy Gradient Approach for Finite Horizon Constrained Markov Decision Processes.
Proceedings of the 62nd IEEE Conference on Decision and Control, 2023

2022
Generalized Second-Order Value Iteration in Markov Decision Processes.
IEEE Trans. Autom. Control., 2022

A Generalized Minimax Q-Learning Algorithm for Two-Player Zero-Sum Stochastic Games.
IEEE Trans. Autom. Control., 2022

Analyzing Approximate Value Iteration Algorithms.
Math. Oper. Res., 2022

A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization.
CoRR, 2022

Reinforcement Learning for Task Specifications with Action-Constraints.
CoRR, 2022

Data Efficient Safe Reinforcement Learning.
Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2022

Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm.
Proceedings of the International Joint Conference on Neural Networks, 2022

Dynamic Mirror Descent based Model Predictive Control for Accelerating Robot Learning.
Proceedings of the 2022 International Conference on Robotics and Automation, 2022

Co-operative Multi-agent Twin Delayed DDPG for Robust Phase Duration Optimization of Large Road Networks.
Proceedings of the Agents and Artificial Intelligence - 14th International Conference, 2022

Robust Traffic Signal Timing Control using Multiagent Twin Delayed Deep Deterministic Policy Gradients.
Proceedings of the 14th International Conference on Agents and Artificial Intelligence, 2022

Schedule Based Temporal Difference Algorithms.
Proceedings of the 58th Annual Allerton Conference on Communication, 2022

Gradient Temporal Difference with Momentum: Stability and Convergence.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Memory-Based Deep Reinforcement Learning for Obstacle Avoidance in UAV With Limited Environment Knowledge.
IEEE Trans. Intell. Transp. Syst., 2021

Asynchronous Stochastic Approximations With Asymptotically Biased Errors and Deep Multiagent Learning.
IEEE Trans. Autom. Control., 2021

Stochastic Approximation With Iterate-Dependent Markov Noise Under Verifiable Conditions in Compact State Space With the Stability of Iterates Not Ensured.
IEEE Trans. Autom. Control., 2021

On tight bounds for function approximation error in risk-sensitive reinforcement learning.
Syst. Control. Lett., 2021

N-Timescale Stochastic Approximation: Stability and Convergence.
CoRR, 2021

Finite Horizon Q-learning: Stability, Convergence and Simulations.
CoRR, 2021

Novel First Order Bayesian Optimization with an Application to Reinforcement Learning.
Appl. Intell., 2021

Attention Actor-Critic Algorithm for Multi-Agent Constrained Co-operative Reinforcement Learning.
Proceedings of the AAMAS '21: 20th International Conference on Autonomous Agents and Multiagent Systems, 2021

2020
Analysis of Stochastic Approximation Schemes With Set-Valued Maps in the Absence of a Stability Guarantee and Their Stabilization.
IEEE Trans. Autom. Control., 2020

Random Directions Stochastic Approximation With Deterministic Perturbations.
IEEE Trans. Autom. Control., 2020

Stochastic Recursive Inclusions in Two Timescales with Nonadditive Iterate-Dependent Markov Noise.
Math. Oper. Res., 2020

Successive Over-Relaxation Q-Learning.
IEEE Control. Syst. Lett., 2020

Generalized Speedy Q-Learning.
IEEE Control. Syst. Lett., 2020

Robust Quadrupedal Locomotion on Sloped Terrains: A Linear Policy Approach.
CoRR, 2020

Hindsight Experience Replay with Kronecker Product Approximate Curvature.
CoRR, 2020

A reinforcement learning approach to hybrid control design.
CoRR, 2020

A Stochastic Game Framework for Efficient Energy Management in Microgrid Networks.
CoRR, 2020

Reinforcement learning algorithm for non-stationary environments.
Appl. Intell., 2020

Learning Stable Manoeuvres in Quadruped Robots from Expert Demonstrations.
Proceedings of the 29th IEEE International Conference on Robot and Human Interactive Communication, 2020

Learning-Based Resource Allocation in Industrial IoT Systems.
Proceedings of the 31st IEEE Annual International Symposium on Personal, 2020

Stochastic Game Frameworks for Efficient Energy Management in Microgrid Networks.
Proceedings of the IEEE PES Innovative Smart Grid Technologies Europe, 2020

Deep Reinforcement Learning with Successive Over-Relaxation and its Application in Autoscaling Cloud Resources.
Proceedings of the 2020 International Joint Conference on Neural Networks, 2020

A Convergent Off-Policy Temporal Difference Algorithm.
Proceedings of the ECAI 2020 - 24th European Conference on Artificial Intelligence, 29 August-8 September 2020, Santiago de Compostela, Spain, 2020

Robust Quadrupedal Locomotion on Sloped Terrains: A Linear Policy Approach.
Proceedings of the 4th Conference on Robot Learning, 2020

Hierarchical Average Reward Policy Gradient Algorithms (Student Abstract).
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Stability of Stochastic Approximations With "Controlled Markov" Noise and Temporal Difference Learning.
IEEE Trans. Autom. Control., 2019

An Online Sample-Based Method for Mode Estimation Using ODE Analysis of Stochastic Approximation Algorithms.
IEEE Control. Syst. Lett., 2019

Gait Library Synthesis for Quadruped Robots via Augmented Random Search.
CoRR, 2019

Hierarchical Average Reward Policy Gradient Algorithms.
CoRR, 2019

Solution of Two-Player Zero-Sum Game by Successive Relaxation.
CoRR, 2019

Reinforcement Learning in Non-Stationary Environments.
CoRR, 2019

Second Order Value Iteration in Reinforcement Learning.
CoRR, 2019

Design, Development and Experimental Realization of a Quadrupedal Research Platform: Stoch.
CoRR, 2019

Efficient Adaptive Resource Provisioning for Cloud Applications using Reinforcement Learning.
Proceedings of the IEEE 4th International Workshops on Foundations and Applications of Self* Systems, 2019

Trajectory based Deep Policy Search for Quadrupedal Walking.
Proceedings of the 28th IEEE International Conference on Robot and Human Interactive Communication, 2019

Learning Active Spine Behaviors for Dynamic and Efficient Locomotion in Quadruped Robots.
Proceedings of the 28th IEEE International Conference on Robot and Human Interactive Communication, 2019

Realizing Learned Quadruped Locomotion Behaviors through Kinematic Motion Primitives.
Proceedings of the International Conference on Robotics and Automation, 2019

Predictive and Prescriptive Analytics for Performance Optimization: Framework and a Case Study on a Large-Scale Enterprise System.
Proceedings of the 18th IEEE International Conference On Machine Learning And Applications, 2019

Efficient Budget Allocation and Task Assignment in Crowdsourcing.
Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, 2019

An Adaptive and Incremental Approach to Quantile Estimation.
Proceedings of the 58th IEEE Conference on Decision and Control, 2019

Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning.
Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019

Stochastic Approximation Trackers for Model-Based Search.
Proceedings of the 57th Annual Allerton Conference on Communication, 2019

2018
Novel Sensor Scheduling Scheme for Intruder Tracking in Energy Efficient Sensor Networks.
IEEE Wirel. Commun. Lett., 2018

A stochastic approximation approach to active queue management.
Telecommun. Syst., 2018

Analysis of Gradient Descent Methods With Nondiminishing Bounded Errors.
IEEE Trans. Autom. Control., 2018

A Linearly Relaxed Approximate Linear Program for Markov Decision Processes.
IEEE Trans. Autom. Control., 2018

Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning.
Math. Oper. Res., 2018

An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method.
Mach. Learn., 2018

An incremental off-policy search in a model-free Markov decision process using a single sample path.
Mach. Learn., 2018

Gradient-Based Adaptive Stochastic Search for Simulation Optimization Over Continuous Space.
INFORMS J. Comput., 2018

A Cross Entropy based Optimization Algorithm with Global Convergence Guarantees.
CoRR, 2018

A unified decision making framework for supply and demand management in microgrid networks.
Proceedings of the 2018 IEEE International Conference on Communications, 2018

Generalized Deterministic Perturbations For Stochastic Gradient Search.
Proceedings of the 57th IEEE Conference on Decision and Control, 2018

2017
Adaptive mean queue size and its rate of change: queue management with random dropping.
Telecommun. Syst., 2017

Adaptive System Optimization Using Random Directions Stochastic Approximation.
IEEE Trans. Autom. Control., 2017

A Generalization of the Borkar-Meyn Theorem for Stochastic Recursive Inclusions.
Math. Oper. Res., 2017

RLWS: A Reinforcement Learning based GPU Warp Scheduler.
CoRR, 2017

A unified decision making framework for supply and demand management in microgrid networks.
CoRR, 2017

Conditions for Stability and Convergence of Set-Valued Stochastic Approximations: Applications to Approximate Value and Fixed point Iterations with Noise.
CoRR, 2017

Multi-Agent Q-Learning for Minimizing Demand-Supply Power Deficit in Microgrids.
CoRR, 2017

Deterministic Perturbations For Simultaneous Perturbation Methods Using Circulant Matrices.
CoRR, 2017

Quasi-Newton smoothed functional algorithms for unconstrained and constrained simulation optimization.
Comput. Optim. Appl., 2017

A stability criterion for two timescale stochastic approximation schemes.
Autom., 2017

An Incremental Fast Policy Search Using a Single Sample Path.
Proceedings of the Pattern Recognition and Machine Intelligence, 2017

Bounds for off-policy prediction in reinforcement learning.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

A model based search method for prediction in model-free Markov decision process.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

Scalable Performance Tuning of Hadoop MapReduce: A Noisy Gradient Approach.
Proceedings of the 2017 IEEE 10th International Conference on Cloud Computing (CLOUD), 2017

2016
Actor-Critic Algorithms with Online Feature Adaptation.
ACM Trans. Model. Comput. Simul., 2016

A constrained optimization perspective on actor-critic algorithms and application to network routing.
Syst. Control. Lett., 2016

Multiscale Q-learning with linear function approximation.
Discret. Event Dyn. Syst., 2016

Stochastic Recursive Inclusions in two timescales with non-additive iterate dependent Markov noise.
CoRR, 2016

Stochastic Recursive Inclusions with Non-Additive Iterate-Dependent Markov Noise.
CoRR, 2016

Gradient-based learning algorithms with constant-error estimators: stability and convergence.
CoRR, 2016

Performance Tuning of Hadoop MapReduce: A Noisy Gradient Approach.
CoRR, 2016

On a convergent off-policy temporal difference learning algorithm in on-line learning environment.
CoRR, 2016

A note on the function approximation error bound for risk-sensitive reinforcement learning.
CoRR, 2016

A Cross Entropy based Stochastic Approximation Algorithm for Reinforcement Learning with Linear Function Approximation.
CoRR, 2016

A randomized algorithm for continuous optimization.
Proceedings of the Winter Simulation Conference, 2016

Scalable focussed entity resolution.
Proceedings of the 2016 International Joint Conference on Neural Networks, 2016

Shaping Proto-Value Functions Using Rewards.
Proceedings of the ECAI 2016 - 22nd European Conference on Artificial Intelligence, 29 August-2 September 2016, The Hague, The Netherlands, 2016

Revisiting the Cross Entropy Method with Applications in Stochastic Global Optimization and Reinforcement Learning.
Proceedings of the ECAI 2016 - 22nd European Conference on Artificial Intelligence, 29 August-2 September 2016, The Hague, The Netherlands, 2016

Improved Hessian estimation for adaptive random directions stochastic approximation.
Proceedings of the 55th IEEE Conference on Decision and Control, 2016

2015
Energy Sharing for Multiple Sensor Nodes With Finite Buffers.
IEEE Trans. Commun., 2015

Simultaneous perturbation methods for adaptive labor staffing in service systems.
Simul., 2015

Necessary and sufficient conditions for optimality in constrained general sum stochastic games.
Syst. Control. Lett., 2015

Simultaneous Perturbation Newton Algorithms for Simulation Optimization.
J. Optim. Theory Appl., 2015

A bi-convex optimization problem to compute Nash equilibrium in n-player games and an algorithm.
CoRR, 2015

Stochastic recursive inclusions with two timescales.
CoRR, 2015

A Study of Gradient Descent Schemes for General-Sum Stochastic Games.
CoRR, 2015

Shaping Proto-Value Functions via Rewards.
CoRR, 2015

Two Timescale Stochastic Approximation with Controlled Markov noise.
CoRR, 2015

Adaptive system optimization using (simultaneous) random directions stochastic approximation.
CoRR, 2015

A Stochastic Approximation Algorithm for Quantile Estimation.
Proceedings of the Neural Information Processing - 22nd International Conference, 2015

Decentralized learning for traffic signal control.
Proceedings of the 7th International Conference on Communication Systems and Networks, 2015

Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games.
Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, 2015

A Generalized Reduced Linear Program for Markov Decision Processes.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
Two timescale convergent Q-learning for sleep-scheduling in wireless sensor networks.
Wirel. Networks, 2014

Smoothed Functional Algorithms for Stochastic Optimization Using q-Gaussian Distributions.
ACM Trans. Model. Comput. Simul., 2014

A simulation-based algorithm for optimal pricing policy under demand uncertainty.
Int. Trans. Oper. Res., 2014

Algorithms for Nash Equilibria in General-Sum Stochastic Games.
CoRR, 2014

Approximate Dynamic Programming based on Projection onto the (min, +) subsemimodule.
CoRR, 2014

Newton-based stochastic optimization using q-Gaussian smoothed functional algorithms.
Autom., 2014

Simulation optimization via gradient-based stochastic search.
Proceedings of the 2014 Winter Simulation Conference, 2014

Universal Option Models.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Multi-agent reinforcement learning for traffic signal control.
Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems, 2014

A Markov Decision Process Framework for Predictable Job Completion Times on Crowdsourcing Platforms.
Proceedings of the Second AAAI Conference on Human Computation and Crowdsourcing, 2014

Adaptive sleep-wake control using reinforcement learning in sensor networks.
Proceedings of the Sixth International Conference on Communication Systems and Networks, 2014

Approximate Dynamic Programming with (min; +) linear function approximation for Markov decision processes.
Proceedings of the 53rd IEEE Conference on Decision and Control, 2014

An actor critic algorithm based on Grassmanian search.
Proceedings of the 53rd IEEE Conference on Decision and Control, 2014

2013
Q-Learning Based Energy Management Policies for a Single Sensor Node with Finite Buffer.
IEEE Wirel. Commun. Lett., 2013

Feature Search in the Grassmanian in Online Reinforcement Learning.
IEEE J. Sel. Top. Signal Process., 2013

Reinforcement Learning for Sleep-Wake Scheduling in Sensor Networks.
CoRR, 2013

Mechanisms for hostile agents with capacity constraints.
Proceedings of the International conference on Autonomous Agents and Multi-Agent Systems, 2013

2012
Threshold Tuning Using Stochastic Optimization for Graded Signal Control.
IEEE Trans. Veh. Technol., 2012

An Online Actor-Critic Algorithm with Function Approximation for Constrained Markov Decision Processes.
J. Optim. Theory Appl., 2012

Smoothed Functional Algorithms for Stochastic Optimization using q-Gaussian Distributions.
CoRR, 2012

q-Gaussian based Smoothed Functional Algorithm for Stochastic Optimization.
CoRR, 2012

Optimal multi-layered congestion based pricing schemes for enhanced QoS.
Comput. Networks, 2012

General-sum stochastic games: Verifiability conditions for Nash equilibria.
Autom., 2012

q-Gaussian based Smoothed Functional algorithms for stochastic optimization.
Proceedings of the 2012 IEEE International Symposium on Information Theory, 2012

A novel Q-learning algorithm with function approximation for constrained Markov decision processes.
Proceedings of the 50th Annual Allerton Conference on Communication, 2012

2011
Stochastic approximation algorithms for constrained optimization via simulation.
ACM Trans. Model. Comput. Simul., 2011

Reinforcement Learning With Function Approximation for Traffic Signal Control.
IEEE Trans. Intell. Transp. Syst., 2011

An Optimized SDE Model for Slotted Aloha.
IEEE Trans. Commun., 2011

Stochastic Algorithms for Discrete Parameter Simulation Optimization.
IEEE Trans. Autom. Sci. Eng., 2011

The Borkar-Meyn theorem for asynchronous stochastic approximations.
Syst. Control. Lett., 2011

Reinforcement learning with average cost for adaptive control of traffic lights at intersections.
Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems, 2011

Stochastic Optimization for Adaptive Labor Staffing in Service Systems.
Proceedings of the Service-Oriented Computing - 9th International Conference, 2011

Smoothed Functional and Quasi-Newton Algorithms for Routing in Multi-stage Queueing Network with Constraints.
Proceedings of the Distributed Computing and Internet Technology, 2011

2010
An efficient algorithm for scheduling in bluetooth piconets and scatternets.
Wirel. Networks, 2010

Optimized Policies for the Retransmission Probabilities in Slotted Aloha.
Simul., 2010

An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes.
Syst. Control. Lett., 2010

Toward Off-Policy Learning Control with Function Approximation.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

2009
Pattern Synthesis for Nonparametric Pattern Recognition.
Proceedings of the Encyclopedia of Data Warehousing and Mining, Second Edition (4 Volumes), 2009

Optimal parameter trajectory estimation in parameterized SDEs: An algorithmic procedure.
ACM Trans. Model. Comput. Simul., 2009

A probabilistic constrained nonlinear optimization framework to optimize RED parameters.
Perform. Evaluation, 2009

A proof of convergence of the B-RED and P-RED algorithms for random early detection.
IEEE Commun. Lett., 2009

Natural actor-critic algorithms.
Autom., 2009

Multi-Step Dyna Planning for Policy Evaluation and Control.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Fast gradient-descent methods for temporal-difference learning with linear function approximation.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

LMS-2: Towards an algorithm that is as cheap as LMS and almost as efficient as RLS.
Proceedings of the 48th IEEE Conference on Decision and Control, 2009

2008
Simulation-Based Optimization Algorithms for Finite-Horizon Markov Decision Processes.
Simul., 2008

An efficient ad recommendation system for TV programs.
Multim. Syst., 2008

New algorithms of the Q-learning type.
Autom., 2008

Ant Colony Optimization Algorithms for Shortest Path Problems.
Proceedings of the Network Control and Optimization, Second Euro-NF Workshop, 2008

SPSA based feature relevance estimation for video retrieval.
Proceedings of the International Workshop on Multimedia Signal Processing, 2008

2007
Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization.
ACM Trans. Model. Comput. Simul., 2007

Gelfand-Yaglom-Perez theorem for generalized relative entropy functionals.
Inf. Sci., 2007

Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes.
Discret. Event Dyn. Syst., 2007

Incremental Natural Actor-Critic Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 20, 2007

An Optimal Weighted-Average Congestion Based Pricing Scheme for Enhanced QoS.
Proceedings of the Distributed Computing and Internet Technology, 2007

An Efficient and Optimized Bluetooth Scheduling Algorithm for Piconets.
Proceedings of the Distributed Computing and Internet Technology, 2007

Fuzzy Clustering Based Ad Recommendation for TV Programs.
Proceedings of the Interactive TV: a Shared Experience, 5th European Conference, 2007

Link route pricing for enhanced QoS.
Proceedings of the 46th IEEE Conference on Decision and Control, 2007

Discrete parameter simulation optimization algorithms with applications to admission control with dependent service times.
Proceedings of the 46th IEEE Conference on Decision and Control, 2007

Network flow-control using asynchronous stochastic approximation.
Proceedings of the 46th IEEE Conference on Decision and Control, 2007

Solving MDPs using Two-timescale Simulated Annealing with Multiplicative Weights.
Proceedings of the American Control Conference, 2007

Parametrized Actor-Critic Algorithms for Finite-Horizon MDPs.
Proceedings of the American Control Conference, 2007

2006
Robust optimization of Random Early Detection.
Telecommun. Syst., 2006

Partition based pattern synthesis technique with efficient algorithms for nearest neighbor classification.
Pattern Recognit. Lett., 2006

A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events.
J. Mach. Learn. Res., 2006

On Measure Theoretic definitions of Generalized Information Measures and Maximum Entropy Prescriptions.
CoRR, 2006

Actor-critic algorithms for hierarchical Markov decision processes.
Autom., 2006

SPSA algorithms with measurement reuse.
Proceedings of the Winter Simulation Conference WSC 2006, 2006

A Four-Timescale Algorithm for Constrained Stochastic Optimization of RED.
Proceedings of the 45th IEEE Conference on Decision and Control, 2006

A Reinforcement Learning Based Algorithm for Finite Horizon Markov Decision Processes.
Proceedings of the 45th IEEE Conference on Decision and Control, 2006

2005
Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization.
ACM Trans. Model. Comput. Simul., 2005

Optimal Threshold Policies for Admission Control in Communication Networks via Discrete Parameter Stochastic Approximation.
Telecommun. Syst., 2005

A Discrete Parameter Stochastic Approximation Algorithm for Simulation Optimization.
Simul., 2005

Overlap pattern synthesis with an efficient nearest neighbor classifier.
Pattern Recognit., 2005

Uniqueness of Nonextensive entropy under Renyi's Recipe.
CoRR, 2005

Properties of Kullback-Leibler cross-entropy minimization in nonextensive framework.
Proceedings of the 2005 IEEE International Symposium on Information Theory, 2005

Solution of MDPs Using Simulation-Based Value Iteration.
Proceedings of the Artificial Intelligence Applications and Innovations - IFIP TC12 WG12.5, 2005

Information theoretic justification of Boltzmann selection and its generalization to Tsallis case.
Proceedings of the IEEE Congress on Evolutionary Computation, 2005

2004
A simultaneous perturbation stochastic approximation-based actor-critic algorithm for Markov decision processes.
IEEE Trans. Autom. Control., 2004

Fusion of multiple approximate nearest neighbor classifiers for fast and efficient classification.
Inf. Fusion, 2004

Generalized Evolutionary Algorithm based on Tsallis Statistics.
CoRR, 2004

A Pattern Synthesis Technique with an Efficient Nearest Neighbor Classifier for Binary Pattern Recognition.
Proceedings of the 17th International Conference on Pattern Recognition, 2004

Cauchy annealing schedule: an annealing schedule for Boltzmann selection scheme in evolutionary algorithms.
Proceedings of the IEEE Congress on Evolutionary Computation, 2004

Hierarchical decision making in semiconductor fabs using multi-time scale Markov decision processes.
Proceedings of the 43rd IEEE Conference on Decision and Control, 2004

2003
Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences.
ACM Trans. Model. Comput. Simul., 2003

Multiscale Chaotic SPSA and Smoothed Functional Algorithms for Simulation Optimization.
Simul., 2003

Quotient evolutionary space: abstraction of evolutionary process w.r.t macroscopic properties.
Proceedings of the IEEE Congress on Evolutionary Computation, 2003

2002
A time aggregation approach to Markov decision processes.
Autom., 2002

2001
Optimal structured feedback policies for ABR flow control using two-timescale SPSA.
IEEE/ACM Trans. Netw., 2001

1995
A Convex Analytic Framework for Ergodic Control of Semi-Markov Processes.
Math. Oper. Res., 1995

