- Article code
- S0044466925030074-1
- DOI
- 10.31857/S0044466925030074
- Publication type
- Article
- Publication status
- Published
- Authors
- Volume / Issue
- Volume 65 / Issue 3
- Pages
- 325-337
- Abstract
- Журнал вычислительной математики и математической физики, OPTIMAL APPROXIMATION OF AVERAGE REWARD MARKOV DECISION PROCESSES
- Keywords
- Publication date
- 17.09.2025
- Year
- 2025
Bibliography
- 1. Gheshlaghi Azar M., Munos R., Kappen H.J. Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model. Machine Learning, vol. 91 (2013), pp. 325–349.
- 2. Zurek M., Chen Y. Span-Based Optimal Sample Complexity for Average Reward MDPs. arXiv preprint arXiv:2311.13469.
- 3. Tuynman A., Degenne R., Kaufmann E. Finding good policies in average-reward Markov Decision Processes without prior knowledge. arXiv preprint arXiv:2405.17108.
- 4. Wang S., Blanchet J., Glynn P. Optimal sample complexity for average reward Markov decision processes. arXiv preprint arXiv:2310.08833.
- 5. Bartlett P.L., Tewari A. REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs. arXiv preprint arXiv:1205.2661.
- 6. Wang J., Wang M., Yang L.F. Near sample-optimal reduction-based policy learning for average reward MDP. arXiv preprint arXiv:2212.00603.
- 7. Goyal V., Grand-Clement J. A first-order approach to accelerated value iteration. Operations Research, vol. 71, no. 2 (2023), pp. 517–535.
- 8. Grand-Clement J. From convex optimization to MDPs: A review of first-order, second-order and quasi-Newton methods for MDPs. arXiv preprint arXiv:2104.10677.
- 9. Farahmand A.M., Ghavamzadeh M. PID accelerated value iteration algorithm. In International Conference on Machine Learning. PMLR (2021), pp. 3143–3153.
- 10. Weissman T., Ordentlich E., Seroussi G., Verdu S., Weinberger M.J. Inequalities for the L1 deviation of the empirical distribution. Hewlett-Packard Labs, Tech. Rep. (2003), p. 125.
- 11. Singh S.P., Yee R.C. An upper bound on the loss from approximate optimal-value functions. Machine Learning, vol. 16 (1994), pp. 227–233.
- 12. Li G., Wei Y., Chi Y., Gu Y., Chen Y. Breaking the sample size barrier in model-based reinforcement learning with a generative model. Advances in Neural Information Processing Systems, vol. 33 (2020), pp. 12861–12872.
- 13. Puterman M.L. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.
- 14. Wang S., Blanchet J., Glynn P. Optimal Sample Complexity of Reinforcement Learning for Mixing Discounted Markov Decision Processes. arXiv preprint arXiv:2302.07477.
- 15. Li T., Wu F., Lan G. Stochastic first-order methods for average-reward Markov decision processes. arXiv preprint arXiv:2205.05800.
- 16. Jin Y., Sidford A. Towards tight bounds on the sample complexity of average-reward MDPs. In International Conference on Machine Learning. PMLR (2021), pp. 5055–5064.
- 17. Tiapkin D., Gasnikov A. Primal-dual stochastic mirror descent for MDPs. In International Conference on Artificial Intelligence and Statistics. PMLR (2022), pp. 9723–9740.