Advanced Reinforcement Learning Algorithms for Multi-Armed Bandit Problems

Robledo Relaño, Francisco

Date

2024-10-11

URI

http://hdl.handle.net/10810/72334

Abstract

This thesis presents advances in Reinforcement Learning (RL) algorithms for resource and policy management in Restless Multi-Armed Bandit (RMAB) problems. It introduces two main approaches. For discrete, binary actions, the QWI and QWINN algorithms compute Whittle indices, which simplify policy determination by decoupling the RMAB processes. QWINN uses neural networks to compute Q-values and demonstrates better convergence rates and scalability than QWI. For continuous actions, the LPCA algorithm applies a Lagrangian relaxation to decouple Weakly Coupled Markov Decision Processes (MDPs) and uses differential evolution and greedy optimization strategies for efficient resource allocation, showing superior performance over other RL approaches. Empirical results from simulations validate the effectiveness of these algorithms, which contribute to resource allocation in RL and provide a foundation for future research into more general and scalable RL frameworks.
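
As a rough illustration of why Whittle indices simplify policy determination, the sketch below applies the standard Whittle index rule: at each step, activate the arms whose current states have the largest indices, up to the activation budget. It is a minimal sketch and not the thesis's QWI or QWINN implementation. The function name `whittle_index_policy`, the index table, and the budget are all hypothetical, and the indices are assumed to be already known, whereas QWI and QWINN learn them.

```python
def whittle_index_policy(indices, states, budget):
    """Return a 0/1 action per arm, activating the `budget` arms
    whose current-state index is largest.

    indices[arm][state] is an (assumed, precomputed) Whittle index.
    states[arm] is the current state of each arm.
    """
    # Look up the index of each arm in its current state.
    current = [indices[arm][s] for arm, s in enumerate(states)]
    # Rank arms from highest to lowest index (ties keep arm order).
    ranked = sorted(range(len(states)), key=lambda arm: current[arm], reverse=True)
    active = set(ranked[:budget])
    return [1 if arm in active else 0 for arm in range(len(states))]


# Hypothetical index table: 4 arms with 3 states each (made-up values).
indices = [
    [0.1, 0.5, 0.9],
    [0.2, 0.4, 0.6],
    [0.0, 0.3, 1.2],
    [0.7, 0.8, 0.85],
]
states = [2, 0, 1, 1]  # current state of each arm
actions = whittle_index_policy(indices, states, budget=2)
print(actions)
```

Because each arm is ranked only by the index of its own current state, the decision step needs no joint reasoning over the combined state of all arms, which is the sense in which the index decouples the problem.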

Collections

TD-Ciencias

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 Spain