Advanced Reinforcement Learning Algorithms for Multi-Armed Bandit Problems
Date
2024-10-11
Author
Robledo Relaño, Francisco
Abstract
This thesis presents advances in Reinforcement Learning (RL) algorithms for resource and policy management in Restless Multi-Armed Bandit (RMAB) problems. It introduces two main approaches. For discrete, binary actions, the QWI and QWINN algorithms compute Whittle indices, which simplify policy determination by decoupling the RMAB into independent per-arm problems; QWINN uses neural networks to compute Q-values and shows better convergence rates and scalability than QWI. For continuous actions, the LPCA algorithm uses a Lagrangian relaxation to decouple Weakly Coupled Markov Decision Processes (WCMDPs), combining differential evolution and greedy optimization strategies for efficient resource allocation, and it outperforms the other RL approaches it is compared against. Empirical results from simulations validate the effectiveness of these algorithms, which contribute to resource allocation in RL and provide a foundation for future research into more general and scalable RL frameworks.
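The Whittle index policy that QWI and QWINN build on can be illustrated with a minimal, model-based sketch. It is not QWI or QWINN: those algorithms learn the indices from samples (via Q-learning or a neural network), whereas this sketch assumes each arm's passive and active transition matrices (`P0`, `P1`), state rewards `r`, and discount `gamma` are known. All of these names, and the helper functions below, are hypothetical. For a given state, the sketch bisects on a subsidy `λ` paid for the passive action until passive and active actions are equally valuable; under indexability, that crossing point is the state's Whittle index. The policy then activates the `budget` arms with the largest current indices.

```python
"""Sketch (assumed known arm models): Whittle indices by subsidy bisection, then a top-M policy."""
from typing import List, Sequence

Matrix = List[List[float]]


def q_values(P0: Matrix, P1: Matrix, r: Sequence[float], subsidy: float,
             gamma: float = 0.9, iters: int = 500) -> List[List[float]]:
    """Discounted value iteration for one arm whose passive action earns an extra `subsidy`.

    Returns Q[s][a] with a = 0 (passive) and a = 1 (active).
    """
    n = len(r)
    V = [0.0] * n
    Q = [[0.0, 0.0] for _ in range(n)]
    for _ in range(iters):
        Q = [[r[s] + subsidy + gamma * sum(P0[s][t] * V[t] for t in range(n)),  # passive
              r[s] + gamma * sum(P1[s][t] * V[t] for t in range(n))]            # active
             for s in range(n)]
        V = [max(q) for q in Q]
    return Q


def whittle_index(P0: Matrix, P1: Matrix, r: Sequence[float], s: int,
                  lo: float = -10.0, hi: float = 10.0, tol: float = 1e-6,
                  gamma: float = 0.9) -> float:
    """Bisect on the passive subsidy until passive and active are equally good in state s."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        Q = q_values(P0, P1, r, mid, gamma)
        if Q[s][1] > Q[s][0]:
            lo = mid  # active still preferred: the subsidy is too small
        else:
            hi = mid  # passive preferred: the subsidy is large enough
    return 0.5 * (lo + hi)


def whittle_policy(indices: Sequence[float], budget: int) -> List[int]:
    """Activate (1) the `budget` arms with the largest indices; keep the rest passive (0)."""
    order = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    active = set(order[:budget])
    return [1 if i in active else 0 for i in range(len(indices))]


if __name__ == "__main__":
    # Hypothetical two-state arm: state 0 is "bad" (reward 0), state 1 is "good" (reward 1).
    P0 = [[1.0, 0.0], [0.5, 0.5]]  # passive: the good state decays
    P1 = [[0.2, 0.8], [0.1, 0.9]]  # active: pushes the arm toward the good state
    r = [0.0, 1.0]
    idx = [whittle_index(P0, P1, r, s) for s in range(2)]
    print("Whittle indices per state:", idx)
    print("Policy for 4 arms, budget 2:", whittle_policy([0.3, 0.9, 0.1, 0.5], budget=2))
```

Because each index depends only on its own arm's model, the indices can be computed arm by arm, and coupling across arms enters only through the final top-`budget` selection. This is the decoupling the abstract refers to.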