Efficient Policy Gradient Reinforcement Learning in High-Noise Long Horizon Settings
Date
2025
Authors
Aitchison, Matthew
Abstract
This thesis explores the enhancement of Policy Gradient (PG) methods in reinforcement learning (RL), focusing on their application in real-world scenarios. It addresses challenges in efficient evaluation, noise reduction, and long-horizon discounting in RL. Key contributions include the Atari-5 dataset, which reduces evaluation time in the Arcade Learning Environment; the Dual Network Architecture (DNA) algorithm, which improves the performance of Proximal Policy Optimization (PPO) in vision-based tasks; and the TVL algorithm, which can learn over long horizons without discounting and demonstrates potential in high-noise environments. This research advances the understanding and application of PG methods, highlighting their practical implications for complex decision-making and robotics.
Type
Thesis (PhD)