Efficient Policy Gradient Reinforcement Learning in High-Noise Long Horizon Settings
Abstract
This thesis explores the enhancement of Policy Gradient (PG) methods in reinforcement learning (RL), focusing on their application in real-world scenarios. It addresses challenges in efficient evaluation, noise reduction, and long-horizon discounting in RL. Key contributions include the Atari-5 dataset, which reduces evaluation time in the Arcade Learning Environment; the Dual Network Architecture (DNA) algorithm, which improves the performance of Proximal Policy Optimization (PPO) on vision-based tasks; and the TVL algorithm, which learns over long horizons without discounting and demonstrates potential in high-noise environments. This research advances the understanding and application of PG methods, highlighting their practical implications for complex decision-making and robotics.