Efficient Policy Gradient Reinforcement Learning in High-Noise Long Horizon Settings

Date

2025

Authors

Aitchison, Matthew

Abstract

This thesis explores enhancements to Policy Gradient (PG) methods in reinforcement learning (RL), focusing on their application to real-world scenarios. It addresses challenges in efficient evaluation, noise reduction, and long-horizon discounting. Key contributions include the Atari-5 dataset, which reduces evaluation time in the Arcade Learning Environment; the Dual Network Architecture (DNA) algorithm, which improves the performance of Proximal Policy Optimization (PPO) on vision-based tasks; and the TVL algorithm, which learns over long horizons without discounting and shows promise in high-noise environments. This research advances the understanding and application of PG methods, highlighting their practical implications for complex decision-making and robotics.

Type

Thesis (PhD)
