Efficient Policy Gradient Reinforcement Learning in High-Noise Long Horizon Settings

dc.contributor.author: Aitchison, Matthew
dc.date.accessioned: 2025-01-12T02:46:26Z
dc.date.available: 2025-01-12T02:46:26Z
dc.date.issued: 2025
dc.description.abstract: This thesis explores the enhancement of Policy Gradient (PG) methods in reinforcement learning (RL), focusing on their application in real-world scenarios. It addresses challenges in efficient evaluation, noise reduction, and long-horizon discounting in RL. Key contributions include the Atari-5 dataset, which reduces evaluation time in the Arcade Learning Environment; the Dual Network Architecture (DNA) algorithm, which improves the performance of Proximal Policy Optimization (PPO) on vision-based tasks; and the TVL algorithm, which can learn over long horizons without discounting and demonstrates potential in high-noise environments. This research advances the understanding and application of PG methods, highlighting their practical implications for complex decision-making and robotics.
dc.identifier.uri: https://hdl.handle.net/1885/733731566
dc.language.iso: en_AU
dc.title: Efficient Policy Gradient Reinforcement Learning in High-Noise Long Horizon Settings
dc.type: Thesis (PhD)
local.contributor.affiliation: ANU College of Engineering, Computing and Cybernetics, The Australian National University
local.contributor.supervisor: Kyburz, Penelope
local.identifier.doi: 10.25911/8T8S-TY70
local.identifier.proquest: Yes
local.identifier.researcherID:
local.mintdoi: mint
local.thesisANUonly.author: 73287e0d-1680-4851-a32c-342a515bde96
local.thesisANUonly.key: 7173dbae-0992-e48b-c7e3-4f6763a7d87f
local.thesisANUonly.title: 000000022450_TC_1

Downloads

Original bundle

Name: Aitchison_PhD_Thesis_2025.pdf
Size: 10.45 MB
Format: Adobe Portable Document Format
Description: Thesis Material