This blog post introduces Reinforcement Learning (RL) through its recent advances: computers learning to play ATARI games, beating world champions at Go, and performing complex manipulation tasks. It attributes this progress to four factors: compute, data, algorithms, and infrastructure. The post then explains Policy Gradients (PG), a popular RL method whose appeal is that it is end-to-end, directly optimizing the expected reward. The PG training protocol is to play games such as Pong, label each decision as good or bad according to whether the game was won or lost, and adjust the policy network's parameters so that good decisions become more likely and bad ones less likely. Throughout, the focus is on why PG works and how it is trained.
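The training protocol described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the post's own code. It assumes a hypothetical toy game in place of Pong, where the agent wins an episode if most of its actions match a hidden rule. It also assumes a single-layer logistic policy rather than a full policy network. The names `prob_up`, `play_episode`, `policy_gradient_step`, and `train` are made up for this example.

```python
import numpy as np

def prob_up(w, x):
    """Logistic policy: probability of choosing action UP (1) given observation x."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def play_episode(w, rng, target, steps=10):
    """Toy stand-in for a game (an assumption, not Pong): win if most actions match a hidden rule."""
    observations, actions, correct = [], [], 0
    for _ in range(steps):
        x = rng.standard_normal(target.shape[0])
        a = 1 if rng.random() < prob_up(w, x) else 0  # sample an action from the policy
        correct += int(a == int(np.dot(target, x) > 0))
        observations.append(x)
        actions.append(a)
    outcome = 1.0 if correct > steps / 2 else -1.0    # win or loss, known only at the end
    return np.array(observations), np.array(actions), outcome

def policy_gradient_step(w, observations, actions, outcome, lr=0.1):
    """Label every action in the episode with the final outcome and nudge w accordingly."""
    p = 1.0 / (1.0 + np.exp(-observations @ w))
    # For a logistic policy, the gradient of log pi(a|x) with respect to w is (a - p) * x.
    grad_logp = (actions - p)[:, None] * observations
    # A win (+1) makes the sampled actions more likely; a loss (-1) makes them less likely.
    return w + lr * outcome * grad_logp.sum(axis=0)

def train(episodes=500, dim=4, seed=0):
    """Play episodes and update the policy after each one."""
    rng = np.random.default_rng(seed)
    target = rng.standard_normal(dim)  # hidden rule, unknown to the agent
    w = np.zeros(dim)
    for _ in range(episodes):
        obs, acts, outcome = play_episode(w, rng, target)
        w = policy_gradient_step(w, obs, acts, outcome)
    return w, target
```

Calling `train()` returns the learned weights together with the hidden rule, so the two can be compared. Note that every action in an episode receives the same label, including correct actions in a lost game. The method relies on such errors averaging out over many episodes.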
http://karpathy.github.io/2016/05/31/rl/