Using reinforcement learning and $4.80 of GPU time to find the best HN post

OpenPipe, founded by Kyle, offers a service for fine-tuning LLMs to reach high accuracy on specific tasks. The post explains RLHF (reinforcement learning from human feedback), which adapts reinforcement learning to LLMs by using a reward model, trained on human feedback, to score the model's outputs. Here, Hacker News upvotes stand in for that human feedback: the reward model is trained on text-only HN posts from 2016 onward and predicts how many upvotes a story will receive. The model does pick out stories with front-page potential, but it overestimates the scores of low-scoring stories and underestimates those of high-scoring ones, because whether a story takes off on the HN front page is largely unpredictable. In Part 2, they plan to explore using RLHF to write great HN posts.
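To make the reward-model idea concrete, here is a minimal sketch under stated assumptions. It is not OpenPipe's code. A tiny linear regressor over bag-of-words features stands in for the fine-tuned LLM reward model. The training target is assumed to be log(1 + upvotes), which is a common choice for heavy-tailed counts but is not confirmed by the summary. The class `ToyRewardModel`, the `features` helper, and the four example posts are hypothetical and invented purely for illustration. Whatever model sits underneath, the shape of the interface is the same: post text goes in and a single scalar reward comes out.

```python
import math
import re
from collections import Counter


def features(text: str) -> dict[str, float]:
    """Hypothetical bag-of-words features: each token's share of the post's tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {tok: c / total for tok, c in counts.items()}


class ToyRewardModel:
    """Hypothetical stand-in for an LLM reward model: a linear regressor that
    maps post text to a scalar reward, here the predicted log(1 + upvotes)."""

    def __init__(self, lr: float = 0.5, epochs: int = 500) -> None:
        self.lr = lr
        self.epochs = epochs
        self.weights: dict[str, float] = {}
        self.bias = 0.0

    def reward(self, text: str) -> float:
        """Scalar reward for a post: bias plus weighted token features."""
        feats = features(text)
        return self.bias + sum(self.weights.get(t, 0.0) * v for t, v in feats.items())

    def fit(self, posts: list[tuple[str, int]]) -> list[float]:
        """Per-example SGD on squared error against log(1 + upvotes).
        Returns the training mean squared error after each epoch."""
        history = []
        for _ in range(self.epochs):
            for text, upvotes in posts:
                err = self.reward(text) - math.log1p(upvotes)
                self.bias -= self.lr * err
                for t, v in features(text).items():
                    self.weights[t] = self.weights.get(t, 0.0) - self.lr * err * v
            mse = sum((self.reward(x) - math.log1p(y)) ** 2 for x, y in posts) / len(posts)
            history.append(mse)
        return history

    def predicted_upvotes(self, text: str) -> float:
        """Invert the log(1 + upvotes) target back to an upvote count."""
        return math.expm1(self.reward(text))


# Made-up toy data, not real HN stories or scores.
posts = [
    ("Show HN: I built a tiny database in Rust", 420),
    ("Show HN: My weekend Rust compiler", 310),
    ("Ask HN: How do you back up your laptop?", 12),
    ("Ask HN: Best way to learn accounting?", 5),
]

model = ToyRewardModel()
history = model.fit(posts)
for text, upvotes in posts:
    print(f"{upvotes:>4} actual {model.predicted_upvotes(text):8.1f} predicted  {text}")
```

A model like this, trained with squared error on noisy targets, tends to pull its predictions toward the average score on unseen posts. That tendency is consistent with the over- and under-estimation pattern described above, though on four memorized training examples the toy fits almost exactly.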

https://openpipe.ai/blog/hacker-news-rlhf-part-1