This is a crosspost of a post from my blog, Metal Ivy. The original is here: Reinforcement Learning on Forecasting Will Give Us a Superhuman Forecaster.

Why RL on forecasting?

When DeepSeek R1 came out in January 2025, I felt that the fact that RL on LLMs simply worked was incredible, but that using it on coding and math wasn’t the right path.

Before RL we had pretraining, a scalable and general training methodology that worked extremely well to get the model to the human level by imitating human data. Then RL came in and gave us a way to go even further, to the expert level and beyond, by sampling many trajectories from the LLM and using a reward function to select the best ones to reinforce. But it isn’t general anymore when only short-term, self-contained, verifiable tasks such as coding or math make up the environment.

A strongly superhuman coder might change everything - if recursive self-improvement happens like the labs hope (and doesn’t kill us). But it might not change that much at all by itself, beyond giving us more of the software abundance we in many ways already have. A strongly superhuman forecaster instantly gives people and organizations the ability to make superhuman decisions by forecasting the outcomes of those decisions, and would be a massive boost to the overall competence of our civilization.

You may ask why it should work, even in theory - math is deterministic and forecasting is not, so a forecasting reward may give bad weight updates.…
