
Reinforcement learning (RL) environments are rapidly becoming one of the most valuable and strategically important inputs in training frontier artificial intelligence models, according to a new report by Epoch AI. Based on interviews with 18 stakeholders spanning RL environment startups, neolabs and leading AI research labs, the report highlights how these environments increasingly define what advanced AI systems can learn, execute and be evaluated on, positioning them as critical infrastructure in the AI development stack.
RL environments simulate structured, interactive settings in which AI agents learn through trial and error. While expensive and time-consuming to build, a single environment can be reused across hundreds of tasks, making the economics viable despite high upfront costs. Once created, these environments become deeply embedded in training pipelines, creating long-term value for both developers and AI labs.
The report shows that prices in this niche are already substantial. One RL environment founder told Epoch AI, “I’ve seen $200 to $2,000 mostly. $20k per task would be rare but possible,” with the firm adding that the $20k figure “comes up for especially complex software engineering tasks, but it’s rare.” Beyond individual tasks, commercial contracts are often far larger. Epoch AI noted, “Contract sizes are often six to seven figures per quarter,” with interviewees citing deals ranging from $300,000 to well over $1 million depending on factors such as task volume, customization and exclusivity.
A growing ecosystem of companies is emerging to meet this demand, including players such as Mercor, Surge, Handshake and Turing. These firms focus on building specialized environments that allow AI models to practice complex workflows, from software engineering and UI navigation to decision-making in dynamic systems. SemiAnalysis reports that so-called “UI gym” environments can cost around $20,000 per website and adds that “OpenAI has purchased hundreds of sites for ChatGPT Agent training and development.”
Spending appetite among major AI labs underscores the strategic importance of these tools. The Information has reported that Anthropic discussed spending more than $1 billion on RL environments and related infrastructure. An employee at an RL environment startup summarized current demand trends by saying, “RL is the main use. We have some requests for creating [environments] for benchmarking. I’d say perhaps 10–20x more the former vs the latter.”
As frontier AI systems increasingly rely on autonomous decision-making and multi-step reasoning, the report suggests RL environments will play a central role in shaping both model capabilities and competitive advantage, turning what was once a niche tooling layer into a high-stakes market at the core of AI innovation.




