An experimental large language model (LLM) developed by OpenAI has reached a milestone in AI-driven mathematical reasoning by scoring at gold-medal level on the 2025 International Mathematical Olympiad (IMO). The achievement was shared by OpenAI researcher Alexander Wei, who revealed that the model correctly solved five out of six problems from this year’s IMO under the same conditions faced by human participants.
“We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5 hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs,” Wei explained. Each of the six problems is worth 7 points, and the model earned 35 out of 42, a score that would place it in the top tier of actual competitors.
The IMO is the world’s most prestigious mathematics competition for high school students and is known for its highly challenging and creativity-driven problems. Wei highlighted how far AI has advanced, noting the progression from benchmarks like GSM8K, MATH, and AIME to this latest leap: “We’ve now progressed from GSM8K (~0.1 min for top humans) → MATH benchmark (~1 min) → AIME (~10 mins) → IMO (~100 mins),” he said.
To validate the result, the model’s submissions were independently reviewed by three former IMO medalists, who unanimously confirmed the correctness of its solutions. “The model solved P1 through P5; it did not produce a solution for P6,” Wei noted, adding that the model’s answers were publicly shared and reflect a “distinct style,” given its experimental nature.
“By going beyond the reinforcement learning paradigm of clear-cut, verifiable rewards we’ve obtained a model that can craft intricate, watertight arguments at the level of human mathematicians,” he added.
However, this model won’t be available to the public anytime soon. Wei clarified, “We don’t plan to release a model with IMO gold level of capability for many months.” OpenAI CEO Sam Altman echoed this, drawing a line between the IMO system and the company’s next release: “We are releasing GPT-5 soon but want to set accurate expectations: this is an experimental model that incorporates new research techniques.”
Reflecting on the journey, Wei shared that in 2021 he had predicted only modest AI progress in math by 2025. “Instead, we have IMO gold.”