Mar 25, 2025
Gemini 2.5 is a thinking model, designed to tackle increasingly complex problems. Our first 2.5 model, Gemini 2.5 Pro Experimental, leads common benchmarks by meaningful margins and showcases strong reasoning and code capabilities.

Koray Kavukcuoglu
CTO of Google DeepMind
Today we’re introducing Gemini 2.5, our most intelligent AI model. Our first 2.5 release is an experimental version of 2.5 Pro, which is state-of-the-art on a wide range of benchmarks and debuts at #1 on LMArena by a significant margin.
Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy.
In the field of AI, a system’s capacity for “reasoning” refers to more than just classification and prediction. It refers to its ability to analyze information, draw logical conclusions, incorporate context and nuance, and make informed decisions.
For a long time, we’ve explored ways of making AI smarter and more capable of reasoning through techniques like reinforcement learning and chain-of-thought prompting. Building on this, we recently introduced our first thinking model, Gemini 2.0 Flash Thinking.
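Chain-of-thought prompting, mentioned above, is the practice of asking a model to spell out intermediate steps before answering. As a minimal sketch (the function name and template here are illustrative, not part of any Gemini API), the zero-shot variant just appends a reasoning cue to the question:

```python
def make_cot_prompt(question: str) -> str:
    """Wrap a question in a simple zero-shot chain-of-thought template.

    Appending a cue like "Let's think step by step." nudges the model to
    emit intermediate reasoning before its final answer.
    """
    return f"{question}\nLet's think step by step."


prompt = make_cot_prompt(
    "If a train travels 60 km in 45 minutes, what is its average speed in km/h?"
)
print(prompt)
```

Thinking models like Gemini 2.5 internalize this behavior through training, so the reasoning happens by default rather than needing to be coaxed out of the prompt.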
Now, with Gemini 2.5, we’ve achieved a new level of performance by combining a significantly enhanced base model with improved post-training. Going forward, we’re building these thinking capabilities directly into all of our models, so they can handle more complex problems and support even more capable, context-aware agents.
Introducing Gemini 2.5 Pro
Gemini 2.5 Pro Experimental is our most advanced model for complex tasks. It tops the LMArena leaderboard — which measures human preferences — by a significant margin, indicating a highly capable model equipped with high-quality style. 2.5 Pro also shows strong reasoning and code capabilities, leading on common coding, math and science benchmarks.
Gemini 2.5 Pro is available now in Google AI Studio and in the Gemini app for Gemini Advanced users, and will be coming to Vertex AI soon. We’ll also introduce pricing in the coming weeks, enabling people to use 2.5 Pro with higher rate limits for scaled production use.
Enhanced reasoning
Gemini 2.5 Pro is state-of-the-art across a range of benchmarks requiring advanced reasoning.
16 Comments
jasonpeacock
Isn't every new AI model the "most <adjective>"?
Nobody is going to say "Announcing Foobar 7.1 – not our best!"
jharohit
Why not enable Canvas for this model on Gemini.google.com? Arguably the weakest link of Canvas is the terrible code that Gemini 2.0 Flash writes for Canvas to run.
cj
Slight tangent: Interesting that they use o3-mini as the comparison rather than o1.
I've been using o1 almost exclusively for the past couple months and have been impressed to the point where I don't feel the need to "upgrade" for a better model.
Are there benchmarks showing o3-mini performing better than o1?
vineyardmike
I wonder what about this one gets the +0.5 to the name. IIRC the 2.0 model isn’t particularly old yet. Is it purely marketing, does it represent new model structure, iteratively more training data over the base 2.0, new serving infrastructure, etc?
I’ve always found the use of the *.5 naming kinda silly when it became a thing. When OpenAI released 3.5, they said they already had 4 underway at the time; they were just tweaking 3 to be better for ChatGPT. It felt like a scrappy startup name, and now it’s spread across the industry. Anthropic naming their models Sonnet 3, 3.5, 3.5 (new), 3.7 felt like the worst offender of this naming scheme.
I’m a much bigger fan of semver (not skipping to .5 though), date based (“Gemini Pro 2025”), or number + meaningful letter (eg 4o – “Omni”) for model names.
ekojs
> This will mark the first experimental model with higher rate limits + billing. Excited for this to land and for folks to really put the model through the paces!
From https://x.com/OfficialLoganK/status/1904583353954882046
The low rate-limit really hampered my usage of 2.0 Pro and the like. Interesting to see how this plays out.
M4v3R
The Long Context benchmark numbers seem super impressive. 91% vs 49% for GPT 4.5 at 128k context length.
falcor84
I'm most impressed by the improvement on Aider Polyglot; I wasn't expecting it to get saturated so quickly.
I'll be looking to see whether Google would be able to use this model (or an adapted version) to tackle ARC-AGI 2.
serjester
I wish they’d mention pricing – it’s hard to seriously benchmark models when you have no idea what putting it in production would actually cost.
Oras
These announcements have started to look like a template.
– Our state-of-the-art model.
– Benchmarks comparing to X,Y,Z.
– "Better" reasoning.
It might be an excellent model, but reading the exact text repeatedly is taking the excitement away.
jnd0
> with Gemini 2.5, we've achieved a new level of performance by combining a significantly enhanced base model with improved post-training. Going forward, we’re building these thinking capabilities directly into all of our models, so they can handle more complex problems and support even more capable, context-aware agents.
Been playing around with it and it feels intelligent and up to date. Plus it's connected to the internet. It acts as a reasoning model by default when it needs to.
I hope they enable support for the recently released Canvas mode for this model soon; it would be a good match.
joelthelion
Is this model going to be restricted to paying users?
vivzkestrel
"Hi, here is our new AI model. It performs task A x% better than competitor 1 and task B y% better than competitor 2" seems to be the new hot AI template in town.
andai
Can anyone share what they're doing with reasoning models? They seem to only make a difference on novel programming problems, like Advent of Code. So this model will help solve slightly harder Advent of Code puzzles.
By extension it should also be slightly more helpful for research, R&D?
throwaway13337
Google has this habit of 'releasing' without releasing AI models. This looks to be the same?
I don't see it on the API price list:
https://ai.google.dev/gemini-api/docs/pricing
I can imagine that it's not so interesting to most of us until we can try it with cursor.
I look forward to doing so when it's out. That Aider bench mixed with a long context window that their other models are known for could be a great mix. But we'll have to wait and see.
rvz
Looks like Google DeepMind pulled a Microsoft and front-ran OpenAI.
Planned revenge from 2023.