
Gemini 2.5 Flash by meetpateltech
Director of Product Management, Gemini
Today we are rolling out an early version of Gemini 2.5 Flash in preview through the Gemini API via Google AI Studio and Vertex AI. Building upon the popular foundation of 2.0 Flash, this new version delivers a major upgrade in reasoning capabilities, while still prioritizing speed and cost. Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off. The model also allows developers to set thinking budgets to find the right tradeoff between quality, cost, and latency. Even with thinking off, developers can maintain the fast speeds of 2.0 Flash while still improving on its performance.
Our Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding. Instead of immediately generating an output, the model can perform a “thinking” process to better understand the prompt, break down complex tasks, and plan a response. On complex tasks that require multiple steps of reasoning (like solving math problems or analyzing research questions), the thinking process allows the model to arrive at more accurate and comprehensive answers. In fact, Gemini 2.5 Flash performs strongly on Hard Prompts in LMArena, second only to 2.5 Pro.
2.5 Flash has comparable metrics to other leading models for a fraction of the cost and size.
Our most cost-efficient thinking model
2.5 Flash continues to lead as the model with the best price-to-performance ratio.
Gemini 2.5 Flash adds another model to Google’s pareto frontier of cost to quality.*
Fine-grained controls to manage thinking
We know that different use cases have different tradeoffs in quality, cost, and latency. To give developers flexibility, we’ve enabled setting a thinking budget that offers fine-grained control over the maximum number of tokens a model can generate while thinking. A higher budget allows the model to reason further to improve quality.
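In code, this looks roughly like the following minimal sketch using the google-genai Python SDK; the preview model name and the budget value are illustrative, and setting the budget to 0 turns thinking off.

    # Minimal sketch: calling Gemini 2.5 Flash with an explicit thinking budget.
    # Assumes the google-genai Python SDK; model name and budget are illustrative.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")

    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-04-17",  # preview model name (assumed)
        contents="List the prime numbers between 100 and 150, showing your steps.",
        config=types.GenerateContentConfig(
            # Cap thinking at 1024 tokens; a budget of 0 disables thinking entirely.
            thinking_config=types.ThinkingConfig(thinking_budget=1024),
        ),
    )
    print(response.text)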
30 Comments
xnx
50% price increase from Gemini 2.0 Flash. That sounds like a lot, but Flash is still so cheap when compared to other models of this (or lesser) quality. https://developers.googleblog.com/en/start-building-with-gem…
byefruit
It's interesting that there's nearly a 6x price difference between reasoning and no reasoning.
This implies it's not a hybrid model that can just skip reasoning steps if requested.
Anyone know what else they might be doing?
Reasoning means contexts will be longer (to hold the thinking tokens), and inference does cost more with a longer context, but it's not going to be 6x.
Or is it just market pricing?
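For rough scale, here is a back-of-the-envelope sketch in Python. The per-million-token prices are assumptions based on the announced preview pricing, and thinking tokens are treated as billed at the thinking output rate, so treat the numbers as illustrative rather than authoritative.

    # Back-of-the-envelope cost comparison; prices below are assumed preview rates
    # (USD per 1M tokens) and should be checked against the official pricing page.
    INPUT_PRICE = 0.15               # input tokens
    OUTPUT_PRICE_NO_THINKING = 0.60  # output tokens with thinking off
    OUTPUT_PRICE_THINKING = 3.50     # output (and thinking) tokens with thinking on

    def request_cost(input_tokens: int, output_tokens: int, thinking: bool) -> float:
        out_rate = OUTPUT_PRICE_THINKING if thinking else OUTPUT_PRICE_NO_THINKING
        return (input_tokens * INPUT_PRICE + output_tokens * out_rate) / 1_000_000

    # Same request shape, thinking off vs on: the output rate alone is ~5.8x higher,
    # before counting any extra thinking tokens the model generates.
    print(request_cost(10_000, 2_000, thinking=False))  # ~$0.0027
    print(request_cost(10_000, 2_000, thinking=True))   # ~$0.0085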
punkpeye
This is cool, but the rate limits on all of these preview models are a PITA
arnaudsm
Gemini flash models have the least hype, but in my experience in production have the best bang for the buck and multimodal tooling.
Google is silently winning the AI race.
transformi
Bad day going on at Google.
First the declaration of an illegal monopoly...
and now… Google’s latest innovation: programmable overthinking.
With Gemini 2.5 Flash, you too can now set a thinking_budget—because nothing says "state-of-the-art AI" like manually capping how long it’s allowed to reason. Truly the dream: debugging a production outage at 2am wondering if your LLM didn’t answer correctly because you cheaped out on tokens. lol.
“Turn thinking off for better performance.” That’s not a model config, that’s a metaphor for Google’s entire AI strategy lately.
At this point, Gemini isn’t an AI product—it’s a latency-cost-quality compromise simulator with a text interface. Meanwhile, OpenAI and Anthropic are out here just… cooking the benchmarks
hmaxwell
I did some testing this morning:
Prompt: "can you find any mistakes on my codebase? I put one in there on purpose" + 70,000 tokens of codebase where in one line I have an include for a non-existent file.
Claude 3.7 Sonnet (Try 1 – new chat) Claimed it Found a race condition in the api of the admin interface that would be triggered if two admins were to change the room order at the same time. Claude suggested I group all sql queries in a single transaction. I looked at the code and found that it already used a transaction for all queries. I said: the order_update api is already done with a transaction. Claude replied: "You're absolutely right, and I apologize for my mistake. I was incorrect to claim there was a race condition issue. The transaction ensures atomicity and consistency of the updates, and the SQL queries are properly structured for their intended purpose."
Claude 3.7 Sonnet (Try 2 – new chat) Found a comment in the admin ui javascript code that suggested a potential alternative to event handler cleanup that was not implemented because I decided to go with a cleaner route. Then asked "Is this the issue you intentionally included, or would you like me to look for other potential problems?" I said: "The comment merely suggests an alternative, right?" claude said: "Yes, you're absolutely right. The comment is merely suggesting an alternative approach that isn't being used in the code, rather than indicating a mistake. So there's no actual bug or mistake in this part of the code – just documentation of different possible approaches. I apologize for misinterpreting this as an issue!"
Claude 3.7 Sonnet (Try 3 – new chat) When processing items out of the database to generate QR codes in the admin interface, Claude says that my code both attempts to generate QR codes with undefined data AS WELL AS saying that my error handling skips undefined data. Claude contradicts itself within 2 sentences. When asking about clarification Claude replies: Looking at the code more carefully, I see that the code actually has proper error handling. I incorrectly stated that it "still attempts to call generateQRCode()" in the first part of my analysis, which was wrong. The code properly handles the case when there's no data-room attribute.
Gemini Advanced 2.5 Pro (Try 1 – new chat) Found the intentional error and said I should stop putting db creds/api keys into the codebase.
Gemini Advanced 2.5 Pro (Try 2 – new chat) Found the intentional error and said I should stop putting db creds/api keys into the codebase.
Gemini Advanced 2.5 Pro (Try 3 – new chat) Found the intentional error and said I should stop putting db creds/api keys into the codebase.
o4-mini-high and o4-mini and o3 and 4.5 and 4o – "The message you submitted was too long, please reload the conversation and submit something shorter."
Workaccount2
OpenAI might win the college students but it looks like Google will lock in enterprise.
statements
Interesting to note that this might be the only model with a knowledge cutoff as recent as January 2025
ein0p
Absolutely decimated on metrics by o4-mini, straight out of the gate, and not even that much cheaper on output tokens (o4-mini's thinking can't be turned off IIRC).
xbmcuser
For a non-programmer like me, Google is becoming shockingly good. It is giving working code the first time. I was playing around with it and asked it to write code to scrape some data off a website to analyse. I was expecting it to write something that would scrape the data, which I would then upload back to it to analyse. But it actually wrote code that scraped and analysed the data. It was basic categorizing and counting of the data, but I was not expecting it to do that.
__alexs
Does billing for the API actually work properly yet?
alecco
Gemini models are very good but in my experience they tend to overdo the problems. When I give it things for context and something to rework, Gemini often reworks the problem.
For software it is barely useful because you want small commits for specific fixes not a whole refactor/rewrite. I tried many prompts but it's hard. Even when I give it function signatures of the APIs the code I want to fix uses, Gemini rewrites the API functions.
If anybody knows a prompt hack to avoid this, I'm all ears. Meanwhile I'm staying with Claude Pro.
ks2048
If this announcement is targeting people not up-to-date on the models available, I think they should say what "flash" means. Is there a "Gemini (non-flash)"?
I see the 4 Google model names in the chart here. Are these 4 the main "families" of models to choose from?
– Gemini-Pro-Preview
– Gemini-Flash-Preview
– Gemini-Flash
– Gemini-Flash-Lite
AStonesThrow
I've been leveraging the services of 3 LLMs, mainly: Meta, Gemini, and Copilot.
It depends on what I'm asking. If I'm looking for answers in the realm of history or culture, religion, or I want something creative such as a cute limerick, or a song or dramatic script, I'll ask Copilot. Currently, Copilot has two modes: "Quick Answer"; or "Think Deeply", if you want to wait about 30 seconds for a good answer.
If I want info on a product, a business, an industry or a field of employment, or on education, technology, etc., I'll inquire of Gemini.
Both Copilot and Gemini have interactive voice conversation modes. Thankfully, they will also write a transcript of what we said. They also eagerly attempt to engage the user with further questions and followups, with open questions such as "so what's on your mind tonight?"
And if I want to know about pop stars, film actors, the social world or something related to tourism or recreation in general, I can ask Meta's AI through [Facebook] Messenger.
One thing I found to be extremely helpful and accurate was Gemini's tax advice. I mean, it was way better than human beings at the entry/poverty level. Commercial tax advisors, even when I'd paid for the Premium Deluxe Tax Software from the Biggest Name, they just went to Google stuff for me. I mean, they didn't even seem to know where stuff was on irs.gov. When I asked for a virtual or phone appointment, they were no-shows, with a litany of excuses. I visited 3 offices in person; the first two were closed, and the third one basically served Navajos living off the reservation.
So when I asked Gemini about tax information — simple stuff like the terminology, definitions, categories of income, and things like that — Gemini was perfectly capable of giving lucid answers. And citing its sources, so I could immediately go find the IRS.GOV publication and read it "from the horse's mouth".
Oftentimes I'll ask an LLM just to jog my memory or inform me of what specific terminology I should use. Like "Hey Gemini, what's the PDU for Ethernet called?" and when Gemini says it's a "frame" then I have that search term I can plug into Wikipedia for further research. Or, for an introduction or overview to topics I'm unfamiliar with.
LLMs are an important evolutionary step in the general-purpose "search engine" industry. One problem was, you see, that it was dangerous, annoying, or risky to go Googling around and click on all those tempting sites. Google knew this: the dot-com sites and all the SEO sites that surfaced to the top were traps, they were bait, they were sometimes legitimate scams.
So the LLM providers are showing us that we can stay safe in a sandbox, without clicking external links, without coughing up information about our interests and setting cookies and revealing our IPv6 addresses: we can safely ask a local LLM, or an LLM in a trusted service provider, about whatever piques our fancy. And I am glad for this.
I saw y'all complaining about how every search engine was worthless, and the Internet was clogged with blogspam, and there was no real information anymore. Well, perhaps LLMs, for now, are a safe space, a sandbox to play in, where I don't need to worry about drive-by-zero-click malware, or being inundated with Joomla ads, or popups. For now.
cynicalpeace
1. The main transformative aspect of LLMs has been in writing code.
2. LLMs have had less transformative aspects in 2025 than we anticipated back in late 2022.
3. LLMs are unlikely to be very transformative to society, even as their intelligence increases, because intelligence is a minor changemaker in society. Bigger changemakers are motivation, courage, desire, taste, power, sex and hunger.
4. LLMs are unlikely to develop these more important traits because they are trained on text, not evolved in a rigamarole of ecological challenges.
charcircuit
500 RPD for the free tier is good enough for my coding needs. Nice.
AbuAssar
I noticed that OpenAI doesn't compare its models to third-party models in its announcement posts, unlike Google, Meta, and the others.
mmaunder
More great innovation from Google. OpenAI have two major problems.
The first is Google's vertically integrated chip pipeline and deep supply chain and operational knowledge when it comes to creating AI chips and putting them into production. They have a massive cost advantage at every step. This translates into more free services, cheaper paid services, more capabilities due to more affordable compute, and far more growth.
Second problem is data starvation and the unfair advantage that social media has when it comes to a source of continually refreshed knowledge. Now that the foundational model providers have churned through the common crawl and are competing to consume things like video and whatever is left, new data is becoming increasingly valuable as a differentiator, and more importantly, as a provider of sustained value for years to come.
SamA has signaled both of these problems: he made noises about building a fab a while back, and more recently he has been making noises about launching a social media platform off OpenAI. The smart money among his investors knows these issues are fundamental to whether OAI succeeds or not, and is asking the hard questions.
If the only answer for both is "we'll build it from scratch", OpenAI is in very big trouble. And it seems that that is the best answer that SamA can come up with. I continue to believe that OpenAI will be the Netscape of the AI revolution.
The win is Google's for the taking, if they can get out of their own way.
mark_l_watson
Nice! Low price, even with reasoning enabled. I have been working on a short new book titled “Practical AI with Google: A Solo Knowledge Worker's Guide to Gemini, AI Studio, and LLM APIs” but with all of Google’s recent announcements it might not be a short book.
serjester
Just ran it on one of our internal PDF (3 pages, medium difficulty) to json benchmarks:
gemini-flash-2.0: ~60% accuracy, 6,250 pages per dollar
gemini-2.5-flash-preview (no thinking): ~80% accuracy, 1,700 pages per dollar
gemini-2.5-flash-preview (with thinking): ~80% accuracy (not sure what's going on here), 350 pages per dollar
gemini-flash-2.5: ~90% accuracy, 150 pages per dollar
I do wish they separated the thinking variant from the regular one – it's incredibly confusing when a model parameter dramatically impacts pricing.
zoogeny
Google making Gemini 2.5 Pro (Experimental) free was a big deal. I haven't tried the more expensive OpenAI models so I can't even compare, only to the free models I have used of theirs in the past.
Gemini 2.5 Pro is so much of a step up (IME) that I've become sold on Google's models in general. It not only is smarter than me on most of the subjects I engage with it, it also isn't completely obsequious. The model pushes back on me rather than contorting itself to find a way to agree.
100% of my casual AI usage is now in Gemini and I look forward to asking it questions on deep topics because it consistently provides me with insight. I am building new tools with a mind to optimize my usage and increase its value to me.
minimaxir
One hidden note from Gemini 2.5 Flash when diving deep into the documentation: for image inputs, not only can the model be instructed to generate 2D bounding boxes of relevant subjects, but it can also create segmentation masks! https://ai.google.dev/gemini-api/docs/image-understanding#se…
At this price point with the Flash model, creating segmentation masks is pretty nifty.
The segmentation masks are a bit of a galaxy brain implementation by generating a b64 string representing the mask: https://colab.research.google.com/github/google-gemini/cookb…
I am trying to test it in AI Studio but it sometimes errors out, likely because it tries to decode the b64 lol.
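As a sketch of what consuming that output might look like: the model returns JSON entries whose field names (box_2d, label, and mask holding a base64 PNG data URI) are taken from the linked docs and cookbook, so treat the exact schema as an assumption.

    # Hypothetical sketch: decoding segmentation-mask entries from the model's JSON
    # output. Field names (box_2d, label, mask) are assumed from the docs/cookbook.
    import base64
    import io
    import json

    from PIL import Image  # pip install pillow

    def decode_masks(response_text: str):
        masks = []
        for item in json.loads(response_text):
            # The mask is a data URI like "data:image/png;base64,..."; strip the prefix.
            b64_png = item["mask"].split("base64,", 1)[-1]
            mask_img = Image.open(io.BytesIO(base64.b64decode(b64_png)))
            masks.append((item.get("label"), item["box_2d"], mask_img))
        return masks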
simonw
I spotted something interesting in the Python API library code:
https://github.com/googleapis/python-genai/blob/473bf4b6b5a6…
That thinking_budget thing is documented, but what's the deal with include_thoughts? It sounds like it's an option to have the API return the thought summary… but I can't figure out how to get it to work, and I've not found documentation or example code that uses it.
Anyone managed to get Gemini to spit out thought summaries in its API using this option?
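Presumably it would be wired up something like the sketch below, though this is speculative and untested: the field names come from that file, and it isn't clear whether the preview model actually returns thought parts.

    # Speculative, untested sketch based on the field names in python-genai.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")

    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-04-17",  # assumed preview model name
        contents="Why is the sky blue?",
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(
                include_thoughts=True,  # request thought summaries back (if supported)
                thinking_budget=1024,
            ),
        ),
    )

    # If thought summaries are returned, they should show up as parts flagged as thoughts.
    for part in response.candidates[0].content.parts:
        label = "THOUGHT:" if getattr(part, "thought", False) else "ANSWER:"
        print(label, part.text)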
deanmoriarty
Genuine naive question: when it comes to Google HN has generally a negative view of it (pick any random story on Chrome, ads, search, web, working at faang, etc. and this should be obvious from the comments), yet when it comes to AI there is a somewhat notable “cheering effect” for Google to win the AI race that goes beyond a conventional appreciation of a healthy competitive landscape, which may appear as a bit of a double standard.
Why is this? Is it because OpenAI is seen as such a negative player in this ecosystem that Google “gets a pass on this one”?
And bonus question: what do people think will happen to OpenAI if Google wins the race? Do you think they’ll literally just go bust?
krembo
How is this sustainable for Google from a business POV? It feels like Google is shooting itself in the foot while "winning" the AI race. From my experience, I think Google has lost 99% of the ads it used to show me in the search engine.
jdthedisciple
Very excited to try it, but it is noteworthy that o4-mini is strictly better according to the very benchmarks shown by Google here.
Of course it's about 4x as expensive too (I believe), but still, given the release of openai/codex as well, o4-mini will remain a strong competitor for now.
thimabi
I find it baffling that Google offers such impressive models through the API and even the free AI Studio with fine-grained control, yet the models used in the Gemini app feel much worse.
Over the past few weeks, I’ve been using Gemini Advanced on my Workspace account. There, the models think for shorter times, provide shorter outputs, and even their context window is far from the advertised 1 million tokens. It makes me think that Google is intentionally limiting the Gemini app.
Perhaps the goal is to steer users toward the API or AI Studio, with the free tier that involves data collection for training purposes.
bingdig
It appears that this impacted gemini-2.5-pro-preview-03-25 somehow? Grounding with Google Search no longer works.
I had a workflow running that would pull news articles from the past 24 hours. It now refuses to believe the current date is 2025-04-17. Even with search turned on, when I ask it what the date is, it always replies with a date sometime in July 2024.
Alifatisk
No matter how good the new Gemini models have become, my bad experience with early Gemini is still stuck with me and I am afraid I still suffer from confirmation bias. Whenever I just look at the Gemini app, I already assume it’s going to be a bad experience.