Mercury, the first commercial-scale diffusion language model
13 Comments
g-mork
There are some open weight attempts at this around too: https://old.reddit.com/r/LocalLLaMA/search?q=diffusion&restr…
Saw another on Twitter in the past few days that looked like a stronger contender to Mercury; it doesn't look like it got posted to LocalLLaMA, and I can't find it now. Very exciting stuff.
echelon
There are so many models. Every single day half a dozen new models land. And even more papers.
It feels like models are becoming fungible apart from the hyperscaler frontier models from OpenAI, Google, Anthropic, et al.
I suppose VCs won't be funding many more "labs"-type companies, or companies whose core value prop is "we have a model", unless they have a tight application loop or something truly unique?
Disregarding the team composition, research background, and specific problem domain – if you were starting an AI company today, what part of the stack would you focus on? Foundation models, AI/ML infra, tooling, application layer, …?
Where does the value accrue? What are the most important problems to work on?
byearthithatius
Interesting approach. However, I never thought of autoregression as _the_ current issue with language modeling. If anything, it seems the community was generally surprised by just how far next-"token" prediction took us. Remember back when we did char-generating RNNs and were impressed they could make almost coherent sentences?
Diffusion is an alternative, but I am having a hard time understanding the whole "built-in error correction" claim; it sounds like marketing BS. Both approaches replicate probability distributions, which will be naturally error-prone because of variance.
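To make the contrast concrete, here is a toy sketch of the two decoding loops; it is illustrative only (the `predict` stub stands in for a real network), not Mercury's actual, unpublished sampler:

```python
import random

# Toy contrast between autoregressive and masked-diffusion-style decoding.
# predict() is a stub standing in for a real model.

VOCAB = ["the", "cat", "sat", "on", "mat"]

def predict(tokens, position):
    """Stub: return a (token, confidence) guess for one position."""
    return random.choice(VOCAB), random.random()

def autoregressive_decode(length):
    # Left to right, one token at a time; each choice is final, so an
    # early mistake propagates into everything generated after it.
    out = []
    for i in range(length):
        token, _ = predict(out, i)
        out.append(token)
    return out

def diffusion_style_decode(length, steps=8, threshold=0.7):
    # Start fully masked and refine the sequence over parallel passes,
    # committing only positions the model is confident about; uncommitted
    # positions get re-predicted with more context on later passes. (Some
    # samplers also re-mask committed tokens, which is the claimed
    # "error correction".)
    out = ["<mask>"] * length
    for _ in range(steps):
        for i, tok in enumerate(out):
            if tok == "<mask>":
                token, conf = predict(out, i)
                if conf > threshold:
                    out[i] = token
    return out

print(autoregressive_decode(5))
print(diffusion_style_decode(5))
```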
jonplackett
OK. My go-to puzzle is this:
You have 2 minutes to cool down a cup of coffee to the lowest temp you can.
You have two options:
1. Add cold milk immediately, then let it sit for 2 mins.
2. Let it sit for 2 mins, then add the cold milk.
Which one cools the coffee to the lowest temperature, and why?
And Mercury gets this right – while as of right now ChatGPT 4o gets it wrong.
So that’s pretty impressive.
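For reference, a quick Newton's-law-of-cooling sanity check shows why letting it sit first wins (all constants below are illustrative assumptions):

```python
import math

# Newton's law of cooling: T(t) = T_env + (T0 - T_env) * exp(-k * t).
# Mixing is modeled as a simple weighted average (equal heat capacities).

T_ENV, T_COFFEE, T_MILK = 20.0, 90.0, 5.0  # degrees C (assumed values)
K = 0.1              # cooling rate constant, per minute (assumed)
MILK_FRACTION = 0.2  # milk's share of the final volume (assumed)

def cool(t0, minutes):
    """Temperature after passively cooling for `minutes`."""
    return T_ENV + (t0 - T_ENV) * math.exp(-K * minutes)

def mix(t_coffee, t_milk, frac_milk):
    """Temperature after stirring in cold milk."""
    return (1 - frac_milk) * t_coffee + frac_milk * t_milk

option1 = cool(mix(T_COFFEE, T_MILK, MILK_FRACTION), 2)  # milk first
option2 = mix(cool(T_COFFEE, 2), T_MILK, MILK_FRACTION)  # milk last

print(f"Option 1 (milk first): {option1:.2f} C")  # ~63.4 C
print(f"Option 2 (milk last):  {option2:.2f} C")  # ~62.9 C
# Option 2 ends up cooler: heat loss scales with the gap to room
# temperature, so the undiluted (hotter) coffee sheds more heat during
# the two minutes.
```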
marcyb5st
Super happy to see something like this getting traction. As someone who is trying to reduce my carbon footprint, I sometimes feel bad about asking any model to do something trivial. With something like this, perhaps the guilt will lessen.
inerte
Not sure if I would trade off speed for accuracy.
Yes, it's incredibly boring to wait for the AI agents in IDEs to finish their job. I get distracted and open YouTube. Once I gave Cline a prompt so big and complex that it spent 2 straight hours writing code.
But after those 2 hours I spent 16 more tweaking and fixing all the stuff that wasn't working. I now realize I should have done things incrementally, even when I have a pretty good idea of the final picture.
I've been increasingly using only the "thinking" models: o3 in ChatGPT, and Gemini / Claude in IDEs. They're slower, but usually get it right.
But at the same time I am open to the idea that speed can unlock new ways of using the tooling. It would still be awesome to basically just have a conversation with my IDE while I am manually testing the app. Or to combine really fast models like this one with a "thinking" background model that runs for seconds or minutes and tries to catch the bugs left behind.
I guess only giving it a try will tell.
parsimo2010
This sounds like a neat idea, but it seems like bad timing. OpenAI just released token-based image generation that beats the best diffusion image models. If diffusion isn't even the best at generating images, I don't know if I'm going to spend a lot of time evaluating it for text.
Speed is great, but it doesn't seem like other text-model trends, like reasoning, will work out of the box. So you have to get dLLMs up to the quality of a regular autoregressive LLM, and then innovate further to catch up with reasoning models, just to match the current state of the art. It's possible they'll get there, but I'm not optimistic.
pants2
This is awesome for the future of autocomplete. Current models aren't fast enough to give useful suggestions at the speed that I type – but this certainly is.
That said, token-based models are currently fast enough for most real-time chat applications, so I wonder what other use-cases there will be where speed is greatly prioritized over smarts. Perhaps trading on Trump tweets?
jakeinsdca
I just tried it, and it was able to perfectly generate a piece of code I needed for a 12-month rolling graph based on a list of invoices. It seemed a bit easier and faster than ChatGPT.
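For anyone curious, a minimal sketch of that task in pandas; the column names and figures here are my own assumptions, not the code Mercury produced:

```python
import pandas as pd

# 12-month rolling total computed from a list of invoices.
invoices = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-15", "2024-02-03", "2024-04-20", "2024-06-01"]),
    "amount": [1200.0, 450.0, 980.0, 300.0],
})

# Sum per calendar month ("MS" = month-start frequency), then roll a
# 12-month window over the monthly totals.
monthly = invoices.set_index("date").resample("MS")["amount"].sum()
rolling_12m = monthly.rolling(window=12, min_periods=1).sum()

print(rolling_12m)
# rolling_12m.plot(title="12-month rolling invoice total")  # needs matplotlib
```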
jph00
The linked page only compares to very old and very small models. But the pricing is even higher than the latest Gemini 2.5 Flash model, which performs far better than anything they compare to.
dmos62
If the benchmarks aren't lying, Mercury Coder Small is as smart as 4o-mini and costs the same, but is an order of magnitude faster when outputting (unclear if pre-output delay is notably different). Pretty cool. However, I'm under the impression that 4o-mini was superseded by 4.1-mini and 4.1-nano for all use cases (correct me if I'm wrong). Unfortunately they didn't publish comparisons with the 4.1 line, which feels like an attempt to manipulate the optics. Or am I misreading this?
Btw, why call it "coder"? 4o-mini level of intelligence is for extracting structured data and basic summaries, definitely not for coding.
jtonz
I would be interested to see how people apply this as a coding assistant. For me, its application in solutioning seems very strong, particularly for vibe coding and potentially agentic coding. One of my main gripes with LLM-assisted coding is that getting output that catches every scenario I envision takes multiple attempts at refining my prompt, each requiring regeneration of the output. Iterations are slow and often painful.
With the speed at which this can generate solutions, you could have it loop: attempt a solution, feed itself the output (including any errors found), and go again until it builds the "correct" solution.
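A minimal sketch of that loop, assuming a hypothetical `generate()` client for a fast model and pytest as the test runner:

```python
import subprocess

def generate(prompt: str) -> str:
    """Hypothetical call to a fast code model (wire up a real client here)."""
    raise NotImplementedError

def run_tests(code: str) -> tuple[bool, str]:
    """Write the candidate to disk and run the test suite against it."""
    with open("candidate.py", "w") as f:
        f.write(code)
    result = subprocess.run(
        ["python", "-m", "pytest", "-x"], capture_output=True, text=True
    )
    return result.returncode == 0, result.stdout + result.stderr

def solve(task: str, max_iters: int = 10) -> str | None:
    prompt = task
    for _ in range(max_iters):
        code = generate(prompt)
        ok, output = run_tests(code)
        if ok:
            return code
        # Feed the failure back in and regenerate; fast inference is what
        # makes many such round trips cheap.
        prompt = (f"{task}\n\nPrevious attempt:\n{code}\n\n"
                  f"Test output:\n{output}\nFix the code.")
    return None  # give up after max_iters attempts
```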
schappim
It's nice to see a team doing something different.
The cost[1] is US$1.00 per million output tokens and US$0.25 per million input tokens. By comparison, Gemini 2.5 Flash Preview[2] charges US$0.15 per million tokens for text input and US$0.60 per million for (non-thinking) output.
Hmmm… at those prices they need to focus on markets where speed is especially important, e.g. high-frequency trading, transcription/translation services, and hardware/IoT alerting! (Quick cost math below.)
1. https://files.littlebird.com.au/Screenshot-2025-05-01-at-9.3…
2. https://files.littlebird.com.au/pb-IQYUdv6nQo.png
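A quick sketch of how those two price schedules compare on a sample workload (the token counts are assumptions):

```python
# US$ per million tokens: (input, output), from the figures quoted above.
prices = {
    "Mercury": (0.25, 1.00),
    "Gemini 2.5 Flash (non-thinking)": (0.15, 0.60),
}

in_mtok, out_mtok = 10.0, 2.0  # assumed workload, millions of tokens

for model, (p_in, p_out) in prices.items():
    cost = in_mtok * p_in + out_mtok * p_out
    print(f"{model}: ${cost:.2f}")
# Mercury: $4.50 vs Gemini 2.5 Flash: $2.70 -- roughly 1.7x the cost on
# this mix, hence the focus on latency-sensitive niches.
```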