
Mercury, the first commercial-scale diffusion language model by HyprMusic

13 Comments

  • g-mork
    Posted April 30, 2025 at 10:27 pm

    There are some open weight attempts at this around too: https://old.reddit.com/r/LocalLLaMA/search?q=diffusion&restr…

    I saw another on Twitter in the past few days that looked like a better contender to Mercury; it doesn't look like it got posted to LocalLLaMA, and I can't find it now. Very exciting stuff.

  • echelon
    Posted April 30, 2025 at 10:29 pm

    There are so many models. Every single day half a dozen new models land. And even more papers.

    It feels like models are becoming fungible apart from the hyperscaler frontier models from OpenAI, Google, Anthropic, et al.

    I suppose VCs won't be funding many more "labs"-type companies, or companies whose core value prop is "we have a model", unless the model has a tight application loop or is truly unique?

    Disregarding the team composition, research background, and specific problem domain – if you were starting an AI company today, what part of the stack would you focus on? Foundation models, AI/ML infra, tooling, application layer, …?

    Where does the value accrue? What are the most important problems to work on?

  • byearthithatius
    Posted April 30, 2025 at 10:29 pm

    Interesting approach. However, I never thought of autoregression as _the_ current issue with language modeling. If anything, it seems the community was generally surprised by just how far next-"token" prediction took us. Remember back when we did char-generating RNNs and were impressed they could make almost coherent sentences?

    Diffusion is an alternative, but I am having a hard time understanding the whole "built-in error correction" claim; it sounds like marketing BS. Both approaches replicate probability distributions, which will be naturally error-prone because of variance.
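
    For context on what "error correction" could mean here: a diffusion LM decodes all positions in parallel and can re-mask and re-predict tokens it is unsure about on later refinement steps, whereas an autoregressive decoder commits to each token permanently. A toy sketch of the contrast (the `model` function is a random stand-in, not Mercury's actual API):

```python
# Toy contrast between autoregressive and masked-diffusion decoding.
# `model` is a hypothetical random stand-in, not any real model's API.
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]
MASK = "<mask>"

def model(tokens):
    """Stand-in scorer: returns a (token, confidence) guess per position."""
    return [(random.choice(VOCAB), random.random()) for _ in tokens]

def autoregressive_decode(length):
    # Commits one token at a time, left to right; earlier choices are final.
    out = []
    for _ in range(length):
        token, _ = model(out + [MASK])[-1]
        out.append(token)
    return out

def diffusion_decode(length, steps=4):
    # Starts fully masked, predicts every position in parallel, then
    # re-masks the least confident tokens and predicts again -- this
    # revisiting of earlier guesses is the claimed "error correction".
    tokens = [MASK] * length
    for step in range(steps):
        preds = model(tokens)
        tokens = [tok for tok, _ in preds]
        # Re-mask a shrinking fraction of the lowest-confidence positions.
        k = int(length * (1 - (step + 1) / steps))
        worst = sorted(range(length), key=lambda i: preds[i][1])[:k]
        for i in worst:
            tokens[i] = MASK
    return tokens

print(autoregressive_decode(6))
print(diffusion_decode(6))
```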

  • jonplackett
    Posted April 30, 2025 at 10:30 pm

    Ok. My go-to puzzle is this:

    You have 2 minutes to cool down a cup of coffee to the lowest temp you can.

    You have two options:

    1. Add cold milk immediately, then let it sit for 2 mins.

    2. Let it sit for 2 mins, then add the cold milk.

    Which one cools the coffee to the lowest temperature and why?

    And Mercury gets this right, while as of right now ChatGPT-4o gets it wrong.

    So that’s pretty impressive.
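
    For the record, the physics favors option 2: hot coffee sheds heat faster while the gap to room temperature is large, so letting it sit first loses more heat overall. A back-of-envelope check with Newton's law of cooling (all constants below are illustrative assumptions, not from the thread):

```python
# Back-of-envelope check of the coffee puzzle via Newton's law of cooling:
# dT/dt = -k (T - T_ambient). All constants are illustrative assumptions.
import math

AMBIENT = 20.0       # room temperature, deg C (assumed)
K = 0.015            # cooling constant, per second (assumed)
COFFEE0 = 90.0       # initial coffee temperature (assumed)
MILK = 5.0           # cold milk temperature (assumed)
MILK_FRACTION = 0.2  # milk is 20% of the final volume (assumed)

def cool(temp, seconds):
    """Closed-form Newton cooling: T(t) = T_amb + (T0 - T_amb) * exp(-k t)."""
    return AMBIENT + (temp - AMBIENT) * math.exp(-K * seconds)

def add_milk(temp):
    """Ideal mixing of coffee and milk by volume fraction."""
    return (1 - MILK_FRACTION) * temp + MILK_FRACTION * MILK

milk_first = cool(add_milk(COFFEE0), 120)   # option 1: ~28.8 C
milk_last = add_milk(cool(COFFEE0, 120))    # option 2: ~26.3 C
print(f"milk first: {milk_first:.1f} C, milk last: {milk_last:.1f} C")
# Option 2 wins: the hotter the coffee, the faster it loses heat,
# so you want it as hot as possible during the 2-minute wait.
```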

  • marcyb5st
    Posted April 30, 2025 at 10:33 pm

    Super happy to see something like this getting traction. As someone who is trying to reduce my carbon footprint, I sometimes feel bad about asking any model to do something trivial. With something like this, perhaps the guilt will lessen.

  • inerte
    Posted April 30, 2025 at 10:35 pm

    Not sure if I would trade off speed for accuracy.

    Yes, it's incredibly boring to wait for the AI agents in IDEs to finish their job. I get distracted and open YouTube. Once I gave a prompt so big and complex to Cline that it spent 2 straight hours writing code.

    But after those 2 hours I spent 16 more tweaking and fixing all the stuff that wasn't working. I now realize I should have done things incrementally, even when I have a pretty good idea of the final picture.

    I've been more and more using only the "thinking" models: o3 in ChatGPT, and Gemini / Claude in IDEs. They're slower, but they usually get it right.

    But at the same time I am open to the idea that speed can unlock new ways of using the tooling. It would still be awesome to basically just have a conversation with my IDE while I am manually testing the app. Or to combine a really fast model like this one with a "thinking" background model that runs for seconds or minutes and tries to catch the bugs left behind.

    I guess only giving it a try will tell.

  • parsimo2010
    Posted April 30, 2025 at 10:35 pm

    This sounds like a neat idea, but it seems like bad timing. OpenAI just released a token-based image model that beats the best diffusion image generation. If diffusion isn't even the best at generating images, I don't know if I'm going to spend a lot of time evaluating it for text.

    Speed is great, but it doesn't seem like other text-model trends, like reasoning, will work out of the box. So you have to get dLLMs up to the quality of regular autoregressive LLMs, and then innovate further to catch up with reasoning models, just to match the current state of the art. It's possible they'll get there, but I'm not optimistic.

  • pants2
    Posted April 30, 2025 at 10:42 pm

    This is awesome for the future of autocomplete. Current models aren't fast enough to give useful suggestions at the speed that I type – but this certainly is.

    That said, token-based models are currently fast enough for most real-time chat applications, so I wonder what other use-cases there will be where speed is greatly prioritized over smarts. Perhaps trading on Trump tweets?

  • jakeinsdca
    Posted April 30, 2025 at 11:09 pm

    I just tried it, and it was able to perfectly generate a piece of code I needed for generating a 12-month rolling graph based on a list of invoices, and it seemed a bit easier and faster than ChatGPT.

  • jph00
    Posted April 30, 2025 at 11:10 pm

    The linked page only compares to very old and very small models. But the pricing is even higher than the latest Gemini 2.5 Flash model, which performs far better than anything they compare to.

  • dmos62
    Posted April 30, 2025 at 11:31 pm

    If the benchmarks aren't lying, Mercury Coder Small is as smart as 4o-mini and costs the same, but is an order of magnitude faster at outputting tokens (it's unclear whether the pre-output delay is notably different). Pretty cool. However, I'm under the impression that 4o-mini was superseded by 4.1-mini and 4.1-nano for all use cases (correct me if I'm wrong). Unfortunately, they didn't publish comparisons with the 4.1 line, which feels like an attempt to manipulate the optics. Or am I misreading this?

    Btw, why call it "coder"? 4o-mini-level intelligence is for extracting structured data and basic summaries, definitely not for coding.

  • jtonz
    Posted April 30, 2025 at 11:43 pm

    I would be interested to see how people apply this as a coding assistant. For me, its application in solutioning seems very strong, particularly for vibe coding and potentially agentic coding. One of my main gripes with LLM-assisted coding is that getting output that catches all the scenarios I envision takes multiple attempts at refining my prompt, each requiring regeneration of the output. Iterations are slow and often painful.

    With the speed this can generate solutions, you could have it loop: attempt a solution, feed itself the output (including any errors found), and go again until it builds the "correct" solution, as sketched below.
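
    A minimal sketch of that loop, assuming a hypothetical `query_model` client for the fast model (the real API client will differ):

```python
# Sketch of a generate/run/repair loop; a fast model makes many cheap
# iterations viable. `query_model` is a hypothetical stub, not a real API.
import subprocess
import tempfile

def query_model(prompt: str) -> str:
    """Hypothetical call to a fast code model; returns Python source."""
    raise NotImplementedError("wire up the real API client here")

def run_candidate(source: str) -> tuple[bool, str]:
    """Execute the candidate in a subprocess and capture any error output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    proc = subprocess.run(["python", path], capture_output=True,
                          text=True, timeout=30)
    return proc.returncode == 0, proc.stderr

def solve(task: str, max_attempts: int = 10) -> str | None:
    prompt = task
    for _ in range(max_attempts):
        source = query_model(prompt)
        ok, errors = run_candidate(source)
        if ok:
            return source
        # Feed the failure back so the next attempt can correct it.
        prompt = f"{task}\n\nPrevious attempt failed with:\n{errors}\nFix it."
    return None
```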

  • schappim
    Posted April 30, 2025 at 11:43 pm

    It's nice to see a team doing something different.

    The cost[1] is US$1.00 per million output tokens and US$0.25 per million input tokens. By comparison, Gemini 2.5 Flash Preview charges US$0.15 per million tokens for text input and US$0.60 per million for (non-thinking) output.

    Hmmm… at those prices they need to focus on markets where speed is especially important, e.g. high-frequency trading, transcription/translation services, and hardware/IoT alerting!

    1. https://files.littlebird.com.au/Screenshot-2025-05-01-at-9.3…

    2. https://files.littlebird.com.au/pb-IQYUdv6nQo.png
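
    A quick back-of-envelope comparison of those quoted prices, on an assumed workload of 5M input and 1M output tokens:

```python
# Cost comparison using the per-million-token prices quoted above.
# The workload size is an illustrative assumption.
INPUT_TOKENS = 5_000_000
OUTPUT_TOKENS = 1_000_000

def cost(in_tok, out_tok, in_price, out_price):
    """Prices are dollars per million tokens; returns total dollars."""
    return in_tok / 1e6 * in_price + out_tok / 1e6 * out_price

print(f"Mercury:          ${cost(INPUT_TOKENS, OUTPUT_TOKENS, 0.25, 1.00):.2f}")  # $2.25
print(f"Gemini 2.5 Flash: ${cost(INPUT_TOKENS, OUTPUT_TOKENS, 0.15, 0.60):.2f}")  # $1.35
```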
