
QwQ-32B: Embracing the Power of Reinforcement Learning by nwjsmith


19 Comments

  • Post Author
    iamronaldo
    Posted March 5, 2025 at 7:42 pm

This is insane. Matching DeepSeek but 20x smaller?

  • Post Author
    myky22
    Posted March 5, 2025 at 8:02 pm

Not bad.

I tried it on a current project (an online course) where DeepSeek and Gemini have done a good job with a "stable" prompt, and my impression is:
- Somewhat simplified but original answers

    We will have to keep an eye on it

  • Post Author
    gagan2020
    Posted March 5, 2025 at 8:03 pm

The Chinese strategy is to open-source the software and earn on the robotics side. And they are already ahead of everyone in that game.

These things are pretty interesting as they develop. What will the US do to retain its power?

BTW I am Indian and we are not even in the race as a country. :(

  • Post Author
    Alex-Programs
    Posted March 5, 2025 at 8:05 pm

This is ridiculous. 32B and beating DeepSeek and o1. And yet I'm trying it out and, yeah, it seems pretty intelligent…

    Remember when models this size could just about maintain a conversation?

  • Post Author
    Leary
    Posted March 5, 2025 at 8:20 pm

    To test: https://chat.qwen.ai/ and select Qwen2.5-plus, then toggle QWQ.

  • Post Author
    jaggs
    Posted March 5, 2025 at 8:30 pm

    Nice. Hard to tell whether it's really on a par with o1 or R1, but it's definitely very impressive for a 32B model.

  • Post Author
    wbakst
    Posted March 5, 2025 at 8:36 pm

    actually insane how small the model is. they are only going to get better AND smaller. wild times

  • Post Author
    bearjaws
    Posted March 5, 2025 at 8:51 pm

    Available on ollama now as well.
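    For anyone who wants to try it locally, here is a minimal sketch using the official `ollama` Python client; the `qwq` model tag is an assumption, so confirm the exact name with `ollama list` or the Ollama library page.

    ```python
    # pip install ollama  (assumes a local Ollama server is running)
    import ollama

    response = ollama.chat(
        model="qwq",  # assumed tag; confirm with `ollama list`
        messages=[{"role": "user", "content": "Summarize reinforcement learning in one sentence."}],
    )

    # The reply (including any visible chain-of-thought) is in the message content.
    print(response["message"]["content"])
    ```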

  • Post Author
    nycdatasci
    Posted March 5, 2025 at 9:40 pm

Wasn't this released in Nov 2024 as a "preview" with similarly impressive performance?
    https://qwenlm.github.io/blog/qwq-32b-preview/

  • Post Author
    kelsey98765431
    Posted March 5, 2025 at 9:59 pm

First thoughts: wow, this is a real reasoning model, not just a Llama variant with SFT. The chain of thought will actually go on for a very long time on a seemingly simple question like writing a pi calculation in C. Very interesting.

  • Post Author
    manmal
    Posted March 5, 2025 at 10:20 pm

    I guess I won’t be needing that 512GB M3 Ultra after all.

  • Post Author
    esafak
    Posted March 5, 2025 at 10:21 pm

    Impressive output but slow. I'd still pick Claude but ask QwQ for a second opinion.

  • Post Author
    antirez
    Posted March 5, 2025 at 10:33 pm

Note the massive context length (130k tokens), which matters because it would be kinda pointless to generate a long CoT without enough context to contain it and the reply.

EDIT: Here we are. My first prompt created a CoT so long that it catastrophically forgot the task (though I don't believe I was anywhere near 130k; I'm using ollama with the fp16 model). I asked one of my coding test questions, totally unrelated to what the output below says:

    <QwQ output>
    But the problem is in this question. Wait perhaps I'm getting ahead of
    myself.

    Wait the user hasn't actually provided a specific task yet. Let me check
    again.

    The initial instruction says:

    "Please act as an AI agent that can perform tasks… When responding,
    first output a YAML data structure with your proposed action, then wait
    for feedback before proceeding."

    But perhaps this is part of a system prompt? Wait the user input here
    seems to be just "You will be given a problem. Please reason step by
    step…" followed by a possible task?
    </QwQ>

Note: Ollama's "/show info" shows that the configured context size is correct.
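    For reference, Ollama's effective context window is whatever `num_ctx` is set to, and it can also be overridden per request. A minimal sketch with the Python client, assuming the `qwq` tag; the 131072 value is illustrative and limited in practice by available RAM/VRAM.

    ```python
    import ollama

    response = ollama.chat(
        model="qwq",  # assumed model tag
        messages=[{"role": "user", "content": "Reason step by step: what is 17 * 24?"}],
        options={"num_ctx": 131072},  # per-request context window override (~130k tokens)
    )

    print(response["message"]["content"])
    ```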

  • Post Author
    rvz
    Posted March 5, 2025 at 10:48 pm

The AI race to zero continues to accelerate, with downloadable free AI models that have already won the race and are destroying closed-source frontier AI models.

The closed-source players are once again getting squeezed in the middle, and this is even before Meta releases Llama 4.

  • Post Author
    dr_dshiv
    Posted March 5, 2025 at 11:00 pm

    I love that emphasizing math learning and coding leads to general reasoning skills. Probably works the same in humans, too.

20x smaller than DeepSeek! How small can these go? What kind of hardware can run this?

  • Post Author
    samstave
    Posted March 5, 2025 at 11:38 pm

    >>In the initial stage, we scale RL specifically for math and coding tasks. Rather than relying on traditional reward models, we utilized an accuracy verifier for math problems to ensure the correctness of final solutions and a code execution server to assess whether the generated codes successfully pass predefined test cases

    They should call this the siphon/sifter model of RL.

    You siphon only the initial domains, then sift to the solution….
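    For illustration, here is a minimal sketch of what such rule-based, verifier-style rewards might look like. This is not Qwen's actual code; the function names, the exact-match check, and the use of Python test scripts are all assumptions standing in for their accuracy verifier and code execution server.

    ```python
    import subprocess
    import tempfile

    def math_reward(model_answer: str, reference_answer: str) -> float:
        """Binary accuracy reward: 1.0 only if the final answer matches the reference."""
        return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

    def code_reward(generated_code: str, test_code: str) -> float:
        """Binary reward: 1.0 only if the generated code passes the predefined test cases."""
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(generated_code + "\n\n" + test_code)
            path = f.name
        result = subprocess.run(["python", path], capture_output=True, timeout=30)
        return 1.0 if result.returncode == 0 else 0.0
    ```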

  • Post Author
    daemonologist
    Posted March 5, 2025 at 11:47 pm

    It says "wait" (as in "wait, no, I should do X") so much while reasoning it's almost comical. I also ran into the "catastrophic forgetting" issue that others have reported – it sometimes loses the plot after producing a lot of reasoning tokens.

    Overall though quite impressive if you're not in a hurry.

  • Post Author
    TheArcane
    Posted March 5, 2025 at 11:49 pm

chat.qwenlm.ai has quickly become my preferred choice for all my LLM needs. As accurate as DeepSeek v3, but without the server issues.

    This makes it even better!

  • Post Author
    dulakian
    Posted March 5, 2025 at 11:59 pm

    My informal testing puts it just under Deepseek-R1. Very impressive for 32B. It maybe thinks a bit too much for my taste. In some of my tests the thinking tokens were 10x the size of the final answer. I am eager to test it with function calling over the weekend.
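    For anyone planning a similar function-calling test, a minimal sketch with the `ollama` Python client; the `qwq` tag and the `get_weather` tool are hypothetical, and whether the model reliably emits tool calls in Ollama's format is an assumption to verify.

    ```python
    import ollama

    # Hypothetical tool definition for the test (JSON-schema style, as Ollama expects).
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    response = ollama.chat(
        model="qwq",  # assumed model tag
        messages=[{"role": "user", "content": "What's the weather in Rome right now?"}],
        tools=tools,
    )

    # Any tool calls the model decided to make will appear in the returned message.
    print(response["message"])
    ```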
