
Skywork-OR1: new SOTA 32B thinking model with open weight by naomiclarkson

9 Comments

  • scribu
    Posted April 13, 2025 at 4:22 pm

    From their Notion page:

    > Skywork-OR1-32B-Preview delivers the 671B-parameter Deepseek-R1 performance on math tasks (AIME24 and AIME25) and coding tasks (LiveCodeBench).

    Impressive, if true: much better performance than the vanilla distills of R1.

    Plus it’s a fully open-source release (including data selection and training code).
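
    For anyone who wants to try the open weights, here is a minimal sketch of loading and prompting the 32B preview with Hugging Face transformers. The repo id "Skywork/Skywork-OR1-32B-Preview" is an assumption inferred from the model name; check the actual Hugging Face page before running, and note that a 32B model in bf16 needs roughly 64 GB of GPU memory.

    ```python
    # Sketch only: repo id is assumed from the model name above, not confirmed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Skywork/Skywork-OR1-32B-Preview"  # assumed repo id; verify on Hugging Face
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # ~64 GB of weights; shard across available GPUs
        device_map="auto",
    )

    messages = [{"role": "user", "content": "How many positive divisors does 2025 have?"}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # Thinking models emit a long chain of thought, so leave room for many tokens.
    outputs = model.generate(inputs, max_new_tokens=4096)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
    ```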

  • byefruit
    Posted April 13, 2025 at 4:38 pm

    > Both of our models are trained on top of DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Qwen-32B.

    Not to take away from their work, but this shouldn't be buried at the bottom of the page – there's a gulf between completely new models and fine-tuning.

  • rubymamis
    Posted April 13, 2025 at 4:42 pm

    I tend to prefer running non-thinking models locally, since they output the result significantly faster.

  • y2236li
    Posted April 13, 2025 at 5:19 pm

    Interesting – focusing on the 671B-parameter model feels like a significant step. It’s a compelling contrast to the previous models and sets a strong benchmark. It’s great that they’re embracing open weights and data too – that’s a crucial aspect for innovation.

  • y2236li
    Posted April 13, 2025 at 5:27 pm

    It’s a fascinating shift to focus on the 671B model. The contrast with the previous models – especially the distill versions – really highlights the potential of this new approach. Definitely a significant step forward, and the open-weight nature is commendable.

  • chvid
    Posted April 13, 2025 at 6:21 pm

    How is the score on AIME2024 relevant if AIME2024 has been used to train the model?

  • qwertox
    Posted April 13, 2025 at 6:37 pm

    I know one can rent consumer GPUs on the internet, where people like you and me offer their spare GPU time to those who need it, for a price. The renter basically gets a GPU-enabled VM on your machine.

    But is there something like a distributed network, akin to SETI@home and the like, that is free and used for training models? Where consensus is reached on which model gets trained, and any derivative works must be open source, including all the tooling and the hosting platform? Would this even be possible, given that the latency between nodes is very high and the bandwidth is limited?
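
    There has been work in roughly this direction, e.g. the Hivemind library for collaborative training and low-communication methods such as DiLoCo: each node takes many local optimizer steps and only occasionally averages parameters, so latency and bandwidth matter far less than in ordinary per-step data parallelism. Below is a rough sketch of that periodic-averaging idea, assuming torch.distributed is already initialized across the volunteer nodes; the model, data loader, and hyperparameters are placeholders.

    ```python
    # Sketch of local training with periodic parameter averaging (local SGD / DiLoCo-style).
    # Assumes dist.init_process_group() has already been called on every volunteer node.
    import torch
    import torch.distributed as dist
    from torch import nn

    def train_with_periodic_sync(model: nn.Module, data_loader,
                                 max_steps: int = 10_000,
                                 sync_every: int = 500,
                                 lr: float = 1e-4):
        opt = torch.optim.AdamW(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        world_size = dist.get_world_size()

        step = 0
        for inputs, targets in data_loader:
            # Ordinary local training on this node's data shard; no network traffic here.
            opt.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            opt.step()
            step += 1

            # Communication happens only every `sync_every` steps: parameters are
            # averaged across all nodes instead of exchanging gradients every step,
            # so high latency and low bandwidth are amortized over many local steps.
            if step % sync_every == 0:
                for p in model.parameters():
                    dist.all_reduce(p.data, op=dist.ReduceOp.SUM)
                    p.data /= world_size

            if step >= max_steps:
                break
    ```

    Real volunteer-computing systems also need peer discovery, fault tolerance, and handling of nodes joining and leaving mid-run, which this sketch ignores; that is the part projects like Hivemind try to solve.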

