
Skywork-OR1: new SOTA 32B thinking model with open weight by naomiclarkson
- April 13, 2025: We release the Skywork-OR1 (Open Reasoner 1) series of models, including Skywork-OR1-Math-7B, Skywork-OR1-32B-Preview, and Skywork-OR1-7B-Preview. We open-source:
  - 🤗 Model weights: Skywork-OR1-Math-7B, Skywork-OR1-32B-Preview, Skywork-OR1-7B-Preview
  - 🤗 Training data: Skywork-OR1-RL-Data (Coming Soon)
  - 🧑‍💻 Code: Skywork-OR1
- We also release a Notion Blog to share detailed training recipes and extensive experimental results, analysis, and insights, dedicated to helping the community to better research, understand, and push the frontier of open reasoning models.
The AIME24 scores versus training steps of Skywork-OR1-Math-7B in our multi-stage training pipeline.
The Skywork-OR1 (Open Reasoner 1) model series consists of powerful math and code reasoning models trained using large-scale rule-based reinforcement learning with carefully designed datasets and training recipes. This series includes two general-purpose reasoning models, Skywork-OR1-7B-Preview and Skywork-OR1-32B-Preview, along with a math-specialized model, Skywork-OR1-Math-7B.
- Skywork-OR1-Math-7B is specifically optimized for mathematical reasoning, scoring 69.8 on AIME24 and 52.3 on AIME25, well ahead of all models of similar size.
- Skywork-OR1-32B-Preview delivers the performance of the 671B-parameter DeepSeek-R1 on math tasks (AIME24 and AIME25) and coding tasks (LiveCodeBench).
- Skywork-OR1-7B-Preview outperforms all similarly sized models in both math and coding scenarios.
The final release version will be available in two weeks.
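For readers who want to try a preview checkpoint themselves, one minimal route is vLLM's OpenAI-compatible server (vLLM 0.6.3 is already bundled in the Docker image described below). The repo ID Skywork/Skywork-OR1-7B-Preview and the context-length value here are assumptions based on the model names above, not commands taken from the repo; check the Hugging Face collection link in the comments for the exact IDs.

# Serve the 7B preview model locally with vLLM (repo ID and --max-model-len value are assumptions)
vllm serve Skywork/Skywork-OR1-7B-Preview --max-model-len 32768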
We evaluate our models on AIME24, AIME25, and LiveCodeBench. Instead of using Pass@1, which is common in prior work, we introduce Avg@K as the primary metric. This metric robustly measures a model’s average performance across K independent attempts, reducing the impact of randomness and enhancing the reliability of the results. We believe that Avg@K provides a better reflection of a model’s stability and reasoning consistency.
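As a minimal formalization (the notation is ours, not from the original post): if $s_i$ denotes the score of the $i$-th independent attempt (1 for a correct AIME answer, 0 otherwise), then

$$\mathrm{Avg@}K = \frac{1}{K}\sum_{i=1}^{K} s_i,$$

i.e. the per-attempt scores averaged over $K$ sampled generations, rather than over a single attempt as in Pass@1.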
We include the detailed results in the following table.
| Model | AIME24 (Avg@32) | AIME25 (Avg@32) | LiveCodeBench, Aug 2024 - Feb 2025 (Avg@4) |
|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 39.2 | 37.6 |
| Light-R1-7B-DS | 59.1 | 44.3 | 39.5 |
| DeepSeek-R1-Distill-Qwen-32B | 72.9 | 59.0 | 57.2 |
| TinyR1-32B-Preview | 78.1 | 65.3 | 61.6 |
| QwQ-32B | 79.5 | 65.3 | 61.6 |
| DeepSeek-R1 | 79.8 | 70.0 | 65.9 |
| Skywork-OR1-Math-7B | 69.8 | 52.3 | 43.6 |
| Skywork-OR1-7B-Preview | 63.6 | 45.8 | 43.9 |
| Skywork-OR1-32B-Preview | 79.7 | 69.0 | 63.9 |
Docker environment:
# Pull the Docker image
docker pull whatcanyousee/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te2.0-megatron0.11.0-v0.0.6
# Launch the desired Docker image
docker run --runtime=nvidia -it --rm --shm-size="10g" --cap-add=SYS_ADMIN -v <image:tag>
# Inside the container, install Skywork-OR1
git clone https://github.com/SkyworkAI/Skywork-OR1.git && cd Skywork-OR1 && pip3 install -e .
Conda environment:
# Installing Python 3.10 Env
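The conda instructions are truncated in this mirror. Below is a minimal sketch of the likely steps, inferred from the "Python 3.10" comment above and the editable install used in the Docker instructions; the environment name and the omission of extra dependencies (e.g. a matching PyTorch/CUDA build) are assumptions, so defer to the repo's README for the exact commands.

# Create and activate a Python 3.10 environment (the env name "skywork-or1" is assumed for illustration)
conda create -n skywork-or1 python=3.10 -y
conda activate skywork-or1
# Install Skywork-OR1 in editable mode, mirroring the Docker instructions above
git clone https://github.com/SkyworkAI/Skywork-OR1.git
cd Skywork-OR1
pip3 install -e .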
9 Comments
naomiclarkson
github repo: https://github.com/SkyworkAI/Skywork-OR1
blog: https://capricious-hydrogen-41c.notion.site/Skywork-Open-Rea…
huggingface: https://huggingface.co/collections/Skywork/skywork-or1-67fa1…
scribu
From their Notion page:
> Skywork-OR1-32B-Preview delivers the 671B-parameter Deepseek-R1 performance on math tasks (AIME24 and AIME25) and coding tasks (LiveCodeBench).
Impressive, if true: much better performance than the vanilla distills of R1.
Plus it's a fully open-source release (including data selection and training code).
byefruit
> Both of our models are trained on top of DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Qwen-32B.
Not to take away from their work but this shouldn't be buried at the bottom of the page – there's a gulf between completely new models and fine-tuning.
rubymamis
I tend to prefer running non-thinking models locally since they output the result significantly faster.
y2236li
Interesting - focusing on the 671B parameter model feels like a significant step. It's a compelling contrast to the previous models and sets a strong benchmark. It's great that they're embracing open weights and data too - that's a crucial aspect for innovation.
y2236li
It's a fascinating shift to focus on the 671B model. The contrast with the previous models - especially the distill versions - really highlights the potential of this new approach. Definitely a significant step forward, and the open-weight nature is commendable.
chvid
How is the score on AIME2024 relevant if AIME2024 has been used to train the model?
qwertox
I know one can rent consumer GPUs on the internet, where people like you and me offer their free GPU time to people who need it for a price. They basically get a GPU-enabled VM on your machine.
But is there something like a distributed network akin to SETI@home and the likes which is free for training models? Where a consensus is made on which model is trained and that any derivative works must be open source, including all the tooling and hosting platform? Would this even be possible to do, given that the latency between nodes is very high and the bandwidth limited?
iamnotagenius
[dead]