Today we’re introducing Mistral Small 3, a latency-optimized 24B-parameter model released under the Apache 2.0 license.
Mistral Small 3 is competitive with larger models such as Llama 3.3 70B or Qwen 32B, and is an excellent open replacement for opaque proprietary models like GPT4o-mini. Mistral Small 3 is on par with Llama 3.3 70B instruct, while being more than 3x faster on the same hardware.
Mistral Small 3 is a pre-trained and instruction-tuned model catering to the ‘80%’ of generative AI tasks—those that require robust language and instruction-following performance with very low latency.
We designed this new model to saturate performance at a size suitable for local deployment. In particular, Mistral Small 3 has far fewer layers than competing models, substantially reducing the time per forward pass. At over 81% accuracy on MMLU and 150 tokens/s, Mistral Small 3 is currently the most efficient model in its category.
We’re releasing both a pretrained and an instruction-tuned checkpoint under Apache 2.0. The checkpoints can serve as a powerful base for accelerating progress. Note that Mistral Small 3 is trained with neither RL nor synthetic data, so it is earlier in the model production pipeline than models like DeepSeek R1 (a great and complementary piece of open-source technology!). It can serve as a strong base model for building up reasoning capabilities. We look forward to seeing how the open-source community adopts and customizes it.
Performance
Human Evaluations
We conducted side-by-side evaluations with an external third-party vendor on a set of over 1k proprietary coding and generalist prompts. Evaluators were tasked with selecting their preferred model response from anonymized generations produced by Mistral Small 3 versus another model. We are aware that in some cases human-judgement evaluations differ starkly from publicly available benchmarks, but we have taken extra caution to verify a fair evaluation, and we are confident these results are valid.
Instruct performance
Our instruction-tuned model performs competitively with open-weight models more than three times its size, as well as with proprietary models such as GPT4o-mini.
rvz
The AI race to zero continues to accelerate, and Mistral has shown one card just to stay in the race (and released it for free).
OpenAI's reaction to DeepSeek looked more like cope and panic after they realized they're getting squeezed at their own game.
Notice how Google hasn't said anything amid these announcements, didn't rush out a model, and didn't make any price cuts? They are not panicking and have something up their sleeve.
I'd expect Google to release a new reasoning model that is competitive with DeepSeek and o1 (or matches o3). Would be even more interesting if they release it for free.
fuegoio
Finally something from them
timestretch
Their models have been great, but I wish they'd include the number of parameters in the model name, like every other model.
cptcobalt
This is really exciting—the 12-32b size range is my favorite model size for my home computer, and the Mistral models have historically been great and widely embraced for fine-tuning.
At 24b, I think this has a good chance of fitting on my more memory-constrained work computer.
msp26
Finally! All the recent MoE model releases make me depressed with my mere 24GB of VRAM.
> Note that Mistral Small 3 is neither trained with RL nor synthetic data
Not using synthetic data at all is a little strange
asb
Note the announcement at the end, that they're moving away from the non-commercial only license used in some of their models in favour of Apache:
> We’re renewing our commitment to using Apache 2.0 license for our general purpose models, as we progressively move away from MRL-licensed models
fvv
Given the new US AI diffusion rules, will Mistral be able to survive and attract new capital? I mean, given that France is a top-tier country.
netdur
Seems on par with or better than GPT4o-mini.
bugglebeetle
Interested to see what folks do with putting DeepSeek-style RL methods on top of this. The smaller Mistral models have always punched above their weight and been the best for fine-tuning.
Terretta
"When quantized, Mistral Small 3 can be run privately on a single RTX 4090 or a Macbook with 32GB RAM."
yodsanklai
I'm curious, what people do with these smaller models?
resource_waste
Curious how it actually compares to LLaMa.
Last year Mistral was garbage compared to LLaMa. I needed a permissive license, so I was forced to use Mistral, but I had LLaMa to compare it to. I was always extremely jealous of LLaMa, since the Berkeley Starling finetune was so amazing.
I ended up giving up on the project because Mistral was so unusable.
My conspiracy theory was that there was some European patriotism that gave Mistral a bit more hype than it merited.
unraveller
What's this stuff about the model catering to ‘80%’ of generative AI tasks? What model do they expect me to use for the other 20% of the time, when my question needs reasoning smarts?
GaggiX
Hopefully they will fine-tune it using RL like DeepSeek did; it would be great to have more open reasoning models.
simonw
I'm excited about this one – they seem to be directly targeting the "best model to run on a decent laptop" category, hence the comparison with Llama 3.3 70B and Qwen 2.5 32B.
I'm running it on an M2 64GB MacBook Pro now via Ollama, and it's fast and appears to be very capable. This downloads 14GB of model weights:
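Something along these lines (the exact tag is an assumption on my part; check the Ollama library listing for the current name):

  # first run pulls ~14GB of weights, then drops into an interactive chat
  ollama run mistral-small:24b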
Then using my https://llm.datasette.io/ tool (so I can log my prompts to SQLite):
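A sketch of that step, assuming the llm-ollama plugin (the model id mirrors the Ollama tag; the prompt is just an example):

  llm install llm-ollama
  llm -m mistral-small:24b 'write a haiku about open weights'
  # prompts and responses are logged to llm's SQLite database
  llm logs -n 1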
More notes here: https://simonwillison.net/2025/Jan/30/mistral-small-3/
butz
Is there a gguf version that could be used with llamafile?
strobe
Not sure how much worse it is than the original, but mistral-small:22b-instruct-2409-q2_K seems to work on a 16GB VRAM GPU.
Havoc
How does that fit into a 4090? The files on the repo look way too large. Do they mean a quant?
m3kw9
Sorry to dampen the news, but a 4o-mini-level model isn't really useful beyond "talk to me for fun" types of applications.
rcarmo
There's also a 22b model that I appreciate, since it _almost_ fits into my 12GB 3060. But, alas, I might need to get a new GPU if this trend of fatter smaller models continues.
mohsen1
Not so subtle in the function calling example[1]
[1] https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-…
picografix
Tried running it locally. Gone are the days when you'd get broken responses from local models (I know this happened a while ago, but it had been many days since I last tried).
freehorse
I tried just a few of the code-generation prompts I have used in recent days, and it looks quite good and promising. It seems at least on par with qwen2.5-coder-32b, which was the first local model I would actually use for code. I am also surprised at how far small models have come in the last year, producing much more polished output.
On another note, I also wish they would follow up with a new version of the 8x7B Mixtral. It was one of my favourite models, but at the time it could barely fit in my RAM, and now that I have more RAM it is rather outdated. But I'm not complaining; this model is great, and it is great that they are one of the companies that actually publish models targeted at edge computing.
spwa4
So the point of this release is:
1) code + weights Apache 2.0 licensed (enough to run locally, enough to train, not enough to reproduce this version)
2) Low latency, meaning 11ms per token (so ~90 tokens/sec on 4xH100)
3) Performance, according to mistral, somewhere between Qwen 2.5 32B and Llama 3.3 70B, roughly equal with GPT4o-mini
4) ollama run mistral-small (14G download) 9 tokens/sec on the question "who is the president of the US?" (also to enjoy that the answer ISN'T orange idiot)
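If you want to check the tokens/sec figure on your own box, a rough way to do it (assuming ollama's --verbose flag, which prints eval timings after each response):

  # prints prompt/eval token counts and an "eval rate" in tokens/s after the reply
  ollama run mistral-small --verbose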
mrbonner
Is there any chance I could get an eGPU (external GPU dock) for my M1 16GB laptop to power through this model?
rahimnathwani
Until today, no language model I've run locally on a 32GB M1 has been able to answer this question correctly: "What was Mary J Blige's first album?"
Today, a 4-bit quantized version of Mistral Small (14GB model size) answered correctly :)
https://ollama.com/library/mistral-small:24b-instruct-2501-q…
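For reference, the invocation was roughly this (the exact quant tag is an assumption, since the link above is truncated):

  # quant tag assumed; check the Ollama library page for the exact name
  ollama run mistral-small:24b-instruct-2501-q4_K_M \
    "What was Mary J Blige's first album?"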
tadamcz
Hi! I'm Tom, a machine learning engineer at the nonprofit research institute Epoch AI [0]. I've been working on building infrastructure to:
* run LLM evaluations systematically and at scale
* share the data with the public in a rigorous and transparent way
We use the UK government's Inspect [1] library to run the evaluations.
As soon as I saw this news on HN, I evaluated Mistral Small 3 on MATH [2] level 5 (hardest subset, 1,324 questions). I get an accuracy of 0.45 (± 0.011). We sample the LLM 8 times for each question, which lets us obtain less noisy estimates of mean accuracy and measure the consistency of the LLM's answers. The 1,324*8=10,592 samples represent 8.5M tokens (2M in, 6.5M out).
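For anyone who wants to run something similar themselves, a rough sketch with the Inspect CLI (the task name, level filter, and model id here are illustrative assumptions, not our exact setup):

  pip install inspect-ai git+https://github.com/UKGovernmentBEIS/inspect_evals
  # 8 samples per question, as described above; the level-filter parameter name is assumed
  inspect eval inspect_evals/math --model mistral/mistral-small-latest \
    --epochs 8 -T levels=5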
You can see the full transcripts here in Inspect’s interactive interface: https://epoch.ai/inspect-viewer/484131e0/viewer?log_file=htt…
Note that MATH is a different benchmark from the MathInstruct [3] mentioned in the OP.
It's still early days for Epoch AI's benchmarking work. I'm developing a systematic database of evaluations run directly by us (so we can share the full details transparently), which we hope to release very soon.
[0]: https://epoch.ai/
[1]: https://github.com/UKGovernmentBEIS/inspect_ai
[2]: https://arxiv.org/abs/2103.03874
[3]: https://huggingface.co/datasets/TIGER-Lab/MathInstruct