Teuken-7B-Base and Teuken-7B-Instruct: Towards European LLMs (submitted by doener)
[Submitted on 30 Sep 2024 (v1), last revised 15 Oct 2024 (this version, v2)]
Authors:Mehdi Ali, Michael Fromm, Klaudia Thellmann, Jan Ebert, Alexander Arno Weber, Richard Rutmann, Charvi Jain, Max Lübbering, Daniel Steinigen, Johannes Leveling, Katrin Klug, Jasper Schulze Buschhoff, Lena Jurkschat, Hammam Abdelwahab, Benny Jörg Stein, Karl-Heinz Sylla, Pavel Denisov, Nicolo’ Brandizzi, Qasid Saleem, Anirban Bhowmick, Lennard Helmer, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Alex Jude, Lalith Manjunath, Samuel Weinbach, Carolin Penke, Oleg Filatov, Shima Asaadi, Fabio Barth, Rafet Sifa, Fabian Küch, Andreas Herten, René Jäkel, Georg Rehm, Stefan Kesselheim, Joachim Köhler, Nicolas Flores-Herr
Abstract: We present two multilingual LLMs designed to embrace Europe's linguistic diversity by supporting all 24 official languages of the European Union. Trained on a dataset comprising around 60% non-English data and utilizing a custom multilingual tokenizer, our models address the limitations of existing LLMs that predominantly focus on English or a few high-resource languages. We detail the models' development principles, i.e., data composition, tokenizer optimization, and training methodologies. The models demonstrate competitive performance across multilingual benchmarks, as evidenced by their performance on European versions of ARC, HellaSwag, MMLU, and TruthfulQA.
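For context, "tokenizer optimization" here means training a custom tokenizer on the multilingual corpus itself, so that non-English text is not split into wastefully many tokens. A minimal sketch of that idea with the Hugging Face tokenizers library (the corpus files, vocabulary size, and special tokens are illustrative assumptions, not the paper's actual settings):

    # Sketch: train a byte-level BPE tokenizer on a multilingual corpus.
    # Corpus files, vocab size, and special tokens are illustrative only.
    from tokenizers import Tokenizer, models, pre_tokenizers, trainers

    tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
    tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
    trainer = trainers.BpeTrainer(
        vocab_size=100_000,
        special_tokens=["<unk>", "<s>", "</s>"],
    )
    # One text file per language so smaller languages are represented too.
    tokenizer.train(files=["eu_de.txt", "eu_fr.txt", "eu_lv.txt"], trainer=trainer)
    tokenizer.save("multilingual-bpe.json")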
10 Comments
smokel
A paper on languages that begins with a grammatical error in the first sentence does not inspire confidence:
> LLMs represents a disruptive technology
JKolios
More diversity in the LLM space is always good. In my experience, though, speaking as a native speaker of one of the less widely used European languages, Mistral's models already handle it pretty well.
kiru_io
Maybe someone should edit the title to mention this is from 2024: [Submitted on 30 Sep 2024 (v1), last revised 15 Oct 2024 (this version, v2)]
ozgune
I had a related, but orthogonal question about multilingual LLMs.
When I ask a smaller model a question in English, it does well. When I ask the same model a question in Turkish, the answer is mediocre. When I ask the model to translate my question into English, get the answer, and translate the answer back to Turkish, the model again does well.
For example, I tried the above with Llama 3.3 70B, and asked it to plan me a 3-day trip to Istanbul. When I asked Llama to do the translations between English <> Turkish, the answer was notably better.
Has anyone else observed similar behavior?
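For reference, the pivot workflow I mean looks like this. A minimal sketch assuming an OpenAI-compatible endpoint (the base_url, api_key, and model id are placeholders for whatever serves your local Llama):

    # Sketch of the translate -> answer -> translate-back pivot described above.
    # base_url, api_key, and MODEL are placeholders, not real endpoints.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
    MODEL = "llama-3.3-70b-instruct"

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def ask_via_english(question: str, lang: str = "Turkish") -> str:
        # 1. Translate the question into English.
        q_en = ask(f"Translate this {lang} text to English:\n\n{question}")
        # 2. Answer in English, where the model is strongest.
        a_en = ask(q_en)
        # 3. Translate the answer back into the original language.
        return ask(f"Translate this English text to {lang}:\n\n{a_en}")

The two extra calls cost latency, but they route the actual reasoning through the language the model saw most during training.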
miros_love
> European versions of ARC
But this is an image-like benchmark. Has anyone looked at the EU-ARC paper? What is the difference, and why can't you evaluate on the regular one?
I skimmed it and didn't find an answer right away, but judging by their tokenizer, they are training from scratch. In general, I don't like this approach for the task at hand. For high-resource languages, there are already good models that they don't compare against. And for low-resource languages, it is very important to include more languages from the same language family, not necessarily only those of the EU.
YetAnotherNick
They compared against Llama 3.1 and found it to be better on average on their tasks, like European MMLU. And Llama 3.1 is the weakest of that batch, with Qwen 2.5 and Gemma 3 being significantly better.
KronisLV
I also quite liked the EuroLLM project: https://huggingface.co/blog/eurollm-team/eurollm-9b
It was pretty good with Latvian (better than other models of this size, as well as the Llama or Qwen variants I could run), and I assume it probably does well with other EU languages too.
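If anyone wants to try it, here's a minimal sketch with transformers. I'm assuming the model id from the linked blog post is utter-project/EuroLLM-9B:

    # Sketch: load EuroLLM-9B and complete a Latvian prompt.
    # The repo id is an assumption from the linked blog post; adjust if it differs.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "utter-project/EuroLLM-9B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # "Riga is the capital of Latvia and ..."
    prompt = "Rīga ir Latvijas galvaspilsēta un"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))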
tannhaeuser
I mean, Mistral AI is a Paris-based company, and their models were considered on par with or better than other open-weight models such as Llama 3.1 and Qwen 2.5; Mistral's 24B model is currently beating the oh-so-great Gemma 3 27B, depending on the task.
Also, Stable Diffusion was originally developed in Munich (and I believe still is).
It's true, though, that raising capital and finding investors works way better in the US (kind of needless to say on HN), and so did attracting top talent, at least in the past. Don't get me started on energy prices ;) but I don't believe those contribute significantly in the end anyway.
jug
On this topic, don't miss this quite useful benchmark:
https://euroeval.com
NKosmatos
There is also a Greek LLM from 2024.
Meltemi: A Large Foundation Language Model for the Greek Language
https://huggingface.co/ilsp/Meltemi-7B-v1.5