
Show HN: Penny-1.7B Irish Penny Journal style transfer by deepsquirrelnet
This is a style transfer from the Irish Penny Journal (1840) to SmolLM2 using GRPO. For example, this response to “What is the capital of France?”:
Verily, in the grand tapestry of European monarchies, the city of Paris, the seat of the mighty Emperor Napoleon, holds a place of singular distinction. This city, which hath borne the name of 'La Ville Lumière' for nigh on two centuries, doth shine forth as a beacon of art, culture, and intellect, its very existence a testament to the ingenuity and brilliance of its people. And so, it is with great honour and reverence that we declare Paris, the majestic capital of the French realm, to be our noble question's answer.
Penny‑1.7B is a 1.7 billion‑parameter causal language model fine‑tuned with Group Relative Policy Optimization (GRPO) to emulate the 19th‑century prose of the Irish Penny Journal (1840). The RL stage ran for 6,800 policy steps, using a reward model trained to classify sentences as original IPJ vs modern translation. Maximizing this score nudges generations toward authentic Victorian‑era diction while retaining the general reasoning ability of the base SmolLM2 model.
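The "Group Relative" part of GRPO is what lets the reward classifier drive training without a learned value function: for each prompt the policy samples a group of completions, and each completion's reward is normalized against its own group's mean and standard deviation to form the advantage. A minimal sketch of that step (the scores here are illustrative stand-ins for the classifier's P(original IPJ) outputs, not values from the actual run):

```python
# Sketch of GRPO's group-relative advantage computation: rewards for a
# group of completions sampled from the same prompt are normalized to
# zero mean / unit std within the group, so no critic model is needed.

import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize one group of scalar rewards to zero mean, unit std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical reward-model scores for four sampled completions:
scores = [0.91, 0.42, 0.77, 0.30]
advantages = group_relative_advantages(scores)
print(advantages)  # positive for above-average completions, negative below
```

Completions the classifier rates as more IPJ-like than their group-mates get positive advantages and are reinforced; the rest are pushed down, regardless of the absolute reward scale.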
✨ Key Facts
Base model | SmolLM2‑1.7B-Instruct
Tuning method | GRPO (RL)
Policy steps | 6,800
Reward | Classifier trained to distinguish original IPJ sentences from modern translations
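Reward functions for this kind of setup typically take a batch of sampled completions and return one scalar per completion. A hedged sketch of that shape, where `ipj_probability` is a toy keyword heuristic standing in for the trained IPJ-vs-modern classifier (the real reward model is a learned classifier, not this):

```python
# Hypothetical reward function shape for GRPO-style training: map each
# sampled completion to a scalar in [0, 1]. `ipj_probability` is a toy
# stand-in for the trained IPJ-vs-modern-translation classifier.

PERIOD_MARKERS = ("hath", "doth", "verily", "nigh", "thee", "thou")

def ipj_probability(text: str) -> float:
    """Toy proxy score: fraction of period markers present in the text."""
    lowered = text.lower()
    hits = sum(marker in lowered for marker in PERIOD_MARKERS)
    return hits / len(PERIOD_MARKERS)

def reward_fn(completions: list[str]) -> list[float]:
    """One scalar reward per completion, as GRPO trainers expect."""
    return [ipj_probability(c) for c in completions]

print(reward_fn(["Verily, it doth shine forth.",
                 "Paris is the capital of France."]))
```

The archaic sample scores higher than the plain modern sentence, which is all the policy gradient needs: a consistent ranking within each sampled group.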
12 Comments
ekianjo
Nice work! It still manages to use the word 'delve' in the first sentence, which is a giveaway that it's written by an LLM.
veggieroll
Have you written anywhere in detail on how you gathered your dataset and trained the finetune? I have a few use cases that are like this, but I'm not sure where to start.
bee_rider
It is sort of funny that the Irish ended up being the best practitioners of the English language, despite the fact that they were forced to use it.
_1
Kind of strange to pick an example that is just wrong. It's supposed to be written from 1840 and says Paris is the seat of Napoleon almost 20 years after he died.
sjkoelle
Marvelous! What gain beyond zero-shot would motivate a humble citizen to implement this instrument? How was the superiority assessed?
kamranjon
This is really cool! Do you have any of the pipeline code available that you used for training? I am curious about how you created the reward model. I love little projects like this, thanks for sharing. I've been fine-tuning on my mac and am interested in getting into GRPO, which I haven't tried yet.
latchkey
Reminds me of this: https://www.unix.com/man_page/debian/6/jive/
joshstrange
Now I'm just imagining a video game with characters each having their own fine tune applied on top for their dialog. I'm guessing you could use some relatively small models. In each case you would be feeding all the context to the model (player name, current relevant quests, summary of previous interactions, etc). Though maybe fine tuning/training isn't even needed and a good enough prompt will work (not sure what all they used for this [0]).

I'm excited for the first AAA game that tries this. Anyone that has played an RPG-style game knows that after a few times going into a city (or a couple play-throughs) the dialog feels repetitive. I love the idea of Skyrim but with better dialog. You could either run the models on the user's computer or maybe just run it on the backend so you can block certain generations (wrong/misleading/"unsafe") and just ship updated dialog lists to the client occasionally.
[0] https://www.youtube.com/watch?v=d6sVWEu9HWU
fitsumbelay
this is awesome
KaiserPro
I'm not sure if you've tried this already, but removing the translate step might give you a more authentic output. In the journals that I saw, the language was much more simple than the output.
throwaway314155
You mention no supervised finetuning. May I ask why? I'm curious if you could get similar/better/worse results by just finetuning the LLM on your dataset rather than generating synthetic data, training a classifier and using GRPO?
Cool stuff in any case.
npunt
Love it. Immediately reminded of the text filters back in the day like the pirate one that would drop letters and replace with apostrophes and change certain passages into "arr" or "yarr matey"