Think Before You Speak: Training Language Models with Pause Tokens by og_kalu


By HackTech, October 4, 2023


[Submitted on 3 Oct 2023]


Abstract: Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is an outcome of manipulating $K$ hidden vectors per layer, one vector per preceding token. What if instead we were to let the model manipulate, say, $K+10$ hidden vectors before it outputs the $(K+1)^{th}$ token? We operationalize this idea by performing trai
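The counting argument in the abstract can be made concrete with a small sketch. It assumes, as the title suggests, that the extra hidden vectors come from appending copies of a pause token to the input prefix. The token string `<pause>` and both helper names are hypothetical, chosen only for illustration, and no real model is involved.

```python
# Minimal sketch: count positions when a prefix is padded with pause tokens.
# Assumption: the delay is realised by appending pause tokens to the prefix.
# "<pause>" and the function names below are hypothetical, not from the paper.

PAUSE = "<pause>"


def append_pauses(prefix, num_pauses=10):
    """Return the prefix followed by num_pauses copies of the pause token."""
    return list(prefix) + [PAUSE] * num_pauses


def answer_position(prefix, num_pauses=10):
    """Index of the last input position, whose output would be read
    as the (K+1)th token once all pause tokens have been processed."""
    return len(prefix) + num_pauses - 1


prefix = ["What", "is", "2", "+", "2", "?"]  # K = 6 tokens
padded = append_pauses(prefix)               # K + 10 = 16 positions

print(len(prefix))              # 6
print(len(padded))              # 16
print(answer_position(prefix))  # 15
```

With `num_pauses=0` the sketch reduces to the usual case: $K$ positions, with the next token read from the last one.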
