Tensor Product Attention Is All You Need by eunos

By HackTech, January 22, 2025

[Submitted on 11 Jan 2025]

Abstract: Scaling language models to handle longer input sequences typically necessitates large key-value (KV) caches, resulting in substantial memory overhead during inference. In this paper, we propose Tensor Product Attention (TPA), a novel attention mechanism that uses tensor decompositions to represent queries, keys, and values compactly, significantly shrinki
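
To make the memory argument concrete, here is a rough back-of-the-envelope sketch of why a factored representation can shrink a KV cache. It is an illustration under stated assumptions, not the paper's formulation: the function names, the rank parameter, and the specific choice of storing each token's per-layer key or value matrix as a sum of `rank` outer products (one factor over heads, one over the head dimension) are all hypothetical.

```python
def full_kv_cache_elems(seq_len, n_layers, n_heads, head_dim):
    """Elements held by a standard KV cache: one key vector and one
    value vector per head, per token, per layer."""
    return 2 * seq_len * n_layers * n_heads * head_dim


def factored_kv_cache_elems(seq_len, n_layers, n_heads, head_dim, rank):
    """Elements held if each token's (n_heads x head_dim) key or value
    matrix is stored as a sum of `rank` outer products a_r (x) b_r,
    where a_r has length n_heads and b_r has length head_dim.

    ASSUMPTION: this factorization is illustrative only and is not
    taken from the paper.
    """
    return 2 * seq_len * n_layers * rank * (n_heads + head_dim)


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    cfg = dict(seq_len=8192, n_layers=32, n_heads=32, head_dim=128)
    full = full_kv_cache_elems(**cfg)
    small = factored_kv_cache_elems(rank=2, **cfg)
    print(f"full cache elements:     {full}")
    print(f"factored cache elements: {small}")
    print(f"reduction factor:        {full / small:.1f}x")
```

Under these assumptions the saving per token is the ratio of `n_heads * head_dim` to `rank * (n_heads + head_dim)`, so the benefit depends on the rank staying small relative to the head count and head dimension.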