[Submitted on 2 May 2023]
Abstract: Transformer-based models typically have a predefined bound on their input
length, because they potentially need to attend to every token in the
input. In this work, we propose Unlimiformer: a general approach that can wrap
any existing pretrained encoder-decoder transformer, and offload the attention
comp