[Submitted on 3 May 2023]
Abstract: Deploying large language models (LLMs) is challenging because they are memory
inefficient and compute-intensive for practical applications. In reaction,
researchers train smaller task-specific models by either finetuning with human
labels or distilling using LLM-generated labels. However, finetuning and
distil