Retrieval Augmented Generation (RAG) is a method for improving the relevance and transparency of Large Language Model (LLM) responses. In this approach, the query is used to retrieve relevant documents from a database, and these documents are passed to the LLM as additional context. RAG therefore improves the relevance of responses by including pertinent information in the context, and it improves transparency by letting the LLM reference and cite its source documents.

While RAG is an increasingly popular method, it requires textual documents as references and cannot directly use the wealth of audio data that many organizations have or that is available online, like meeting recordings, lectures, webinars, and more.
In this tutorial, we will demonstrate how to perform Retrieval Augmented Generation with audio data by leveraging AssemblyAI’s document loader for LangChain, a popular framework that provides building blocks for LLM-based applications.
The source code for this tutorial can be found in this repo.
Getting started
To follow this tutorial, you’ll need an AssemblyAI API key. You can get one for free here if you don’t already have one. Additionally, we’ll be using GPT-3.5 for this tutorial, so you’ll need an OpenAI API key as well – sign up here if you don’t have one already.
Setting up the virtual environment
In a terminal, create a directory for this project and navigate into it:
mkdir ragaudio && cd ragaudio
Now, enter the following command to create a virtual environment called venv
python -m venv venv
Next, activate the environment. If you’re on macOS/Linux, enter
source ./venv/bin/activate
If you are on Windows, enter
.\venv\Scripts\activate.bat
Next, install the libraries we’ll need for this tutorial:
pip install assemblyai langchain openai python-dotenv chromadb sentence-transformers
Setting up the environment file
We’ll use python-dotenv to load environment variables for our project. In order to do so, we’ll need to store our environment variables in a file that the package can read. In your project directory, create a file called .env and paste your API keys as the values for the corresponding environment variables:
ASSEMBLYAI_API_KEY=your-key-here
OPENAI_API_KEY=your-key-here
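If you want to quickly verify that python-dotenv can read this file, an optional check looks something like the following (the sanity_check.py filename is just a hypothetical example and not part of the tutorial – load_dotenv() is the same call we’ll use in main.py below):
# sanity_check.py (optional, hypothetical helper)
import os

from dotenv import load_dotenv

# load_dotenv() reads the .env file in the current directory and exports
# its entries as environment variables for this process
load_dotenv()

# LangChain's ChatOpenAI and the AssemblyAI loader look for these variables by default
for key in ("ASSEMBLYAI_API_KEY", "OPENAI_API_KEY"):
    print(f"{key} set: {os.getenv(key) is not None}")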
Note: it is extremely important not to share this file or check it into source control. Anybody who has access to these keys can use your respective accounts, so it is a good idea to create a .gitignore file to make sure that this does not accidentally happen. Additionally, we can exclude the virtual environment from source control. The .gitignore file should therefore contain the following lines:
.env
venv
Writing the application
Now we’re ready to write the application. Overall, the application will work as follows:
- First, we will load our documents using AssemblyAI’s document loader for LangChain
- Next, we will split these documents into smaller chunks that can be retrieved by LangChain during RAG
- We will then embed these chunks with HuggingFace, yielding one vector for each chunk
- After that, we will store these embeddings in a Chroma vector database
- Finally, we will use LangChain’s built-in QA chain to write a simple loop that lets us query OpenAI’s GPT-3.5. Each answer will be shown along with the texts retrieved for RAG when generating it (a condensed sketch of this whole pipeline follows this list)
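To make the plan concrete before building it up step by step, here is a condensed sketch of the full pipeline. The webinar URL, chunk sizes, and example question are illustrative assumptions, and the imports are the same ones introduced in the next section:
# Condensed preview of the pipeline this tutorial builds (illustrative values)
from dotenv import load_dotenv
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import AssemblyAIAudioTranscriptLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

load_dotenv()

# 1. Load: transcribe an audio file (local path or URL) into LangChain documents
docs = AssemblyAIAudioTranscriptLoader(file_path="https://example.com/webinar.mp3").load()

# 2. Split: break the transcript into smaller, retrievable chunks
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 3. Embed and 4. Store: one vector per chunk, kept in a Chroma vector database
db = Chroma.from_documents(chunks, HuggingFaceEmbeddings())

# 5. Query: retrieve relevant chunks and have GPT-3.5 answer, returning the sources too
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=db.as_retriever(),
    return_source_documents=True,
)
result = qa({"query": "What topics does the webinar cover?"})
print(result["result"])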
We can see this process pictorially in the diagram below:
Imports and environment variables
Open a new file in your project directory called main.py. First, we’ll add all imports required for this project. Additionally, we’ll use the load_dotenv() function in order to load the contents of our .env file as environment variables.
from dotenv import load_dotenv
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import AssemblyAIAudioTranscriptLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
load_dotenv()
Loading the documents
The AssemblyAIAudioTranscriptLoader allows us to load audio files, local or remote, into LangChain applications. In this example, we will use several audio files from LangChain’s webinar series on YouTube, but feel free