I am originally a network engineer. Over the last 10 years, I have mainly focused on cloud networking, gaining hands-on experience with Kubernetes and Istio. I should note that AI is not my area of expertise: it wasn’t until the release of ChatGPT at the end of 2022 that I began to delve into the topic. I would like to thank Sofia Ferreira and Piero Savastano for helping me get started on this new adventure.
I didn’t want to write a lot of code, so it took me some time to find an interesting entry point to gain hands-on experience with OpenAI. Browsing social media platforms, I found the Cheshire Cat project by Piero Savastano.
Quoting the official project documentation:
The Cheshire Cat is an open-source framework that allows you to develop intelligent agents on top of many Large Language Models (LLM). You can develop your custom AI architecture to assist you in a wide range of tasks.
Once I gained access to OpenAI through my Azure subscription, I was eager to start experimenting with the product. However, while the Cheshire Cat implementation supported OpenAI from openai.com, I couldn’t use it to connect to the Azure OpenAI endpoint in my subscription. My first task was to propose a pull request (PR) to implement this feature. It involved some testing and a patch to the LangChain Python library, but I was able to get everything up and running in just a few days.
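For context, the two endpoints are shaped differently: api.openai.com authenticates with a Bearer token and takes the model name in the request body, while Azure OpenAI puts the deployment name in the URL path, authenticates with an api-key header, and requires an api-version query parameter. Here is a minimal sketch of an Azure Completion call, where <resource> and <deployment> are placeholders, the key is read from an environment variable, and the api-version value is only an example:

# call a Completion deployment on Azure OpenAI
curl "https://<resource>.openai.azure.com/openai/deployments/<deployment>/completions?api-version=2023-05-15" \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_KEY" \
  -d '{"prompt": "Hello!", "max_tokens": 50}'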
Just as a System Engineer with a Linux VM and an Apache Web Server can install WordPress to start a blog and even create a WordPress plugin using the existing framework, a Cloud Engineer with an Azure Subscription can use the Cheshire Cat tool to develop an AI agent. Cheshire Cat provides the necessary framework and codebase to simplify the development process, much like WordPress does for creating a blog or plugin.
While writing the patch for the Cheshire Cat integration with Azure OpenAI, I learned several things, including:
- The Azure OpenAI resource offers an API endpoint that allows users to connect to Model Deployments.
- There are models that provide a Completion API that generates natural language responses to a prompt. The Completion API takes a prompt as input and generates a response that continues the prompt in a natural way, making it ideal for tasks such as chatbots, virtual assistants, and automated customer support.
- There are models that provide an Embeddings API that returns numerical representations of text. These embeddings are stored in a vector database for later use. Embeddings are dense vectors that represent the meaning of words or sentences in a high-dimensional space; they can be used for a variety of natural language processing (NLP) tasks, such as text classification, clustering, and similarity comparison.
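As a rough sketch, an Embeddings call against Azure OpenAI looks much like the Completion call above, only with a different route and payload (again, <resource> is a placeholder, the deployment name is an assumption, and so is the api-version):

# call an Embeddings deployment on Azure OpenAI
curl "https://<resource>.openai.azure.com/openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-05-15" \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_KEY" \
  -d '{"input": "The Cheshire Cat grinned."}'

The response contains a vector of floats, which is what ends up stored in the vector database.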
The Cheshire Cat provides a development environment based on docker-compose, consisting of three containers:
- core: the Cheshire Cat core microservice that exposes a REST API implemented with FastAPI
- admin: a Node.js web interface for testing and configuration
- qdrant: an open-source Vector Database
The core and admin containers are intended for development and are not immutable: they provide the Linux environment needed to run the Python code, which is mounted as a Docker volume.
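To try the stack locally, docker-compose is all you need. The quick smoke tests below assume the default ports (1865 for the core, 6333 for qdrant); double-check them against the docker-compose.yml of the release you are using:

# start the three development containers
docker-compose up -d
# core REST API (FastAPI)
curl http://localhost:1865/
# qdrant, returns its name and version
curl http://localhost:6333/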
To run on Azure, I decided to build my own immutable containers, and I published the project kube-cheshire-cat.
The Microsoft documentation has detailed instructions to create a resource and deploy a model using Azure OpenAI.
To begin, execute the following commands, which are the bare minimum required.
Create a resource group:
az group create --name cheshire-cat --location eastus
Deploy an Azure OpenAI resource:
az cognitiveservices account create \
  --name cheshire-cat \
  --resource-group cheshire-cat \
  --location eastus \
  --kind OpenAI \
  --sku s0
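The Cheshire Cat configuration will need the endpoint and an API key of this resource; both can be retrieved with:

az cognitiveservices account show \
  --name cheshire-cat \
  --resource-group cheshire-cat \
  --query properties.endpoint \
  --output tsv

az cognitiveservices account keys list \
  --name cheshire-cat \
  --resource-group cheshire-cat \
  --query key1 \
  --output tsv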
Create a deployment for the model used for completions. I am going to use the gpt-35-turbo model in this example. For simplicity, I will also use the string gpt-35-turbo as the deployment name:
az cognitiveservices account deployment create \
  --name cheshire-cat \
  --resource-group cheshire-cat \
  --deployment-name gpt-35-turbo \
  --model-name gpt-35-turbo \
  --model-version "0301" \
  --model-format OpenAI \
  --scale-settings-scale-type "Standard"
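To double-check that the deployment is in place:

az cognitiveservices account deployment list \
  --name cheshire-cat \
  --resource-group cheshire-cat \
  --output table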
Create a deployment for