In this blog post, you will learn how you can use OpenAI’s chat plugin ecosystem to enhance the capabilities of state-of-the-art chat completion models when tackling a real-world Postgres and Citus use case: choosing the right shard key (also known as the distribution column).
If you are already using the Citus database extension in Postgres, or Azure Cosmos DB for PostgreSQL, you probably know that shard keys are used in distributed tables to assign table rows to shards.
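As a quick refresher, here is a minimal sketch of how a shard key is declared in Citus. It uses the psycopg2 Python driver against a hypothetical cluster; the connection string, table names, and columns are made-up placeholders, while create_distributed_table and create_reference_table are the actual Citus functions involved.

```python
import psycopg2

# Connect to the Citus coordinator node (connection details are placeholders).
conn = psycopg2.connect("host=coordinator.example.com dbname=app user=citus password=***")
conn.autocommit = True

with conn.cursor() as cur:
    # Distribute the (hypothetical) events table by tenant_id, the shard key:
    # rows with the same tenant_id value end up on the same shard.
    cur.execute("SELECT create_distributed_table('events', 'tenant_id');")

    # Small, rarely changing tables can instead be replicated to every node
    # as reference tables.
    cur.execute("SELECT create_reference_table('countries');")
```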
Choosing the right distribution strategy for your Citus database—such as picking the right shard key for your distributed PostgreSQL tables, or deciding whether a table should be a reference table—is one of the most important data modeling decisions (and pain points). Even though you have access to tips for selecting a Citus sharding key for the most common distributed PostgreSQL scenarios, you may still need use-case- or workload-specific help.
In the following sections, you will learn more about:
- LLMs & Chat Completion Models—and Limitations
- 3 Strategies for Making GPT Context Aware
- Exploring How a ChatGPT Plugin Could Help Select a Citus Shard Key
- Extending the Framework to Other Use Cases
Figure 1: Depiction of the use of OpenAI models, with plugins, to solve a real-world Postgres and Citus problem: the shard key selection.
LLMs and Chat Completion Models and Their Limitations
Chat completion models, such as GPT-3 and its successors GPT-3.5 and GPT-4, are a sub-class of Large Language Models (LLMs) and have revolutionized the field of natural language processing. These models are trained on vast amounts of textual data, which enables the models to understand and generate not only natural language but also code based on the input they receive.
These models can be used in a wide range of applications, from chatbots and virtual assistants to content creation and language translation (you can read more about our customer stories). The models’ underlying technology, the transformer architecture, provides the mechanisms that allow them to understand and generate text based on context.
Despite their impressive capabilities, chat completion models are not without their limitations:
- Generalist models. Because the models are trained on a diverse range of data, they may not always provide the most accurate or context-specific responses.
- Information cutoff. At the time of writing this post, the information cutoff date for the latest GPT deployments is September 2021, which means that any events, documentation, or knowledge from after that date is not part of their knowledge base (except for what the newly arriving web-browsing capabilities can reach through the plugin system).
- Limited context length. Even the most capable model, GPT-4, comes with two variants: (1) the vanilla version with 8,192 tokens, and (2) the 32k version with 32,768 tokens. With the rule of thumb of 1 token being roughly equal to 0.75 English words, these models are limited to roughly 6,000 and 24,000 words of interactions (including both the user’s input and the model’s response), respectively. In practice, this means that beyond those thresholds the models cannot remember earlier interactions or details; the sketch below shows one way to estimate how many tokens a prompt consumes.
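To get a feel for how quickly this budget is consumed, here is a minimal sketch that counts tokens with the tiktoken library; the prompt text and the response budget are illustrative assumptions.

```python
import tiktoken

# Tokenizer used by GPT-4.
encoding = tiktoken.encoding_for_model("gpt-4")

prompt = "Which column should I pick as the shard key for my orders table?"
prompt_tokens = len(encoding.encode(prompt))

context_window = 8192        # tokens available in the vanilla GPT-4 variant
max_response_tokens = 1024   # budget reserved for the model's answer (illustrative)

print(f"Prompt uses {prompt_tokens} tokens; "
      f"{context_window - prompt_tokens - max_response_tokens} tokens remain for additional context.")
```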
3 Strategies for Making GPT Context Aware: Prompt Design, Fine-tuning, and ChatGPT Plugins
The limitations of chat completion models can be addressed—and GPT can be made context aware—through a combination of three strategies: prompt design, fine-tuning, and ChatGPT plugins.
Prompt Design. Carefully crafting prompts may be the first approach to guiding the model to produce more accurate and contextually relevant responses (i.e., making the model a specialist). Because chat completion models are trained to follow patterns found in natural language prompts and their context, you can (and should) design your prompts to show the model exactly which patterns you expect in its output. Usually, this strategy includes:
- showing and telling the models what you expect,
- providing high-quality data from the context you are using the model for, and,
- playing with the model’s settings that affect the randomness of the model’s responses.
Here, you should remember that showing and telling the model what you expect, as well as providing data, in your prompts counts against the context length limitations of the models; the sketch below puts these points together for the shard key scenario.
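For the shard key scenario, a designed prompt might look like the following minimal sketch. It uses the openai Python package (0.x chat completions API, with the API key read from the OPENAI_API_KEY environment variable); the example tables, instructions, and model name are assumptions—an Azure OpenAI deployment would use its own endpoint settings and deployment name.

```python
import openai

# Show and tell: describe the expected output pattern and include one example.
system_prompt = (
    "You are a Citus data-modeling assistant. "
    "Answer only with a recommended shard key and a one-sentence justification.\n"
    "Example:\n"
    "Tables: orders(tenant_id, order_id, total) -> Shard key: tenant_id, "
    "because queries filter by tenant."
)

# Provide high-quality, domain-specific context for the actual question.
user_prompt = "Tables: events(device_id, event_time, payload). Which shard key should I pick?"

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    temperature=0.2,  # lower temperature reduces randomness for this kind of task
)

print(response.choices[0].message.content)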
Fine-Tuning. Fine-tuning is a process where (parts of) the models are further trained on a specific dataset after their initial training. This process is specifically introduced to overcome the limitations of prompt design, i.e., to be able to train the models on more examples than can fit in prompts. As a result, fine-tuning yields higher-quality results and token savings when using the models in a particular domain. That said, at the time of writing this post, fine-tuning is only available for a few select base models such as davinci, curie, babbage, and ada (refer to the OpenAI documentation on fine-tuning for more up-to-date information).
ChatGPT Plugins. OpenAI plugins connect ChatGPT to third-party applications, enhancing the model’s capabilities and allowing it to:
- retrieve real-time information (e.g., cluster information),
- retrieve domain-specific knowledge (e.g., product documentation past the model’s information cutoff date), and,
- assist users with different actions (e.g., managing clusters).
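As a rough illustration of the first two points in the list above, the backend behind such a plugin could be as simple as the following sketch: a hypothetical FastAPI service that exposes cluster metadata for ChatGPT to query. The endpoint path and sample data are made up, and a real plugin additionally needs the ai-plugin.json manifest and an OpenAPI description (which FastAPI can generate automatically).

```python
from fastapi import FastAPI

app = FastAPI(title="Citus cluster info plugin")

@app.get("/tables/{table_name}/columns")
def get_table_columns(table_name: str):
    # In a real plugin, this would query the cluster's catalog
    # (e.g., information_schema.columns) for live, real-time metadata.
    sample_metadata = {
        "orders": ["tenant_id", "order_id", "created_at", "total"],
    }
    return {"table": table_name, "columns": sample_metadata.get(table_name, [])}

# Run locally with, for example: uvicorn plugin:app --reload
```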
By leveraging the three strategies above—prompt design, fine-tuning, and ChatGPT plugins—you can significantly enhance the performance and versatility of chat completion models. You can learn more about the use of the ChatGPT plugin ecosystem and specifically more about retrieval plugins for Azure Database for PostgreSQL in some new blog posts. For now, let us continue with how you can leverage prompt design and real-time information injection (via the plugin system) in a real-world use case.