When discussing GenAI, the term “GPU” almost always enters the conversation, and the topic often moves toward performance and access. Interestingly, the word “GPU” is assumed to mean Nvidia products. (As an aside, the popular Nvidia devices used in GenAI are not technically Graphics Processing Units; I prefer the term “SIMD units.”)
The association of GenAI and GPUs with Nvidia is no accident. Nvidia has always recognized the need for tools and applications to help grow its market and has created a very low barrier to obtaining software tools (e.g., CUDA) and optimized libraries (e.g., cuDNN) for Nvidia hardware. Indeed, Nvidia is known as a hardware company, but as Bryan Catanzaro, VP of Applied Deep Learning Research at Nvidia, has stated, “Many people don’t know this, but Nvidia has more software engineers than hardware engineers.”
As a result, Nvidia has built a powerful software “moat” around its hardware. While CUDA is not open source, it is freely available and under the firm control of Nvidia. This situation has benefited Nvidia (as it should; the company invested time and money in CUDA), but it has created difficulties for companies and users that want to capture some of the HPC and GenAI market with alternative hardware.
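That low barrier is easy to see from user-level code. The short sketch below uses PyTorch as one illustrative framework built on the CUDA/cuDNN stack (the framework choice is an assumption for illustration, not part of the discussion above); a single line of linear algebra is transparently handed off to Nvidia’s tuned libraries when a CUDA device is present, and that convenience is precisely what any competing hardware stack has to match.

```python
import torch

# Pick a CUDA device when one is present; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Two illustrative matrices; the sizes are arbitrary.
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)

# On an Nvidia GPU this single line dispatches to Nvidia's optimized
# libraries (cuBLAS) underneath; on a CPU it uses the framework's CPU
# backend. The user-level code is identical either way.
c = a @ b
print(c.device)
```

Nothing in the user’s code mentions CUDA, cuBLAS, or cuDNN; the moat sits entirely below the framework, which is why it is so hard for alternative hardware to displace.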
Building on the Castle Foundation
The number of foundational models developed for GenAI continues to grow. Many of these are “open source” in the sense that they can be used and shared freely (for example, the Llama foundational model from Meta). However, creating them requires a large number of resources (both people and machines), so their development is limited mainly to the hyperscalers (AWS, Microsoft Azure, Google Cloud, Meta Platforms, and Apple) that have huge numbers of GPUs available. In addition to the hyperscalers, other companies have invested in hardware (i.e., purchased massive numbers of GPUs) to create their own foundational models.
From a research perspective, the models are interesting and can be used for a variety of tasks; however, the expected use of, and need for, even more GenAI computing resources is twofold:
- Fine-tuning – Adding domain-specific data to a foundational model to make it work for your use case.
- Inference – Once the model is fine-tuned, it will require resources whenever it is used (i.e., asked questions). A minimal sketch of both tasks follows this list.
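To make the two tasks concrete, here is a toy PyTorch sketch of fine-tuning followed by inference. The tiny model, the random “domain data,” and the hyperparameters are illustrative stand-ins only, not a recipe for a real foundational model; the point is that fine-tuning updates weights through backpropagation (compute- and memory-hungry), while inference is a cheaper forward-only pass.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained foundational model (hypothetical sizes).
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))

# Use a GPU if one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# --- Fine-tuning: update the model's weights on domain-specific data ---
# Random tensors stand in for a real domain dataset.
inputs = torch.randn(128, 16, device=device)
labels = torch.randint(0, 4, (128,), device=device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for _ in range(10):  # a few gradient steps; real fine-tuning runs far longer
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()   # backpropagation is why fine-tuning needs more memory
    optimizer.step()

# --- Inference: use the tuned model; no gradients, far fewer resources ---
model.eval()
with torch.no_grad():
    query = torch.randn(1, 16, device=device)
    answer = model(query).argmax(dim=-1)
    print(f"predicted class: {answer.item()}")
```

Even in this toy form, the asymmetry is visible: the training loop must hold gradients and optimizer state for every parameter, while the inference step runs only the forward pass. Both still benefit enormously from accelerated hardware.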
These tasks are not restricted to hyperscalers, and they will need accelerated computing, that is, GPUs. The obvious solution is to buy more “unavailable” Nvidia GPUs, and AMD is ready and waiting now that demand has far outstripped supply. To be fair, Intel and some other companies are also ready and waiting to sell into this market. The point is that