How Nebullvm Works • Benchmarks • Installation • Get Started
Website | LinkedIn | Reddit | Twitter
nebullvm speeds up AI inference by 5-20x in just a few lines of code
How Nebullvm Works
This open-source library takes your AI model as input and outputs an optimized
version that runs 5-20 times faster on your machine. Nebullvm tests multiple
deep learning compilers to identify the best possible way to execute your
model on your specific hardware, without impacting the accuracy of your model.
The goal of nebullvm
is to let any developer benefit from deep learning (DL)
compilers without having to spend tons of hours understanding, installing,
testing and debugging this powerful technology.
The library aims to be:
- Deep learning model agnostic. nebullvm supports all the most popular architectures such as transformers, LSTMs, CNNs and FCNs.
- Hardware agnostic. The library already runs on CPUs and GPUs and will soon support TPUs and other deep learning-specific ASICs.
- Framework agnostic. nebullvm supports the most widely used frameworks (PyTorch, TensorFlow and Hugging Face) and will soon support many more.
- Easy-to-use. It takes a few lines of code to install the library and optimize your models.
- Leveraging the best deep learning compilers. There are many DL compilers that optimize the way your AI models run on your hardware. It would take tons of hours for a developer to install and test them at every model deployment. The library does it for you!
Do you like the concept? Leave a star to show your support.
And happy acceleration!
Technology demonstration and benchmarks
Technology demonstration
We suggest testing the library on your AI models right away by following the
installation instructions below.
If you want to get a first feel for the library's capabilities,
we have built 3 notebooks where the library can be tested on the most popular
AI frameworks: TensorFlow, PyTorch and Hugging Face.
The notebooks will run locally on your hardware, so you can get an idea of the
performance you would achieve with nebullvm
on your AI models.
Note that it will take several minutes to install the library the first time.
Benchmarks
We have tested nebullvm
on popular AI models and hardware from leading vendors.
- Hardware: M1 Pro, NVIDIA T4, Intel Xeon, AMD EPYC
- AI Models: EfficientNet, Resnet, SqueezeNet, BERT, GPT2
The table below shows the response time in milliseconds (ms) of the non-optimized
model and the optimized model for the various model-hardware couplings, as an
average over 100 experiments. It also displays the speedup provided by nebullvm,
where speedup is defined as the response time of the non-optimized model divided
by the response time of the optimized model.
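For example, for EfficientNetB0 on the M1 Pro the table reports 215.0 ms for the non-optimized model and 24.4 ms for the optimized one, i.e. a speedup of 215.0 / 24.4 ≈ 8.8x.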
Model | M1 Pro | M1 Pro optimized | M1 Pro speedup | Intel Xeon | Intel Xeon optimized | Intel Xeon speedup | AMD EPYC | AMD EPYC optimized | AMD EPYC speedup | NVIDIA T4 | NVIDIA T4 optimized | NVIDIA T4 speedup |
---|---|---|---|---|---|---|---|---|---|---|---|---|
EfficientNetB0 | 215.0 ms | 24.4 ms | 8.8x | 53.6 ms | 19.2 ms | 2.8x | 121.3 ms | 47.1 ms | 2.6x | 12.9 ms | 0.3 ms | 39.2x |
EfficientNetB1 | 278.8 ms | 33.6 ms | 8.3x | 74.8 ms | 27.1 ms | 2.8x | 175.0 ms | 70.4 ms | 2.5x | 18.0 ms | 0.3 ms | 54.5x |
EfficientNetB2 | 284.9 ms | 36.8 ms | 7.8x | 86.4 ms | 30.0 ms | 2.9x | 199.1 ms | 75.1 ms | 2.7x | 36.9 ms | 0.4 ms | 105.5x |
EfficientNetB3 | 370.1 ms | 50.4 ms | 7.4x | 101.8 ms | 42.8 ms | 2.4x | 279.8 ms | 118.0 ms | 2.4x | 20.3 ms | 0.3 ms | 59.6x |
EfficientNetB4 | 558.9 ms | 71.0 ms | 7.9x | 136.6 ms | 64.3 ms | 2.1x | 400.5 ms | 159.1 ms | 2.5x | 24.9 ms | 0.3 ms | 73.2x |
EfficientNetB5 | 704.3 ms | 99.8 ms | 7.1x | 189.5 ms | 88.9 ms | 2.1x | 570.2 ms | 249.5 ms | 2.3x | 31.2 ms | 0.3 ms | 91.9x |
Resnet18 | 18.5 ms | 15.8 ms | 1.2x | 57.4 ms | 37.9 ms | 1.5x | 164.3 ms | 121.9 ms | 1.4x | 9.4 ms | 0.3 ms | 27.6x |
SqueezeNet | 15.3 ms | 7.9 ms | 1.9x | 39.1 ms | 17.3 ms | 2.3x | 119.0 ms | 58.7 ms | 2.0x | 8.9 ms | 0.3 ms | 26.1x |
GPT2 – 10 tokens | 29.7 ms | 10.8 ms | 2.8x | 63.4 ms | 44.6 ms | 1.4x | 180.7 ms | 59.1 ms | 3.1x | 15.3 ms | 4.4 ms | 3.5x |
Bert – 8 tokens | 39.4 ms | 6.2 ms | 6.4x | 44.9 ms | 39.3 ms | 1.1x | 148.4 ms | 46.5 ms | 3.2x | 10.4 ms | 3.8 ms | 2.7x |
Bert – 512 tokens | 489.5 ms | 276.4 ms | 1.8x | 801.7 ms | 782.8 ms | 1.0x | 5416.7 ms | 2710.7 ms | 2.0x | 31.3 ms | 27.4 ms | 1.1x |
At first glance, we can observe that the speedup varies greatly across hardware-model
couplings. Overall, the library delivers strong results, with most speedups
ranging from 2x to 30x or more.
To summarize, the results are:
- nebullvm provides positive acceleration to non-optimized AI models
- These early results show poorer (yet positive) performance on Hugging Face models. Support for Hugging Face has just been released and improvements will be included in future versions
- The library provides a ~2-3x boost on Intel and AMD hardware. These results are most likely related to an already highly optimized implementation of PyTorch for x86 devices
- Nebullvm delivers extremely good performance on NVIDIA machines
- The library also provides great performance on Apple M1 chips
And across all scenarios, nebullvm is very useful for its ease of use,
allowing you to take advantage of deep learning compilers without having
to spend hours studying, testing and debugging this technology.
Installation and Get Started
Step 1: Installation of nebullvm library
There are two ways to install nebullvm:
- Using PyPI. We suggest installing the library with pip to get the stable version of nebullvm
- From source code to get the latest features
Option A: Installation with PyPI (recommended)
The easiest way to install nebullvm is by using pip, running:
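pip install nebullvm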
Option B: Source code installation
To install from source, clone the repository on your local machine using git.
git clone https://github.com/nebuly-ai/nebullvm.git
Then, enter the repo and install nebullvm with pip.
cd nebullvm
pip install .
Step 2: Installation of deep learning compilers
Now you need to install the compilers that the library leverages to create the
optimized version of your models. We have built an auto-installer to install them
automatically.
Note that it will take several minutes to install all compilers and we recommend
following the second option below to avoid any installation issues.
Option A: Installation at the first optimization run
The auto-installer is activated after you import nebullvm and perform your
first optimization. You may run into import errors related to the deep learning
compiler installation, but you can ignore these errors/warnings.
It is also recommended to restart the Python kernel between the auto-installation
and the first optimization, otherwise not all compilers will be activated.
Option B: Installation before the first optimization run (recommended)
To avoid any problems, we strongly recommend running the auto-installation
before performing the first optimization by running
python -c "import nebullvm"
At this stage, you can ignore any import warnings resulting from the previous
command.
Option C: Selective installation of DL compilers
The library automatically installs all the DL compilers it supports. However,
you may want to bypass the automatic installation for some reason.
If this is the case, you can simply export the environment variable
NO_COMPILER_INSTALLATION=1
by running
export NO_COMPILER_INSTALLATION=1
from your command line or adding
import os
os.environ["NO_COMPILER_INSTALLATION"] = "1"
in your python code before importing nebullvm
for the first time.
Note that the auto-installation of open-source compilers is performed outside the
nebullvm wheel. Installation of Apache TVM and OpenVINO has been tested
on macOS and on Linux distributions similar to Debian and CentOS.
The feature is still in an alpha version, so we expect that it may fail under
untested circumstances.
Possible installation issues
macOS: the installation may fail on MacBooks with the Apple Silicon chip due to
scipy compilation errors. The easy fix is to install scipy with another package
manager such as conda (the Apple Silicon distribution of Miniconda) and then
install nebullvm. For any additional issues, do not hesitate to open an issue or
contact us directly by email at info@nebuly.ai.
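As a sketch, the workaround described above could look like the following (assuming a Miniconda environment built for Apple Silicon):

conda install scipy
pip install nebullvm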
Get Started
Nebullvm
reduces the computation time of deep learning model inference by
5-20 times by testing multiple deep learning compilers and identifying the
best possible way to execute your model on your specific hardware, without
impacting the accuracy of your model.
Currently nebullvm supports models in the pytorch, tensorflow and huggingface
frameworks, and many others will be included soon. Models can be easily imported
from one of the supported frameworks using the appropriate function, as sketched
in the example below.
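As a minimal sketch of the workflow for a PyTorch model (assuming the optimize_torch_model entry point with batch_size, input_sizes and save_dir arguments; check the documentation of your installed version for the exact signature):

import torch
import torchvision.models as models
from nebullvm import optimize_torch_model

# Any torch.nn.Module can be passed in; resnet18 is used here only as an example.
model = models.resnet18()
optimized_model = optimize_torch_model(
    model, batch_size=1, input_sizes=[(3, 224, 224)], save_dir="."
)

# The optimized model is expected to behave as a drop-in replacement at inference time.
x = torch.randn(1, 3, 224, 224)
prediction = optimized_model(x)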