Image resizing is a very common geometrical transformation of an image, and it simply consists of a scaling operation.
Since it is one of the most common image processing operations, you can find its implementation in all image processing libraries. Because it is so common, you can expect that the behavior is well defined and will be the same among the libraries.
Unfortunately, this is not true: small implementation details differ from library to library. If you are not aware of them, they can cause a lot of trouble in your applications.
One tricky scenario, which we experienced as an ML team, arises in the pre-processing step of a machine learning model.
Usually, we resize the input of a machine learning model mainly because models train faster on smaller images. An input image that is twice as large in each dimension requires the network to learn from four times as many pixels, which increases both memory use and training time.
Moreover, many deep learning model architectures require that the input have the same size, and raw collected images might have different sizes.
The development workflow of an ML model starts with a training phase, typically in Python. Then, if the metrics on the test set satisfy your requirements, you may want to deploy your algorithm.
Suppose you need to use your model in a production environment written in C++, e.g., you need to integrate your model in an existing C++ application. In that case, you want to use your solution in another programming language 1, and you need a way to export “something” that could be used in the production environment.
A good way to preserve the algorithm's behavior is to export the whole pipeline: not only the forward pass of the network, given by the weights and the architecture of the layers, but also the pre- and post-processing steps.
Fortunately, the main deep learning frameworks, i.e., TensorFlow and PyTorch, let you export the whole execution graph into a “program,” called SavedModel or TorchScript, respectively. We use the term program because these formats include the architecture, the trained parameters, and the computation.
If you are developing a new model from scratch, you can design your application to export the entire pipeline, but this is not always possible if you are using a third-party library. So, for example, you can export only the inference but not the pre-processing.
This is where the resizing problems arise: you probably need to use a different library to resize your input, perhaps because you don't know how the rescaling is done, or because the Python library has no implementation in your deployment language.
But why does resizing behave differently?
The definition of the scaling function is mathematical and should never depend on the library being used. Unfortunately, implementations differ across commonly used libraries, and the differences mainly come from how the interpolation is done.
Image transformations are typically done in reverse order (from destination to source) to avoid sampling artifacts.
In practice, for each pixel $(x, y)$ of the destination image, you need to compute the coordinates of the corresponding pixel in the input image and copy the pixel value:
$$\mathrm{dst}(x, y) = \mathrm{src}\big(f_x(x, y),\, f_y(x, y)\big)$$
where $\langle f_x, f_y \rangle$ is the inverse mapping from destination to source.
This ensures that every output pixel is assigned a value.
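To make the point about unassigned pixels concrete, here is a minimal sketch that upscales a tiny image by an integer factor in both directions. The function names and the use of -1 as an "unassigned" marker are illustrative assumptions, not taken from any library.

```python
import numpy as np

def forward_map_upscale(src, scale):
    """Forward mapping: push each source pixel to its destination position."""
    h, w = src.shape
    dst = np.full((h * scale, w * scale), -1, dtype=np.int64)  # -1 marks "unassigned"
    for y in range(h):
        for x in range(w):
            dst[y * scale, x * scale] = src[y, x]
    return dst

def inverse_map_upscale(src, scale):
    """Inverse mapping: for every destination pixel, look up a source pixel."""
    h, w = src.shape
    dst = np.full((h * scale, w * scale), -1, dtype=np.int64)
    for y in range(h * scale):
        for x in range(w * scale):
            dst[y, x] = src[y // scale, x // scale]
    return dst

src = np.arange(4).reshape(2, 2)
holes_forward = int((forward_map_upscale(src, 2) == -1).sum())
holes_inverse = int((inverse_map_upscale(src, 2) == -1).sum())
print(holes_forward, holes_inverse)  # 12 0
```

With forward mapping, only 4 of the 16 destination pixels receive a value; iterating over the destination instead fills all of them.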
Usually, when you compute source coordinates, you get floating-point numbers, so you need to decide how to choose which source pixel to copy into the destination.
The naive approach is to round the coordinates to the nearest integers (nearest-neighbor interpolation). However, better results can be achieved with more sophisticated interpolation methods, where a polynomial function is fit to some neighborhood of the computed source pixel, and the value of the polynomial at the computed coordinates is taken as the interpolated pixel value 2.
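The difference between the two approaches can be sketched in a few lines of NumPy. This is only an illustration: the helper names are hypothetical, bilinear interpolation stands in for the "more sophisticated" methods, and the pixel-center coordinate convention used in `source_coords` is an assumption (it is one of the details on which real libraries may differ).

```python
import numpy as np

def source_coords(dst_index, scale):
    # Assumed convention: pixel centers sit at index + 0.5 (libraries may differ).
    return (dst_index + 0.5) * scale - 0.5

def sample_nearest(src, y, x):
    """Round the fractional coordinates to the nearest source pixel."""
    h, w = src.shape
    yi = min(max(int(np.floor(y + 0.5)), 0), h - 1)
    xi = min(max(int(np.floor(x + 0.5)), 0), w - 1)
    return float(src[yi, xi])

def sample_bilinear(src, y, x):
    """Blend the four surrounding source pixels by their distance."""
    h, w = src.shape
    y = min(max(y, 0.0), h - 1.0)  # clamp to the image border
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * src[y0, x0] + wx * src[y0, x1]
    bottom = (1 - wx) * src[y1, x0] + wx * src[y1, x1]
    return float((1 - wy) * top + wy * bottom)

def resize(src, out_h, out_w, sampler):
    """Inverse-mapping resize: one source lookup per destination pixel."""
    h, w = src.shape
    dst = np.empty((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            dst[i, j] = sampler(src, source_coords(i, h / out_h),
                                source_coords(j, w / out_w))
    return dst

src = np.array([[0.0, 100.0], [100.0, 200.0]])
nearest = resize(src, 4, 4, sample_nearest)
bilinear = resize(src, 4, 4, sample_bilinear)
print(nearest[1, 1], bilinear[1, 1])  # 0.0 50.0
```

Nearest-neighbor only ever copies values that already exist in the source, while bilinear interpolation produces intermediate values such as 50.0.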
The problem is that different libraries can differ slightly in how they implement the interpolation filters and, above all, in whether they apply an anti-aliasing filter. In fact, if we interpret image scaling as a form of image resampling in the sense of the Nyquist sampling theorem, downsampling a higher-resolution original to a smaller image should only be carried out after applying a suitable 2D anti-aliasing filter to prevent aliasing artifacts.
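The effect of skipping the low-pass step can be seen on a deliberately extreme input. The sketch below downsamples one-pixel-wide stripes by a factor of 2, once by plain subsampling and once after averaging each 2×2 block. The box filter is a crude stand-in chosen for brevity; it is an assumption for illustration and not the filter used by any particular library.

```python
import numpy as np

def downsample_no_filter(img, factor):
    """Keep every `factor`-th pixel, with no low-pass filtering."""
    return img[::factor, ::factor]

def downsample_box_filter(img, factor):
    """Average each factor x factor block (a simple anti-aliasing filter)."""
    h, w = img.shape
    h2, w2 = h // factor, w // factor
    blocks = img[:h2 * factor, :w2 * factor].reshape(h2, factor, w2, factor)
    return blocks.mean(axis=(1, 3))

# 8x8 image of alternating black (0) and white (255) columns.
stripes = np.tile(np.array([0.0, 255.0]), (8, 4))

aliased = downsample_no_filter(stripes, 2)
filtered = downsample_box_filter(stripes, 2)
print(aliased[0], filtered[0])
```

Without the filter, every sampled pixel lands on a black column and the result is uniformly 0, so the white stripes disappear entirely. With the filter, the result is a uniform 127.5, which preserves the average brightness of the original.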
OpenCV, which can be considered the de facto standard in image processing, does not use an anti-aliasing filter. By contrast, Pillow, probably the best-known and most widely used image processing library in Python, applies an anti-aliasing filter.
Comparison of libraries
To get an idea of how the different implementations affect the resized output image, we compared four libraries that we consider the most widely used, particularly in the ML field. We focused on libraries that can be used from Python.
We tested the following libraries and methods:
- OpenCV v4.5.3: cv2.resize
- Pillow Image Library (PIL) v8.2.0: Image.resize
- TensorFlow v2.5.0: tf.image.resize. This method has a flag, antialias, to enable the anti-aliasing filter (default is false). We tested the method in both cases, with the flag disabled and enabled.
- PyTorch v1.9.0: torch.nn.functional.interpolate. PyTorch has another method for resizing, torchvision.transforms.Resize. We decided not to use it for the tests because it is a wrapper around the PIL library, so the results would be the same as with Pillow.
Each method supports a set of