In our last post, we saw that we can get much better performance
by rewriting a Pandas GroupBy-Apply in NumPy.
I also mentioned that I could not get Numba working with this code to see whether it would help any further.
In this post, I want to talk about how I got Numba working and what the results were,
but first: what is Numba?
Numba
Numba is a just-in-time (JIT) compiler for Python
that specializes in optimizing the performance of numerical computations.
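To make this concrete before looking at how it works, here is a minimal sketch of what using Numba typically looks like, assuming its `njit` decorator (Numba's "nopython" JIT mode). The function `sum_of_squares` is a hypothetical example, not code from the previous post. The `try`/`except` fallback is only there so the snippet also runs without Numba installed; in that case it simply runs as ordinary (slow) Python.

```python
try:
    from numba import njit
except ImportError:
    # Hypothetical fallback so the sketch runs without Numba:
    # the "decorator" just returns the original Python function.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    # A plain Python loop: slow in CPython, but with Numba the first
    # call compiles it to machine code for the argument types it sees.
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(1000))  # 332833500
```

The first call pays the compilation cost; later calls with the same argument types reuse the compiled machine code.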
Well, that’s all well and good, but what is a JIT compiler?
Most people are aware of compiled languages like C++, Rust or Java.
For these languages, the development flow is: write the code,
compile it to a binary, then run the binary.
But for an interpreted language like Python, the second step is missing.
Python is dynamically typed, so a variable can hold a value of any type.
As a result, Python functions often spend a lot of time
checking the types of their arguments to decide which operation to actually call.
The final call usually ends up in a C function, which is fast,
but the overhead of the type checking and edge-case handling around it is huge.
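A small illustration of this runtime dispatch, using only the standard library: the hypothetical function `add` below contains a single generic add instruction, and the interpreter has to inspect its operands on every call to decide whether that means integer addition, float addition, or string or list concatenation. This is a sketch of the general idea, not a measurement of the overhead.

```python
import dis


def add(a, b):
    # Nothing here says what a and b are; the interpreter
    # has to look at their types on every call.
    return a + b


# The same bytecode handles very different types at run time:
print(add(1, 2))          # int addition: 3
print(add(1.5, 2.25))     # float addition: 3.75
print(add("ab", "cd"))    # string concatenation: abcd
print(add([1], [2, 3]))   # list concatenation: [1, 2, 3]

# The disassembly shows one generic instruction (BINARY_ADD before
# Python 3.11, BINARY_OP from 3.11 on) that dispatches on operand types.
dis.dis(add)
```

A compiler for a statically typed language would instead emit a specific machine instruction for, say, 64-bit integer addition, with no type inspection at run time.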
Compiled languages like these are statically typed, so types are known ahead of time and no runtime type checking is necessary.
Compilers can use this and other knowledge to create