May 2, 2023
This article was contributed by Koen Vervloesem
Linters are tools that analyze a program’s source code to detect various
problems such as syntax errors, programming mistakes, style violations, and
more. They are important for maintaining code quality and
readability in a project, as well as for catching bugs early in the
development cycle. Last year, a new Python linter appeared: Ruff. It’s fast, written in Rust, and in less than a year it has
been adopted by some high-profile projects, including FastAPI, Pandas, and SciPy.
Linting tools are often built into an integrated development environment,
run from pre-commit hooks, or run as part of continuous-integration (CI)
pipelines. Some popular linters for Python include Pylint, Flake8, Pyflakes,
and pycodestyle (formerly called pep8), all of which are themselves written
in Python. Each linter checks whether the code violates any of a list of
rules. Ruff reimplements many of the rules defined by these other popular
Python linters and combines them into one tool.
Orders of magnitude faster
In August 2022, Charlie Marsh announced
Ruff, which he called “an extremely fast Python linter, written in
Rust”. He showed how Ruff is 150 times faster than Flake8 on macOS when
linting the Python files in the CPython code base, 75 times faster
than pycodestyle, and 50
times faster than Pyflakes and Pylint. While the exact speed gains aren’t
that important (and Ruff has become even faster since then), it’s clear that
it’s orders of magnitude faster than its competitors, as Marsh
explained:
Even a conservative 25x is the difference between ~real-time
feedback (~300-500ms) and sitting around for 12+ seconds. With
a 150x speed-up, it’s ~300-500ms vs. 75 seconds. If you edit
a single file in CPython and re-run
ruff, it’s 60ms total, increasing the speed-up by another order of
magnitude.
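The numbers in that quote follow from simple multiplication. As a quick check, here is a small sketch that assumes the 300-500ms range is Ruff's own run time and that the slower tool takes that time multiplied by the speed-up factor; the function name is made up for illustration.

```python
def slower_tool_seconds(fast_ms: float, speedup: float) -> float:
    """Seconds taken by a tool that is `speedup` times slower than `fast_ms` milliseconds."""
    return fast_ms * speedup / 1000


# Assumed reading of the quote: the fast tool needs 300-500 ms.
for speedup in (25, 150):
    low = slower_tool_seconds(300, speedup)
    high = slower_tool_seconds(500, speedup)
    print(f"{speedup}x: {low:.1f}-{high:.1f} s")
```

Under that reading, the upper ends of the two ranges match the "12+ seconds" and "75 seconds" that Marsh mentions.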
In his example, Marsh touches on Ruff’s
caching. When re-running Ruff on a code base, it only lints the files that
have been changed since the previous run. In contrast, Flake8 re-lints all
of the
files every time, except when running it in a wrapper such as flake8-cached.
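The general idea of skipping unchanged files can be sketched in a few lines of Python. This is only an illustration of the technique, not Ruff's actual implementation: it assumes that a content hash is used to detect changes, and both `lint_with_cache` and the `lint_file` callback are hypothetical names invented for this example.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def file_digest(path: Path) -> str:
    """Return a SHA-256 hex digest of the file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def lint_with_cache(
    paths: list[Path],
    cache_file: Path,
    lint_file: Callable[[Path], list[str]],
) -> dict[str, list[str]]:
    """Lint only the files whose contents changed since the previous run.

    `lint_file` is a hypothetical callback that returns the list of
    messages for one file; it stands in for the real linter.
    """
    # Load the results of the previous run, if there was one.
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    results: dict[str, list[str]] = {}
    for path in paths:
        key = str(path)
        digest = file_digest(path)
        entry = cache.get(key)
        if entry is not None and entry["digest"] == digest:
            # Unchanged file: reuse the stored messages.
            results[key] = entry["messages"]
        else:
            # New or modified file: lint it and remember the outcome.
            messages = lint_file(path)
            cache[key] = {"digest": digest, "messages": messages}
            results[key] = messages
    cache_file.write_text(json.dumps(cache))
    return results
```

With a scheme like this, a second run over an unmodified code base only has to read and hash each file, and editing a single file leads to exactly one call to the linter.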
The gist of his message is: when linting a code base happens almost
instantaneously, there’s really no reason not to do it. This means that more
developers will add the linter to their pre-commit or CI configuration. So
speed is not just a nice-to-have, but an essential element of improving code
quality.
One reason why Ruff is faster than the alternatives is that it’s
compiled into machine code instead of running in an interpreter. A
second reason for its speed is that it runs all of its checks in a single pass
over the code. Marsh contrasts this with how Flake8 works:
Flake8 is really a wrapper around other tools, like pyflakes and
pycodestyle. When you run Flake8, both pyflakes and pycodestyle are reading
every file from disk, tokenizing the code, and traversing the tree (I might
be wrong on some of the details, but you get the idea). If you then use
autoflake to automatically fix some of your lint violations, you’re running
pycodestyle yet aga