This is part one of a three-part series. This article provides the necessary background and rationale for the series.
The next article will be a detailed explanation of the actual steps necessary to implement the solution.
In the final article, we will explore what we have built and make sure we understand how it works.
The Problem With Pulling
Prometheus is a server which reaches out and pulls data from “scrape targets”. It generally does this using HTTP requests. One
problem with this design is that these targets are often inaccessible, hidden from Prometheus behind a firewall.
If a target is not hidden, some port has been exposed on some network so that Prometheus can pull the data it needs. Exposing
that port on a “trusted” network is a possible attack vector for bad actors. Exposing that port on the open internet (as is often the case)
is an open invitation for attack. It’s much better to keep these servers totally dark to all networks.
OpenZiti solves this problem of reach elegantly and natively while also keeping your service dark to all networks. This gives an
OpenZiti-enabled Prometheus the ability to literally scrape any target, anywhere. As long as the target participates in an OpenZiti
overlay network, and as long as the proper policies are in place allowing the traffic to flow, Prometheus will be able to reach out and
pull the data it needs from anything, anywhere.
It doesn’t matter if the target is in some private cloud data center, some private data center protected by a corporate firewall, or heck,
even running inside my local Docker environment! As long as the target participates in that OpenZiti overlay network, Prometheus can scrape it! That sort of
reach is impossible with classic networks.
Prometheus
Prometheus is an incredibly popular CNCF project which has run the gauntlet of CNCF maturity levels to emerge as a “graduated” project. If
you’re familiar with Prometheus, you probably know the main reasons people choose to deploy it: metrics collection, visualization, and
alerting.
Prometheus is also tremendously flexible. It has numerous available plugins and supports integrating with a wide range of systems.
According to a CNCF survey, Prometheus leads the pack when it comes to the project people go to for observability. Its popularity is
probably due in part to Prometheus being a CNCF project that is often considered the “default” solution to deploy on another wildly
popular CNCF project called Kubernetes. One interesting aspect of Prometheus is that it generally favors a poll-based approach to
metrics collection instead of a push-based model.
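To make the poll-based model concrete, here is a minimal sketch in Python using only the standard library. It is not how Prometheus itself is implemented. It only illustrates the shape of the exchange: the target exposes its current values over HTTP (by convention at /metrics, in a plain-text format), and the collector decides when to come and fetch them. The metric name demo_queue_depth, the helper names, and the loopback address are all made up for illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


def render_metrics(samples: dict) -> str:
    """Render name -> value pairs as gauges in the text exposition format."""
    lines = []
    for name, value in sorted(samples.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """The scrape target: it only answers when somebody asks."""

    # Hypothetical metric, hard-coded for the sketch.
    samples = {"demo_queue_depth": 3.0}

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(self.samples).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def scrape(url: str) -> str:
    """The collector side: reach out to the target and pull the data."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Port 0 asks the OS for any free port on the loopback interface.
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    print(scrape(f"http://127.0.0.1:{port}/metrics"))
    server.shutdown()
```

Notice that the connection is opened by the collector, toward the target. That direction is exactly what becomes a problem once the target sits behind a firewall.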
Poll-based?
I don’t know about you, but historically when I’ve thought about a metrics collection agent, I’ve tended to think of an agent that reads a
log file or some library that pushes rows into a giant data lake in the cloud. I don’t generally think about a solution that implements
poll-based metrics. That is largely because the target of a poll-based collection agent will often be behind a firewall.
As you would expect, firewalls make it exceptionally difficult to implement a poll-based solution, as firewalls have been known to make a
habit of preventing external actors from accessing random HTTP servers behind them! After all, that is their primary function!
The Prometheus project makes strong arguments explaining the benefits of a poll-based
solution. The project also recognizes that firewalls are important in creating a safe network and understands the challenges they create for
such a solution. To deal with these situations, the project also provides a Pushgateway.
This allows solutions to push their data to a location outside the firewall. Pushing data out through the firewall allows metrics and
alerting to function without the worry (and maintenance heartache) of an open, inbound firewall hole.
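For comparison, here is a rough sketch of what the push direction looks like from the client’s side, again in plain Python. It assumes a Pushgateway reachable at the made-up address pushgateway.example on port 9091 (the Pushgateway’s default port), a hypothetical job name and metric, and a job name with no slashes or other special characters in it. The helper names are mine, not part of any Prometheus library.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_push_request(gateway_url: str, job: str, body: str) -> Request:
    """Build the HTTP request that pushes metrics for one job.

    The Pushgateway accepts metrics in the text exposition format at
    /metrics/job/<job name>. PUT replaces everything previously pushed
    for that job. Job names containing '/' need special encoding on the
    Pushgateway side and are not handled in this sketch.
    """
    url = f"{gateway_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    return Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


def push(gateway_url: str, job: str, body: str) -> int:
    """Send the metrics outbound and return the HTTP status code."""
    with urlopen(build_push_request(gateway_url, job, body), timeout=5) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical gateway address, job name, and metric.
    metrics = "# TYPE demo_batch_duration_seconds gauge\ndemo_batch_duration_seconds 12.5\n"
    print(push("http://pushgateway.example:9091", "demo_batch", metrics))
```

Here the connection is opened by the thing being measured and travels outbound, so nothing inside the firewall has to listen for inbound traffic. Prometheus then scrapes the Pushgateway like any other target.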
Acceptable Risk
Prometheus is often deployed into Kubernetes clusters, but it can be deployed anywhere. Taking the operational differences out of the
equation, there is little difference between deploying Prometheus in a Kubernetes cluster and deploying it in one’s data center. Once
deployed, the needs will be the same. Prometheus will need to be authorized to reach out and scrape the targets it needs to scrape. All too
often, this is done with relatively open network permissions. Even though we all know