It’s KubeCon Europe week, and at Timescale, we’re celebrating it the way we know best: with a new edition of #AlwaysBeLaunching! 🐯 🎊 During this week, we are releasing new features and content every day, all focused on Promscale and observability. Don’t miss any of it by following our hashtag #PromscaleKubecon on Twitter—and if you’re at KubeCon, visit us at booth G3! We have awesome demos (and swag) to give you!
In this era of cloud-native systems, the focus on observability is a given: observability tools ensure your systems perform correctly and deliver a satisfactory experience to your end-users. In the absence of good observability, your end-users will be the first to notify you about a problem—a tiny dart piercing through every respectable developer’s heart.
Needless to say, your users shouldn’t do your monitoring and alerting for you. You should aim to detect and fix problems before your users notice them, not after. When an issue arises, you need to find the root of the problem. And in a distributed architecture, the best way to do so is by interrogating your telemetry data with ad hoc questions and getting answers in real time. You need real observability.
Observing the Observer
We all know that observability tools are critical for any system that aims to deliver an always available and reliable service. But what happens if your observability tool stops working? If the system it monitors has an issue, you’d only notice it when a user notifies you, putting you right back at square one: running blind again.
As an observability practitioner, you’ll find many observability tools and tons of resources on configuring them to collect, store, query, visualize, and alert on telemetry data from the systems you monitor.
But who observes the observer?
We must treat observability tools as highly available and reliable systems. They, too, have to be monitored to ensure correct behavior. But it is surprisingly hard to find information on how to observe your own observability tool effectively. For example, we’ve struggled to find any guidance online on how to monitor Prometheus itself.
This seems like a missing piece (more like a missing pillar) in our observability journey.
On the Promscale team, we decided to prioritize this. As a result, we’ve built a set of alerting rules, runbooks, and dashboards that help Promscale users track the performance of their own Promscale instance while guiding them through fixing common issues. In this blog post, we tell you how we did it.
We relied on open-source components to build this set of tools, combined with our own experience assisting Promscale users. So even if you’re not a Promscale user, we hope this blog post can give you ideas on how to build your own “observing the observer” setup.
Observing Promscale
To understand the reasoning behind the alerts, runbooks, and dashboards we ended up creating, we first need to explain in more detail how Promscale works.
Promscale is a unified observability backend for metrics and traces built on PostgreSQL and TimescaleDB. It has a simple architecture with only two components: the Promscale Connector and the Promscale Database.

The Promscale Connector is a stateless service that provides “plug and play” connectivity between some of the most common observability tools and protocols and the Promscale Database. These are some of its functions:
- Ingestion of Prometheus metrics and OpenTelemetry traces
- PromQL query support
- APIs for seamless integration with the Jaeger and Grafana user interfaces to visualize distributed traces
- PromQL alerting and recording rules (see the sample rule file after this list)
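For instance, since the Connector evaluates rules written in the standard Prometheus rule-file format, you can hand it a file like the one below to define recording and alerting rules (the metric names and thresholds here are purely illustrative):

```yaml
# Standard Prometheus rule file; the Promscale Connector evaluates files
# in this format. Metric names and thresholds are illustrative.
groups:
  - name: example-rules
    rules:
      # Recording rule: precompute the per-job request rate.
      - record: job:http_requests_total:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)
      # Alerting rule: fire when the 5xx rate stays high for 10 minutes.
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) by (job) > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High HTTP 5xx rate for job {{ $labels.job }}"
```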
The second component, the Promscale Database, is PostgreSQL with observability superpowers, including:
- An optimized schema for observability data, including metrics and traces
- All the time-series analytical functions and the performance improvements provided by TimescaleDB
- Database management functions, aggregates to speed up PromQL queries, and SQL query experience enhancements for observability data (see the sample query after this list)
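As a rough illustration of that SQL experience, here is the kind of query you could run against the database. It assumes a scraped Prometheus metric named go_goroutines and relies on Promscale exposing one view per metric in the prom_metric schema, plus a jsonb() helper to render labels:

```sql
-- Last hour of raw samples for one metric, with labels rendered as JSON.
-- Assumes a metric named go_goroutines has already been ingested.
SELECT time, value, jsonb(labels) AS labels
FROM prom_metric.go_goroutines
WHERE time > now() - INTERVAL '1 hour'
ORDER BY time DESC
LIMIT 100;
```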
Tools that speak SQL can connect directly to the Promscale Database, while other common open-source tools such as Prometheus, Jaeger, and Grafana can integrate with the Promscale Connector.
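For example, pointing Prometheus at the Connector only requires adding remote-write (and, optionally, remote-read) endpoints to its configuration. The hostname below assumes the Connector is reachable as promscale-connector on its default port 9201:

```yaml
# prometheus.yml (fragment): send samples to the Promscale Connector
# and read older data back through it. The hostname is illustrative.
remote_write:
  - url: "http://promscale-connector:9201/write"
remote_read:
  - url: "http://promscale-connector:9201/read"
    read_recent: true
```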
This simple architecture makes troubleshooting easier. Promscale’s PostgreSQL foundation also helps: PostgreSQL is a very mature piece of software with extensive documentation and accumulated knowledge around its configuration, tuning, and troubleshooting.
Still, we knew that we could accelerate the production-readiness process by providing extra guidance to our users through an extensive set of alerts and runbooks created by the engineering team building the product.
Common Performance Bottlenecks
From our conversations with users, we learned that there are three processes to pay particular attention to when tracking Promscale’s performance:
Data ingest
In Promscale, metrics and traces follow different ingest paths. Let’s cover them separately.
Metrics
When metrics are ingested, they are transformed into the Promscale metric schema. This schema stores series metadata and data points in separate tables, and each data point includes the identifier of the series it belongs to. Metric labels (both keys and values) are stored in yet another table: the series table only holds the IDs that reference those label entries.
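Conceptually, and leaving out many details of the real schema, the layout looks roughly like this. The table and column names below are illustrative, not the actual Promscale DDL:

```sql
-- Simplified, illustrative sketch of the metric schema described above.
CREATE TABLE labels (
    id    BIGINT PRIMARY KEY,
    key   TEXT NOT NULL,
    value TEXT NOT NULL
);

CREATE TABLE series (
    id        BIGINT   PRIMARY KEY,
    metric_id INT      NOT NULL,   -- which metric this series belongs to
    label_ids BIGINT[] NOT NULL    -- IDs referencing rows in the labels table
);

-- One data-point table per metric; every point carries its series ID.
CREATE TABLE metric_samples (
    time      TIMESTAMPTZ      NOT NULL,
    value     DOUBLE PRECISION NOT NULL,
    series_id BIGINT           NOT NULL REFERENCES series(id)
);
```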
To avoid running queries to retrieve those IDs from the database when new data points are inserted for existing series, the Promscale Connector keeps an in-memory cache of series IDs.