Tuesday, September 27, 2022

Prometheus - Monitoring system & time series database

What is Prometheus?

  • In Greek mythology, Prometheus, possibly meaning “forethought”, is a Titan god of fire. He is known for his intelligence and as a champion of humankind, and is also seen as the author of the human arts and sciences generally.

  • Prometheus is an independent, open-source tool for monitoring your applications and alerting on problems.
  • Prometheus collects and stores its metrics as time series data, i.e., each value is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.

  • The Prometheus ecosystem consists of several components, such as the Prometheus server, the Pushgateway, the Alertmanager, and client libraries. This article discusses the Prometheus architecture in detail and what each component does.

Here are some of the main features of Prometheus:

  • Multidimensional Data Model

    • Using time-series data with metric names and key-value pairs as identifiers.

  • PromQL

    • A flexible query language that takes advantage of the multidimensional data model.

  • Pull Model

    • Prometheus collects time series data by actively pulling ("scraping") it over HTTP.

  • No Reliance on Distributed Storage

    • Single server nodes are autonomous and self-contained.

  • Pushing Time-series Data

    • Pushing time series is supported through an intermediary gateway (the Pushgateway).

  • Monitoring Target Discovery

    • Targets are found either through service discovery or through static configuration.

  • Visualization

    • Prometheus supports multiple modes of graphing and dashboarding.

What are metrics?

  • In layperson terms, metrics are numeric measurements. Time series means that changes are recorded over time.
  • Metrics play an important role in understanding why your application is working in a certain way. Let's assume you are running a web application and find that the application is slow. You will need some information to find out what is happening with your application.
  • For example, the application can become slow when the number of requests is high. If you have a request count metric, you can spot the reason and increase the number of servers to handle the load.
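To make "metric name plus labels plus timestamped samples" concrete, here is a toy in-memory sketch in Python. It is not how Prometheus's storage engine works internally; the TinySeriesStore class and the http_requests_total metric with its method/path labels are hypothetical examples. What it shows is that a time series is identified by its metric name together with its full label set, so the same labels written in a different order still refer to the same series.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A series is identified by its metric name plus its full, order-independent label set.
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    """Build a hashable series identity; sorting makes label order irrelevant."""
    return (name, tuple(sorted(labels.items())))


class TinySeriesStore:
    """Toy store: each series holds a list of (timestamp, value) samples."""

    def __init__(self) -> None:
        self.series: Dict[SeriesKey, List[Tuple[float, float]]] = defaultdict(list)

    def append(self, name: str, labels: Dict[str, str], ts: float, value: float) -> None:
        self.series[series_key(name, labels)].append((ts, value))

    def select(self, name: str, **matchers: str) -> Dict[SeriesKey, List[Tuple[float, float]]]:
        """Return every series with this name whose labels equal all matchers."""
        result = {}
        for key, samples in self.series.items():
            metric, label_pairs = key
            labels = dict(label_pairs)
            if metric == name and all(labels.get(k) == v for k, v in matchers.items()):
                result[key] = samples
        return result


store = TinySeriesStore()
store.append("http_requests_total", {"method": "GET", "path": "/"}, 1000.0, 10)
store.append("http_requests_total", {"path": "/", "method": "GET"}, 1015.0, 25)  # same series
store.append("http_requests_total", {"method": "POST", "path": "/"}, 1000.0, 3)

print(len(store.series))  # 2 distinct series
print(len(store.select("http_requests_total", method="GET")))  # 1
```

Selecting by a subset of labels, as select does here, is the same idea PromQL builds on when you filter a metric with label matchers.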

Use Cases

  • Prometheus is well suited to performance monitoring, cloud monitoring, IoT monitoring, and more. Its ease of use and versatility make it suitable for many types of applications, be it a Python web application or a Kubernetes application. It can handle almost any numeric metric your application exposes.

Components in Prometheus Architecture

  • The Prometheus ecosystem consists of multiple components, many of which are optional.

  • 1. Prometheus server:
    • the main Prometheus server, which scrapes and stores time series data
  • 2. Client libraries:
    • client libraries for instrumenting application code
  • 3. Pushgateway:
    • a push gateway for supporting short-lived jobs
  • 4. Exporters:
    • special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
  • 5. Alertmanager:
    • an Alertmanager to handle alerts
  • 6. Service discovery:
    • integrations that tell Prometheus which targets to scrape
  • 7. Dashboards:
    • tools such as Grafana for visualizing the collected data

1. Prometheus Server

  • The Prometheus server handles the scraping and storing of metrics. It schedules monitoring jobs, which means querying data sources (known as "instances") at a predetermined polling frequency. Monitoring jobs are defined by one or more "scrape_config" sections in a YAML configuration file, which can be reloaded at runtime by sending the process a SIGHUP or through the Management API.
  • To find scrape targets, Prometheus relies extensively on several service discovery (SD) mechanisms. These integrations range from platform-specific APIs to generic file-based service discovery, which custom SD implementations can use by maintaining a JSON or YAML file that lists the targets.
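File-based service discovery reads a list of target groups, each with a "targets" array of host:port strings and an optional "labels" object. The Python sketch below shows how a custom SD script might write such a JSON file; the addresses, labels, and file name are hypothetical. It writes to a temporary file and then renames it into place so Prometheus never reads a half-written file. Prometheus would be pointed at the file through a file_sd_configs entry (for example files: ['targets/*.json']) inside a scrape config.

```python
import json
import os
import tempfile
from typing import Dict, List, Union

TargetGroup = Dict[str, Union[List[str], Dict[str, str]]]


def write_file_sd(path: str, groups: List[TargetGroup]) -> None:
    """Replace a file_sd target file with the given target groups."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory first...
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(groups, f, indent=2)
    # ...then rename it over the real file (atomic on POSIX within one filesystem).
    os.replace(tmp_path, path)


# Hypothetical targets: two node exporters in a staging environment.
groups = [
    {
        "targets": ["10.0.0.5:9100", "10.0.0.6:9100"],
        "labels": {"env": "staging", "team": "platform"},
    }
]
target_file = os.path.join(tempfile.mkdtemp(), "node_targets.json")
write_file_sd(target_file, groups)
with open(target_file) as f:
    print(f.read())
```

A real SD script would regenerate this file whenever its own inventory changes; Prometheus watches the file and picks up the new target list without a restart.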

2. Client Libraries

  • Metrics do not typically magically spring forth from applications; someone has to add the instrumentation that produces them. This is where client libraries come in. With usually only two or three lines of code, you can both define a metric and add your desired instrumentation inline in the code you control. This is referred to as direct instrumentation.

  • Client libraries let us define internal metrics and expose them via an HTTP endpoint on the application's instance.

Official client libraries:
  • Go
  • Java or Scala
  • Python
  • Ruby
Unofficial client libraries:
  • Bash
  • C
  • C++
  • Common Lisp
  • Dart
  • Elixir
  • Erlang
  • Haskell
  • Lua for Nginx
  • And many more

  • Choose a Prometheus client library written for your application's language. The library is what lets you define internal metrics and expose them via an HTTP endpoint from inside your application, so it has to match the language the application is written in.

3. Pushgateway

  • Prometheus is a pull-based system. It decides when and what to scrape, based on its configuration. There are also push-based systems, where the monitoring target decides if it is going to be monitored and how often.
  • Although Prometheus is largely a pull-based monitoring system, it includes a "Pushgateway" component that allows metrics from other applications and services to be pushed in.
  • The Pushgateway is useful for gathering metrics from systems that aren't compatible with the rest of the infrastructure, which is pull-based.
  • For example, ephemeral batch jobs may start and finish before Prometheus can discover and scrape them. Such jobs can push their metrics to the Pushgateway, which holds them until Prometheus scrapes it, so the data is not lost.
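Here is a hedged Python sketch of how a short-lived batch job could push its metrics. It relies on the Pushgateway's HTTP API, where metrics in the text exposition format are sent to a path of the form /metrics/job/<job name>, optionally followed by extra grouping label/value pairs; PUT replaces every metric in that group, while POST replaces only metrics with the same names. The gateway address, job name, grouping label, and metric name are hypothetical, and the request is only built, not sent, so the example runs without a Pushgateway.

```python
import time
import urllib.parse
import urllib.request
from typing import Dict, Optional


def _segment(value: str) -> str:
    # The Pushgateway needs a special encoding for values containing "/";
    # this sketch simply refuses them (and empty values).
    if "/" in value or not value:
        raise ValueError(f"unsupported path value: {value!r}")
    return urllib.parse.quote(value, safe="")


def build_push_request(gateway: str, job: str, body: str,
                       grouping: Optional[Dict[str, str]] = None,
                       replace_group: bool = True) -> urllib.request.Request:
    """Build (but do not send) a push of text-format metrics to a Pushgateway."""
    path = "/metrics/job/" + _segment(job)
    for label, value in (grouping or {}).items():
        path += f"/{_segment(label)}/{_segment(value)}"
    return urllib.request.Request(
        url=gateway.rstrip("/") + path,
        data=body.encode("utf-8"),
        # PUT replaces the whole group; POST replaces only same-named metrics.
        method="PUT" if replace_group else "POST",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# Hypothetical batch job reporting when it last finished successfully.
body = (
    "# TYPE batch_job_last_success_unixtime gauge\n"
    f"batch_job_last_success_unixtime {time.time()}\n"
)
req = build_push_request("http://pushgateway.example.org:9091", "nightly_backup", body,
                         grouping={"instance": "db-1"})
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req, timeout=5)  # send it once a Pushgateway is reachable
```

Pushing once at the very end of the job, as here, means a crashed run simply leaves the previous "last success" value in place, which an alert can then detect as stale.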

4. Exporters

  • Exporters are third-party tools that expose metrics when it is not feasible to instrument an application directly. Some exporters are official and live in the Prometheus GitHub organization, while others are maintained by the community.
  • An exporter is a piece of software that you deploy right beside the application you want to obtain metrics from. It takes in requests from Prometheus, gathers the required data from the application, transforms it into the correct format, and finally returns it in a response to Prometheus.
  • We can think of an exporter as a small one-to-one proxy, converting data between the metrics interface of an application and the Prometheus exposition format.
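The one-to-one proxy idea can be sketched in a few lines of Python. The sketch assumes a hypothetical application that reports its state as a JSON status document; each Prometheus scrape triggers one fetch of that document, which is translated into the text exposition format. The field names, the myapp_* metric names, and the fake_status stand-in are all hypothetical; a real exporter would fetch the status over the network and serve the result on its own /metrics endpoint.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical mapping: app status field -> (metric name, metric type, help text).
METRIC_MAP: Dict[str, Tuple[str, str, str]] = {
    "active_connections": ("myapp_connections_active", "gauge", "Currently open connections."),
    "requests_served": ("myapp_requests_served_total", "counter", "Requests served since start."),
}


def translate(raw_status: str) -> str:
    """Convert the application's JSON status document into Prometheus text format."""
    stats = json.loads(raw_status)
    lines: List[str] = []
    for field, (name, metric_type, help_text) in METRIC_MAP.items():
        if field not in stats:
            continue  # the app did not report this field; emit nothing for it
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {float(stats[field])}")
    return "\n".join(lines) + "\n"


def handle_scrape(fetch_status: Callable[[], str]) -> str:
    """One Prometheus scrape triggers one fetch from the app: a 1:1 proxy."""
    return translate(fetch_status())


def fake_status() -> str:
    # Stand-in for an HTTP call to the application's own status endpoint.
    return '{"active_connections": 12, "requests_served": 3456, "version": "1.2"}'


print(handle_scrape(fake_status))
```

Because the exporter fetches fresh data on every scrape rather than caching it, the scrape timestamp in Prometheus matches the moment the data was read from the application.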

5. Alertmanager

  • The Alertmanager receives alerts from Prometheus servers and turns them into notifications. Notifications can include email, chat applications such as Slack, and services such as PagerDuty.
  • Related alerts can be grouped and aggregated into a single notification.
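A rough Python sketch of that grouping idea: alerts that share the same values for a chosen set of labels (the group_by list in an Alertmanager route) end up in one notification. The alerts and the notification text format are hypothetical, and the real Alertmanager also handles timing (group_wait, group_interval), deduplication, silencing, and inhibition, none of which is modelled here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Alert = Dict[str, str]  # here an alert is represented only by its labels


def group_alerts(alerts: Iterable[Alert], group_by: List[str]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Bucket alerts by their values for the group_by labels."""
    groups: Dict[Tuple[str, ...], List[Alert]] = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


def render_notification(key: Tuple[str, ...], alerts: List[Alert], group_by: List[str]) -> str:
    """Produce one (hypothetical) notification line for a whole group."""
    header = ", ".join(f"{name}={value}" for name, value in zip(group_by, key))
    instances = ", ".join(sorted(a.get("instance", "?") for a in alerts))
    return f"[FIRING:{len(alerts)}] {header} ({instances})"


GROUP_BY = ["alertname", "cluster"]
incoming = [
    {"alertname": "InstanceDown", "cluster": "eu-1", "instance": "web-1"},
    {"alertname": "InstanceDown", "cluster": "eu-1", "instance": "web-2"},
    {"alertname": "HighLatency", "cluster": "eu-1", "instance": "api-1"},
]
for key, members in group_alerts(incoming, GROUP_BY).items():
    print(render_notification(key, members, GROUP_BY))
```

Three incoming alerts become two notifications: both InstanceDown alerts for the same cluster are reported together instead of paging someone twice.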

6. Service Discovery

  • Once you have all your applications instrumented and your exporters running, Prometheus needs to know where they are, so that it knows what it is meant to monitor and can notice when something it should be monitoring stops responding. In dynamic environments you cannot simply provide a list of applications and exporters once, because it will get out of date. This is where service discovery comes in.
  • Prometheus has integrations with many common service discovery mechanisms, such as Kubernetes, EC2, and Consul. There is also a generic file-based integration for setups that are a little off the beaten path.

7. Dashboards

  • Prometheus has a number of HTTP APIs that allow you to both request raw data and evaluate PromQL queries. These can be used to produce graphs and dashboards. Out of the box, Prometheus provides the expression browser. It uses these APIs and is suitable for ad hoc querying and data exploration, but it is not a general dashboard system.
  • It is recommended that you use Grafana for dashboards. It has a wide variety of features, including official support for Prometheus as a data source. It can produce a wide variety of dashboards, such as the one in Figure 1-2. Grafana supports talking to multiple Prometheus servers, even within a single dashboard panel.
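The HTTP API mentioned above is what both the expression browser and Grafana's Prometheus data source query. As a hedged Python sketch, the code below builds an instant-query URL for the /api/v1/query endpoint and parses a response of the shape that endpoint returns: a "vector" result whose values are [timestamp, "value"] pairs. The sample response is made up for illustration, and the server address assumes Prometheus's default port 9090.

```python
import json
import urllib.parse
from typing import Dict, List, Optional, Tuple


def instant_query_url(base_url: str, promql: str, at: Optional[float] = None) -> str:
    """Build a GET URL for the /api/v1/query (instant query) endpoint."""
    params = {"query": promql}
    if at is not None:
        params["time"] = str(at)  # evaluation time as a Unix timestamp
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode(params)


def parse_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-vector query response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each "value" is [unix_timestamp, "sample value as a string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


url = instant_query_url("http://localhost:9090", 'up{job="node"}')
print(url)

# Made-up response body in the shape returned by the endpoint.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "node", "instance": "10.0.0.5:9100"},
             "value": [1664236800, "1"]},
        ],
    },
})
for labels, value in parse_vector(SAMPLE_RESPONSE):
    print(labels["instance"], value)
```

Against a running server, passing the body returned by urllib.request.urlopen(url) to parse_vector would yield the same kind of (labels, value) pairs a dashboard panel plots.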

What is PromQL?

  • Prometheus provides its own query language, PromQL (Prometheus Query Language), which lets users select and aggregate data. PromQL is designed specifically for time series data and therefore provides time-related query functionality.

  • Examples include the rate() function and range vectors, which return many samples over a time window for each queried time series (instant vectors, by contrast, return a single sample per series). Prometheus has four core metric types, around which PromQL's functions are built:
    • Gauge
    • Counter
    • Histogram
    • Summary
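To show what rate() does with a Counter, here is a simplified Python version: it sums the increases between consecutive samples in a window, treating any drop as a counter reset, and divides by the time the samples cover. This is an approximation for illustration only; the real rate() additionally extrapolates to the edges of the range window, so its results can differ slightly. The sample values are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (Unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter over the given samples (no extrapolation)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. the process restarted),
        # so the new value is counted as an increase from zero.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical 1-minute window scraped every 15s, with a restart between t=15 and t=30.
window = [(0.0, 100.0), (15.0, 130.0), (30.0, 10.0), (45.0, 40.0), (60.0, 70.0)]
print(simple_rate(window))  # total increase 30 + 10 + 30 + 30 = 100 over 60 s
```

Reset handling is the reason rate() should only be applied to counters: a gauge going down is a real decrease, not a restart.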

How Does Prometheus Work?

  • Prometheus collects data in the form of time series. The time series are built through a pull model:
  • The Prometheus server queries (scrapes) a list of data sources, called targets (often exporters), at a specific polling frequency.
  • Prometheus data is stored in the form of metrics, each with a name that is used to reference and query it.
  • Prometheus stores data locally on disk, which enables fast storage and fast querying; it can optionally also send metrics to remote storage.
  • Each Prometheus server is standalone, not depending on network storage or other remote services.
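After a scrape, the server holds a plain-text body that it must turn into named, labelled samples. The Python sketch below parses a simplified subset of the text exposition format (comment lines skipped, optional labels, a value, and an optional integer timestamp). It is an illustration, not Prometheus's actual parser, and it ignores details such as the OpenMetrics variant; the sample scrape body is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# name, optional {label block}, value, optional timestamp in milliseconds
_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+(-?\d+))?$')
# label="value" where the value may contain backslash escapes
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def _unescape(value: str) -> str:
    # \n becomes a newline; \" and \\ become the escaped character itself.
    return re.sub(r'\\(.)', lambda m: "\n" if m.group(1) == "n" else m.group(1), value)


def parse_exposition(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Turn a scraped text body into (metric name, labels, value) samples."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and blank lines carry no samples
        m = _LINE.match(line)
        if m is None:
            raise ValueError(f"cannot parse line: {line!r}")
        name, label_block, value, _timestamp = m.groups()
        labels = {k: _unescape(v) for k, v in _LABEL.findall(label_block or "")}
        samples.append((name, labels, float(value)))  # float() also accepts NaN/+Inf
    return samples


scraped = (
    "# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.\n"
    "# TYPE node_cpu_seconds_total counter\n"
    'node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6\n'
    "process_open_fds 17\n"
)
for name, labels, value in parse_exposition(scraped):
    print(name, labels, value)
```

Each parsed tuple corresponds to one sample that the server appends, with the scrape time, to the series identified by that name and label set.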


1) How does Prometheus collect data?
  • Prometheus collects metrics from targets by scraping their HTTP metrics endpoints. Because Prometheus exposes data about itself in the same way, it can also scrape and monitor its own health.
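A minimal Python sketch of the target side of this: a process serving its own metrics at /metrics (the path Prometheus scrapes by default) using only the standard library's http.server. Prometheus's own server exposes its metrics the same way, by default on port 9090, which is why it can scrape itself. The demo_* metric names and the serve_in_background helper are hypothetical; the example binds to a random free local port and scrapes itself once.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

START_TIME = time.time()
_lock = threading.Lock()
_scrapes = 0


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        global _scrapes
        if self.path != "/metrics":
            self.send_error(404)
            return
        with _lock:
            _scrapes += 1
            scrapes = _scrapes
        body = (
            "# HELP demo_scrapes_total Times this endpoint has been scraped.\n"
            "# TYPE demo_scrapes_total counter\n"
            f"demo_scrapes_total {scrapes}\n"
            "# HELP demo_start_time_seconds Unix time the process started.\n"
            "# TYPE demo_start_time_seconds gauge\n"
            f"demo_start_time_seconds {START_TIME}\n"
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the console quiet


def serve_in_background(port: int = 0) -> ThreadingHTTPServer:
    """Start the metrics server on a daemon thread (port 0 = any free port)."""
    server = ThreadingHTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = serve_in_background()
url = f"http://127.0.0.1:{server.server_address[1]}/metrics"
with urllib.request.urlopen(url, timeout=5) as resp:  # what a scrape does
    print(resp.read().decode())
server.shutdown()
server.server_close()
```

In a real deployment the port would be fixed and listed in a scrape config (or found via service discovery) so Prometheus knows where to pull from.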

2) What can Prometheus do?
  • With Prometheus you can monitor application metrics such as throughput (TPS) and response times of, for example, a Kafka load generator (Kafka producer), a Kafka consumer, and a Cassandra client. The Node exporter can be used to monitor host hardware and kernel metrics.

3) Who uses Prometheus?
  • 580 companies reportedly use Prometheus in their tech stacks, including Uber, Slack, Robinhood, DigitalOcean, Ericsson, CoreOS, Weaveworks, Amdocs, Red Hat, and Google.

4) What is the difference between Grafana and Prometheus?
  • Grafana is only a visualization solution; time series storage is not part of its core functionality. … Prometheus, by contrast, is built around storing time series, and its dimensional model, which attaches key-value labels to each time series, organizes the data well and offers strong query capabilities.

5) Where is Prometheus data stored?
  • Prometheus stores its on-disk time series data under the directory specified by the --storage.tsdb.path flag (Prometheus 1.x used -storage.local.path). The default path is data/ relative to the working directory, which is fine for trying something out quickly but most likely not what you want for actual operations.
