Decoding Observability: Extracting and Processing Prometheus Metrics with Python

Prometheus has become the de facto standard for monitoring in the cloud-native world. Its powerful time-series database and query language, PromQL, offer deep insights into your infrastructure and applications. But what if you need to pull that rich data out of the Prometheus ecosystem, say for custom reporting, advanced data analysis in a separate tool, or integration into automated workflow scripts?

This post will guide you through using the Prometheus HTTP API with a simple Python script to query metrics, process the resulting time-series data, and display meaningful results.

 

The Prometheus HTTP API and Python

 

Prometheus exposes a comprehensive HTTP API for querying time-series data and meta-information. The key endpoints for data extraction are:

  • /api/v1/query: For instant queries (latest value at a single point in time).

  • /api/v1/query_range: For range queries (data over a time period).
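To make the first endpoint concrete, here is a minimal sketch of an instant query with requests. The helper names are our own; the parsing assumes the standard "vector" response shape documented by the API:

```python
import requests

def run_instant_query(base_url, promql):
    """GET /api/v1/query and return the decoded JSON payload."""
    resp = requests.get(f"{base_url}/api/v1/query",
                        params={"query": promql}, timeout=10)
    resp.raise_for_status()
    return resp.json()

def parse_instant_vector(payload):
    """Reduce a successful instant-query payload to (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError("Prometheus query did not succeed")
    # Each series carries its label set and a single [timestamp, value] pair
    return [(series["metric"], float(series["value"][1]))
            for series in payload["data"]["result"]]
```

For example, `parse_instant_vector(run_instant_query('http://localhost:9090', 'up'))` would yield one pair per scrape target.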

For Python, while you could use the basic requests library, a dedicated client like prometheus-api-client simplifies the process significantly by handling connections, queries, and even initial data parsing into Python objects or Pandas DataFrames.
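As a rough sketch of that easier path (untested against a live server here; assumes `pip install prometheus-api-client` and a reachable Prometheus instance):

```python
from datetime import datetime, timedelta, timezone

def cpu_report_via_client(url="http://localhost:9090"):
    # Third-party dependency; imported inside the function so the sketch
    # reads standalone even where the package is not installed.
    from prometheus_api_client import PrometheusConnect, MetricRangeDataFrame

    prom = PrometheusConnect(url=url, disable_ssl=True)
    end = datetime.now(timezone.utc)
    data = prom.custom_query_range(
        query='avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) * 100',
        start_time=end - timedelta(hours=1),
        end_time=end,
        step="60",
    )
    # MetricRangeDataFrame yields a timestamp-indexed DataFrame with one row
    # per sample and the series labels (here: instance) as columns.
    return MetricRangeDataFrame(data).groupby("instance")["value"].max()
```

The DataFrame conversion is the main draw: once the samples land in pandas, aggregations like the per-instance maximum above are one-liners.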

 

Prerequisites

 

You'll need:

  1. A running Prometheus server (e.g., at http://localhost:9090).

  2. Python installed with the requests library (or prometheus-api-client for an easier path).

For this example, we'll use a plain requests script to clearly show the API interaction.

 

1. The Python Script: Fetch, Process, and Display

Let's write a script to fetch the average CPU utilization of all monitored instances over the last hour. We'll use the rate() function in PromQL and a range query.

Script: prometheus_processor.py


import requests
from datetime import datetime, timedelta, timezone

# --- CONFIGURATION ---
PROMETHEUS_URL = 'http://localhost:9090'
QUERY_ENDPOINT = f'{PROMETHEUS_URL}/api/v1/query_range'

# PromQL: Calculate the average CPU utilization (excluding idle)
# grouped by instance over a 5m window, converted to a percentage.
PROMETHEUS_QUERY = 'avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) * 100'

# Define the time range (e.g., the last hour); use UTC so the timestamps
# sent to the API are unambiguous.
end_time = datetime.now(timezone.utc)
start_time = end_time - timedelta(hours=1)
# Step is the query resolution (e.g., 60 seconds)
step = '60s' 
# ---------------------

def query_prometheus_range(query, start, end, step):
    """Fetches a range query from Prometheus API."""
    params = {
        'query': query,
        # Prometheus accepts Unix timestamps (or RFC 3339 strings) here;
        # appending "Z" to a naive isoformat() would mislabel local time as UTC.
        'start': start.timestamp(),
        'end': end.timestamp(),
        'step': step
    }
    
    try:
        response = requests.get(QUERY_ENDPOINT, params=params, timeout=30)
        response.raise_for_status() # Raise exception for bad status codes
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error querying Prometheus: {e}")
        return None

def process_and_display_metrics(data):
    """Processes the Prometheus JSON data and displays a summary."""
    if not data or data.get('status') != 'success':
        print("Failed to retrieve successful data from Prometheus.")
        return

    results = data['data']['result']
    if not results:
        print("No time series data found for the query.")
        return

    print(f"\n--- CPU Utilization Report ({start_time.strftime('%Y-%m-%d %H:%M:%S')} to {end_time.strftime('%Y-%m-%d %H:%M:%S')}) ---")

    for series in results:
        # .get() guards against a series that lacks the instance label
        instance = series['metric'].get('instance', '<unknown>')
        
        # Processing: find the maximum CPU utilization observed over the period
        max_utilization = 0
        for _, value in series['values']:
            try:
                current_utilization = float(value) 
                if current_utilization > max_utilization:
                    max_utilization = current_utilization
            except ValueError:
                continue # Skip non-numeric values

        # Display the result
        print(f"Instance: {instance.ljust(20)} | Max CPU Utilization: {max_utilization:.2f}%")

if __name__ == "__main__":
    # Remove microseconds for cleaner API timestamps
    start_api = start_time.replace(microsecond=0)
    end_api = end_time.replace(microsecond=0)
    
    # 1. Get metrics data from Prometheus API
    prometheus_data = query_prometheus_range(
        PROMETHEUS_QUERY, 
        start_api, 
        end_api, 
        step
    )
    
    # 2. Process and show the result
    process_and_display_metrics(prometheus_data)

2. Configuration and Customization Reference


The script’s behavior is primarily defined by the main configuration variables:

  • PROMETHEUS_URL: The base URL of your Prometheus server. Default: http://localhost:9090

  • PROMETHEUS_QUERY: The PromQL expression to execute. Default: avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) * 100

  • timedelta(hours=1): Sets the duration of the query range. Default: 1 hour

  • step: The resolution of the data points returned. Default: '60s'
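For instance, a hypothetical customization that reports memory pressure instead of CPU only needs a different query string (the metric names below assume the standard node_exporter):

```python
# Percentage of memory in use, per instance (node_exporter metrics assumed)
PROMETHEUS_QUERY = (
    '100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)'
)
```

Everything else in the script, including the max-per-instance processing, works unchanged.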

 

Understanding the PromQL Query


The default PROMETHEUS_QUERY calculates the utilization:
avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) * 100

  • rate(...[5m]): Calculates the per-second rate of increase over the last 5 minutes, which is essential for counter metrics.
  • {mode!="idle"}: Excludes idle CPU time.
  • avg by (instance): Averages the per-CPU rates and groups the result by the instance label.
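To build intuition for what rate() returns, here is a simplified stand-in: the per-second increase of a counter across the window. (Prometheus additionally extrapolates to the window boundaries, so real results differ slightly.)

```python
def simple_rate(samples):
    """Approximate PromQL rate(): (last - first) / elapsed seconds.
    `samples` is a list of (unix_timestamp, counter_value) pairs."""
    (t0, v0), (tn, vn) = samples[0], samples[-1]
    return (vn - v0) / (tn - t0)

# A CPU-mode counter that accrued 30 busy seconds over a 300 s window:
samples = [(0, 100.0), (150, 115.0), (300, 130.0)]
rate_per_second = simple_rate(samples)  # roughly 0.1, i.e. ~10% of one core
```

Multiplying by 100, as the query does, turns that 0.1 seconds-per-second into a percentage.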

3. Usage and Example Output


To run the script:


python prometheus_processor.py


The script will produce an output similar to this:


--- CPU Utilization Report (2025-09-26 14:00:00 to 2025-09-26 15:00:00) ---
Instance: 192.168.1.10:9100     | Max CPU Utilization: 45.71%
Instance: 192.168.1.11:9100     | Max CPU Utilization: 82.15%
Instance: prometheus:9090       | Max CPU Utilization: 15.33%


Conclusion


This simple Python script shows the value of integrating your DevOps tooling. By combining the requests library with the Prometheus HTTP API and a well-formed PromQL query, you can pull rich time-series data for custom analysis and reporting outside the standard Prometheus/Grafana stack. The same pattern can automate DevOps reporting, feed performance data into CI/CD pipelines, or supply input to custom machine learning models.