Decoding Observability: Extracting and Processing Prometheus Metrics with Python
Prometheus has become the de facto standard for monitoring in the cloud-native world. Its powerful time-series database and query language, PromQL, offer deep insights into your infrastructure and applications. But what if you need to pull that rich data out of the Prometheus ecosystem, say for custom reporting, advanced data analysis in a separate tool, or integration into automated workflow scripts?
This post will guide you through using the Prometheus HTTP API with a simple Python script to query metrics, process the resulting time-series data, and display meaningful results.
The Prometheus HTTP API and Python
Prometheus exposes a comprehensive HTTP API for querying time-series data and meta-information. The key endpoints for data extraction are:
- /api/v1/query: For instant queries (the latest value at a single point in time).
- /api/v1/query_range: For range queries (data over a time period).
For Python, while you could use the basic requests library, a dedicated client like prometheus-api-client simplifies the process significantly by handling connections, queries, and even initial data parsing into Python objects or Pandas DataFrames.
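If you go the client-library route, a range query takes only a few lines. Here is a minimal sketch, assuming the prometheus-api-client package is installed (pip install prometheus-api-client) and a server is reachable at localhost:9090; argument names may vary slightly between library versions.

from datetime import datetime, timedelta
from prometheus_api_client import PrometheusConnect, MetricRangeDataFrame

# Connect to a (hypothetical) local server without TLS
prom = PrometheusConnect(url="http://localhost:9090", disable_ssl=True)

end_time = datetime.now()
start_time = end_time - timedelta(hours=1)

# Range query returned as a list of raw series dictionaries...
raw_series = prom.custom_query_range(
    query='avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) * 100',
    start_time=start_time,
    end_time=end_time,
    step="60",
)

# ...which the library can flatten into a Pandas DataFrame for further analysis
df = MetricRangeDataFrame(raw_series)
print(df.groupby("instance")["value"].max())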
Prerequisites
You'll need:
- A running Prometheus server (e.g., at http://localhost:9090).
- Python installed with the requests library (or prometheus-api-client for an easier path).
For this example, we'll use a plain requests script to clearly show the API interaction.
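As a quick warm-up before the full script, here is a minimal instant query against /api/v1/query using plain requests, assuming a server at http://localhost:9090 and the built-in up metric that Prometheus records for every scrape target:

import requests

# Instant query: the latest value of every 'up' series
resp = requests.get(
    'http://localhost:9090/api/v1/query',
    params={'query': 'up'},
    timeout=10,
)
resp.raise_for_status()

for result in resp.json()['data']['result']:
    # For instant queries, 'value' is a single [timestamp, value-as-string] pair
    timestamp, value = result['value']
    print(result['metric'].get('instance', 'unknown'), value)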
1. The Python Script: Fetch, Process, and Display
Let's write a script to fetch the average CPU utilization of all monitored instances over the last hour. We'll use the rate() function in PromQL and a range query.
Script: prometheus_processor.py
import requests
from datetime import datetime, timedelta

# --- CONFIGURATION ---
PROMETHEUS_URL = 'http://localhost:9090'
QUERY_ENDPOINT = f'{PROMETHEUS_URL}/api/v1/query_range'

# PromQL: Calculate the average CPU utilization (excluding idle)
# grouped by instance over a 5m window, converted to a percentage.
PROMETHEUS_QUERY = 'avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) * 100'

# Define the time range (e.g., the last hour)
end_time = datetime.now()
start_time = end_time - timedelta(hours=1)

# Step is the query resolution (e.g., 60 seconds)
step = '60s'
# ---------------------


def query_prometheus_range(query, start, end, step):
    """Fetches a range query from the Prometheus API."""
    params = {
        'query': query,
        # The API accepts Unix timestamps (seconds since the epoch)
        'start': start.timestamp(),
        'end': end.timestamp(),
        'step': step
    }
    try:
        response = requests.get(QUERY_ENDPOINT, params=params, timeout=30)
        response.raise_for_status()  # Raise an exception for bad status codes
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error querying Prometheus: {e}")
        return None


def process_and_display_metrics(data):
    """Processes the Prometheus JSON data and displays a summary."""
    if not data or data.get('status') != 'success':
        print("Failed to retrieve successful data from Prometheus.")
        return

    results = data['data']['result']
    if not results:
        print("No time series data found for the query.")
        return

    print(f"\n--- CPU Utilization Report ({start_time.strftime('%Y-%m-%d %H:%M:%S')} to {end_time.strftime('%Y-%m-%d %H:%M:%S')}) ---")

    for series in results:
        instance = series['metric'].get('instance', 'unknown')

        # Processing: find the maximum CPU utilization observed over the period
        max_utilization = 0.0
        for _, value in series['values']:
            try:
                current_utilization = float(value)
                if current_utilization > max_utilization:
                    max_utilization = current_utilization
            except ValueError:
                continue  # Skip non-numeric values

        # Display the result
        print(f"Instance: {instance.ljust(20)} | Max CPU Utilization: {max_utilization:.2f}%")


if __name__ == "__main__":
    # Remove microseconds for cleaner API timestamps
    start_api = start_time.replace(microsecond=0)
    end_api = end_time.replace(microsecond=0)

    # 1. Get metrics data from the Prometheus API
    prometheus_data = query_prometheus_range(
        PROMETHEUS_QUERY,
        start_api,
        end_api,
        step
    )

    # 2. Process and show the result
    process_and_display_metrics(prometheus_data)
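For reference, process_and_display_metrics walks the standard query_range response format, which looks roughly like this (instance names, timestamps, and values are illustrative):

# Shape of a successful /api/v1/query_range response
{
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"instance": "192.168.1.10:9100"},
                "values": [
                    [1695736800, "12.5"],   # [unix_timestamp, value_as_string]
                    [1695736860, "14.1"]
                ]
            }
        ]
    }
}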
2. Configuration and Customization Reference
The script’s behavior is primarily defined by the main configuration variables:
| Variable | Description | Default Value |
| --- | --- | --- |
| PROMETHEUS_URL | The base URL of your Prometheus server. | http://localhost:9090 |
| PROMETHEUS_QUERY | The PromQL expression to execute. | avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) * 100 |
| timedelta(hours=1) | Sets the duration of the query range. | 1 hour |
| step | The resolution of the data points returned. | '60s' |
Understanding the PromQL Query
The default PROMETHEUS_QUERY calculates the utilization:

avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) * 100

- rate(...[5m]): Calculates the per-second rate of increase over the last 5 minutes, which is essential for counter metrics such as node_cpu_seconds_total.
- {mode!="idle"}: Excludes idle CPU time.
- avg by (instance): Averages the result and groups it by the instance label.
- * 100: Converts the fraction into a percentage.
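Swapping in a different expression is all it takes to report on another resource. For example, a hypothetical memory-utilization variant, assuming node_exporter's node_memory_MemAvailable_bytes and node_memory_MemTotal_bytes metrics are being scraped (the report heading text would also need updating):

# Hypothetical alternative: percentage of memory in use per instance
PROMETHEUS_QUERY = '(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100'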
3. Usage and Example Output
To run the script:

python prometheus_processor.py
The script will produce output similar to this:

--- CPU Utilization Report (2025-09-26 14:00:00 to 2025-09-26 15:00:00) ---
Instance: 192.168.1.10:9100    | Max CPU Utilization: 45.71%
Instance: 192.168.1.11:9100    | Max CPU Utilization: 82.15%
Instance: prometheus:9090      | Max CPU Utilization: 15.33%
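If you need more than a printed summary, the same nested result can be flattened for further analysis. A minimal sketch, assuming pandas is installed and prometheus_data is the dictionary returned by query_prometheus_range above:

import pandas as pd

rows = []
for series in prometheus_data['data']['result']:
    instance = series['metric'].get('instance', 'unknown')
    for timestamp, value in series['values']:
        rows.append({
            'instance': instance,
            'time': pd.to_datetime(timestamp, unit='s'),
            'cpu_percent': float(value),
        })

df = pd.DataFrame(rows)
# Example: mean utilization per instance over the whole window
print(df.groupby('instance')['cpu_percent'].mean())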
Conclusion
This simple Python script shows the power of integrating your DevOps tooling. By combining the requests library with the Prometheus HTTP API and a well-formed PromQL query, you can pull rich, time-series data for custom analysis and reporting outside of the standard Prometheus/Grafana stack. This method is a crucial step for automating DevOps reporting, integrating performance data into CI/CD pipelines, or feeding custom machine learning models.
