Building a Production-Ready Observability Stack: OpenTelemetry + Loki + Tempo + Grafana on EKS
**🎯 Introduction**
Observability is one of the most important parts of a DevOps setup today. When you run workloads in Kubernetes (especially on AWS EKS), you need to see what is happening — logs, metrics, and traces — all in one place.
In this post, I will show how I built a full observability stack using:
OpenTelemetry – to collect data
Prometheus – for metrics
Loki – for logs
Tempo – for distributed traces
Grafana – for visualization
Everything runs on Amazon EKS using Helm and Terraform.
🏗️ Step 1: Prepare the EKS Cluster
If you already have an EKS cluster, skip this step.
Below is a simple Terraform snippet to create one:
provider "aws" {
region = "us-west-2"
}
module "eks" {
source = "terraform-aws-modules/eks/aws"
cluster_name = "observability-demo"
cluster_version = "1.29"
vpc_id = "vpc-xxxxxxx"
subnet_ids = ["subnet-xxxxxx", "subnet-yyyyyy"]
node_groups = {
default = {
desired_capacity = 2
max_capacity = 3
instance_type = "t3.medium"
}
}
}
After running terraform apply, connect to the cluster:
aws eks update-kubeconfig --name observability-demo --region us-west-2
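A quick check that kubectl is now talking to the new cluster:
kubectl get nodes
You should see the two worker nodes from the node group in a Ready state.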
📦 Step 2: Install Prometheus (Metrics)
We use the official Prometheus Helm chart:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace
This will deploy:
Prometheus server
Node exporter
Kube state metrics
Grafana (optional, we’ll replace later)
Check status:
kubectl get pods -n monitoring
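To double-check that metrics are flowing before wiring up Grafana, you can port-forward the Prometheus service and open http://localhost:9090 (the service name below assumes the release is called prometheus, as installed above):
kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n monitoring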
🧝 Step 3: Install Loki (Logs)
Loki works like Prometheus, but for logs. It stores logs efficiently and lets you query them with LogQL.
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
--namespace monitoring
Now, your cluster will start collecting logs from pods automatically via Promtail.
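To verify log collection, first make sure the Promtail pods are up; later, in Grafana Explore, you can run a LogQL query such as {namespace="monitoring"} (the namespace label comes from Promtail's default Kubernetes scrape config):
kubectl get pods -n monitoring | grep promtail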
🔍 Step 4: Install Tempo (Traces)
Tempo is Grafana’s open-source tracing backend. It works great with OpenTelemetry.
helm repo add grafana https://grafana.github.io/helm-charts
helm install tempo grafana/tempo \
 --namespace monitoring
You can verify:
kubectl get pods -n monitoring | grep tempo
🧠 Step 5: Configure OpenTelemetry Collector
We use the OpenTelemetry Collector to receive OTLP data from applications, expose metrics for Prometheus to scrape, and forward traces to Tempo.
Here’s a simple ConfigMap and Deployment (save it as otel-collector.yaml):
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: monitoring
data:
  otel-collector-config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317   # listen on all interfaces so other pods can reach the collector
          http:
            endpoint: 0.0.0.0:4318
    exporters:
      prometheus:
        endpoint: "0.0.0.0:8889"     # metrics exposed here for Prometheus to scrape
      otlp:
        endpoint: "tempo.monitoring.svc.cluster.local:4317"
        tls:
          insecure: true
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          exporters: [prometheus]
        traces:
          receivers: [otlp]
          exporters: [otlp]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/etc/otel/config/otel-collector-config.yaml"]
          volumeMounts:
            - name: config
              mountPath: /etc/otel/config
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
Apply it:
kubectl apply -f otel-collector.yaml
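The application example in Step 7 sends spans to otel-collector.monitoring.svc.cluster.local:4317, so the Deployment also needs a Service in front of it. Here is a minimal sketch, assuming the OTLP ports and the Prometheus exporter port from the config above:
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  selector:
    app: otel-collector
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
    - name: otlp-http
      port: 4318
      targetPort: 4318
    - name: prom-exporter
      port: 8889
      targetPort: 8889
Note that Prometheus still needs a scrape config or ServiceMonitor pointing at port 8889 to actually collect the metrics the collector exposes.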
📊 Step 6: Install Grafana
Now let’s install Grafana separately so we can connect all data sources.
helm install grafana grafana/grafana \
 --namespace monitoring
Forward the port:
kubectl port-forward svc/grafana 3000:80 -n monitoring
Visit: http://localhost:3000
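The chart generates an admin password on install. Assuming the release is named grafana as above, you can read it with:
kubectl get secret grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 --decode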
➕ Add Data Sources in Grafana
In Grafana → Settings → Data Sources, add the three backends (or provision them from Helm values, as sketched below):
Prometheus: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
Loki: http://loki.monitoring.svc.cluster.local:3100
Tempo: http://tempo.monitoring.svc.cluster.local:3100 (newer Tempo versions listen on 3200 instead)
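If you prefer to keep this in code, the same data sources can be provisioned through the grafana/grafana chart values instead of the UI. A sketch, using the URLs above:
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
      - name: Loki
        type: loki
        url: http://loki.monitoring.svc.cluster.local:3100
      - name: Tempo
        type: tempo
        url: http://tempo.monitoring.svc.cluster.local:3100
Apply it with helm upgrade grafana grafana/grafana -n monitoring -f datasource-values.yaml (the file name here is just an example).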
Now you can:
View metrics in Prometheus dashboards
Explore logs in Loki
Trace requests in Tempo
🧡 Step 7: Connect Your Application with the OpenTelemetry SDK
In your application code (example: Python Flask):
from flask import Flask
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

app = Flask(__name__)

# Send spans to the OpenTelemetry Collector service inside the cluster
trace.set_tracer_provider(TracerProvider())
otlp_exporter = OTLPSpanExporter(
    endpoint="otel-collector.monitoring.svc.cluster.local:4317",
    insecure=True,
)
trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(otlp_exporter))
tracer = trace.get_tracer(__name__)

@app.route("/")
def hello():
    with tracer.start_as_current_span("hello-span"):
        return "Hello from OpenTelemetry!"
When you make requests, traces will appear in Grafana Tempo, and you can correlate them with logs and metrics.
✅ Conclusion
You now have a full observability platform on EKS:
Logs → Loki
Metrics → Prometheus
Traces → Tempo
Visualization → Grafana
Collection → OpenTelemetry
This setup helps both DevOps and developers quickly understand what’s happening inside the cluster. You can expand it later with:
Alertmanager for notifications
Persistent volumes for long-term storage
Authentication for Grafana access (a small values sketch follows this list)
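For example, Grafana persistence and a fixed admin password can be set through the grafana/grafana chart values (a sketch; key names may differ between chart versions):
persistence:
  enabled: true
  size: 10Gi
adminPassword: "change-me"
Apply it with helm upgrade grafana grafana/grafana -n monitoring -f grafana-values.yaml.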
💬 Final Thoughts
As a DevOps engineer, I learned that good observability saves time and stress. When you can see everything, you can fix problems faster.
Even if English is not your first language — let your dashboards speak for you! 😄
