Building a Production-Ready Observability Stack: OpenTelemetry + Loki + Tempo + Grafana on EKS

🎯 Introduction

Observability is one of the most important parts of a modern DevOps setup. When you run workloads in Kubernetes (especially on AWS EKS), you need to see what is happening: logs, metrics, and traces, all in one place.

In this post, I will show how I built a full observability stack using:

  • OpenTelemetry – to collect data

  • Prometheus – for metrics

  • Loki – for logs

  • Tempo – for distributed traces

  • Grafana – for visualization

Everything runs on Amazon EKS using Helm and Terraform.


🏗️ Step 1: Prepare the EKS Cluster

If you already have an EKS cluster, skip this step.
Below is a simple Terraform snippet to create one:

provider "aws" {
  region = "us-west-2"
}

module "eks" {
  source          = "terraform-aws-modules/eks/aws"
  cluster_name    = "observability-demo"
  cluster_version = "1.29"

  vpc_id     = "vpc-xxxxxxx"
  subnet_ids = ["subnet-xxxxxx", "subnet-yyyyyy"]

  node_groups = {
    default = {
      desired_capacity = 2
      max_capacity     = 3
      instance_type    = "t3.medium"
    }
  }
}
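
Initialize and apply the configuration:

terraform init
terraform apply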

After running terraform apply, connect to the cluster:

aws eks update-kubeconfig --region us-west-2 --name observability-demo
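
Verify that the worker nodes have joined:

kubectl get nodes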

📦 Step 2: Install Prometheus (Metrics)

We use the official Prometheus Helm chart:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace

This will deploy:

  • Prometheus server

  • Node exporter

  • Kube state metrics

  • Grafana (bundled; we install our own in Step 6, so you can pass --set grafana.enabled=false here to skip it)

Check status:

kubectl get pods -n monitoring
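
You can also port-forward the Prometheus service created by kube-prometheus-stack and check the scrape targets at http://localhost:9090/targets:

kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n monitoring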

🧝 Step 3: Install Loki (Logs)

Loki works like Prometheus, but for logs. It stores logs efficiently and lets you query them with LogQL.

helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace monitoring

The loki-stack chart also deploys Promtail, which automatically collects logs from every pod in the cluster.
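
Once logs are flowing, you can query them with LogQL in Grafana's Explore view (set up in Step 6). For example, to find error lines from the monitoring namespace:

{namespace="monitoring"} |= "error"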


🔍 Step 4: Install Tempo (Traces)

Tempo is Grafana’s open-source tracing backend. It works great with OpenTelemetry.

helm install tempo grafana/tempo \
  --namespace monitoring

You can verify:

kubectl get pods -n monitoring | grep tempo

🧠 Step 5: Configure OpenTelemetry Collector

We use the OpenTelemetry Collector to receive traces and metrics from applications and export them to Prometheus and Tempo.

Here’s a simple ConfigMap and Deployment for the Collector:

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: monitoring
data:
  otel-collector-config.yaml: |
    receivers:
      otlp:
        protocols:
          http:
          grpc:

    exporters:
      prometheus:
        endpoint: "0.0.0.0:8889"
      otlp:
        endpoint: "tempo.monitoring.svc.cluster.local:4317"
        tls:
          insecure: true

    service:
      pipelines:
        metrics:
          receivers: [otlp]
          exporters: [prometheus]
        traces:
          receivers: [otlp]
          exporters: [otlp]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/etc/otel/config/otel-collector-config.yaml"]
          ports:
            - containerPort: 4317   # OTLP gRPC receiver
            - containerPort: 4318   # OTLP HTTP receiver
            - containerPort: 8889   # Metrics exposed by the Prometheus exporter
          volumeMounts:
            - name: config
              mountPath: /etc/otel/config
      volumes:
        - name: config
          configMap:
            name: otel-collector-config

Apply it:

kubectl apply -f otel-collector.yaml
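
One thing the manifest above does not include is a Service: the application in Step 7 sends spans to otel-collector.monitoring.svc.cluster.local:4317, and Prometheus needs a way to scrape the metrics the Collector exposes on port 8889. Here is a minimal sketch of a Service plus a ServiceMonitor (the release: prometheus label assumes the kube-prometheus-stack release from Step 2 is named prometheus):

apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: monitoring
  labels:
    app: otel-collector
spec:
  selector:
    app: otel-collector
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
    - name: otlp-http
      port: 4318
      targetPort: 4318
    - name: metrics
      port: 8889
      targetPort: 8889
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector
  namespace: monitoring
  labels:
    release: prometheus   # kube-prometheus-stack only discovers ServiceMonitors carrying its release label by default
spec:
  selector:
    matchLabels:
      app: otel-collector
  endpoints:
    - port: metrics

Save it as, say, otel-collector-service.yaml and apply it with kubectl apply as well.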

📊 Step 6: Install Grafana

Now let’s install Grafana separately so we can connect all the data sources.

helm install grafana grafana/grafana \
  --namespace monitoring

Forward the port:

kubectl port-forward svc/grafana 3000:80 -n monitoring

Visit: http://localhost:3000
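
The grafana/grafana chart generates an admin password and stores it in a Secret named after the release; read it with:

kubectl get secret grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 --decode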


➕ Add Data Sources in Grafana

In Grafana, open Connections → Data sources (Settings → Data Sources in older versions) and add:

  • Prometheus: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090

  • Loki: http://loki.monitoring.svc.cluster.local:3100

  • Tempo: http://tempo.monitoring.svc.cluster.local:3200 (Tempo's default HTTP port; older chart versions used 3100)
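
If you prefer configuration as code, the same data sources can be provisioned through the grafana/grafana chart values instead of the UI. A minimal sketch (the file name grafana-values.yaml is just an example):

# grafana-values.yaml -- apply with: helm upgrade grafana grafana/grafana -n monitoring -f grafana-values.yaml
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
        isDefault: true
      - name: Loki
        type: loki
        url: http://loki.monitoring.svc.cluster.local:3100
      - name: Tempo
        type: tempo
        url: http://tempo.monitoring.svc.cluster.local:3200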

Now you can:

  • View metrics in Prometheus dashboards

  • Explore logs in Loki

  • Trace requests in Tempo


🧡 Step 7: Connect the Application with the OpenTelemetry SDK

In your application code (example: Python Flask):

from flask import Flask

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

app = Flask(__name__)

# Example service name (replace with your own); it makes traces easy to find in Tempo
trace.set_tracer_provider(
    TracerProvider(resource=Resource.create({"service.name": "my-flask-app"}))
)

# Send spans to the Collector Service from Step 5 over OTLP/gRPC
otlp_exporter = OTLPSpanExporter(
    endpoint="otel-collector.monitoring.svc.cluster.local:4317",
    insecure=True,
)
trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(otlp_exporter))

tracer = trace.get_tracer(__name__)

@app.route("/")
def hello():
    with tracer.start_as_current_span("hello-span"):
        return "Hello from OpenTelemetry!"

When you make requests, traces will appear in Grafana Tempo, and you can correlate them with logs and metrics.
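
In Grafana's Explore view, with the Tempo data source selected, you can then search for these traces with TraceQL, for example (using the example service name from the snippet above):

{ resource.service.name = "my-flask-app" }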


✅ Conclusion

You now have a full observability platform on EKS:

  • Logs → Loki

  • Metrics → Prometheus

  • Traces → Tempo

  • Visualization → Grafana

  • Collection → OpenTelemetry

This setup helps both DevOps engineers and developers quickly understand what’s happening inside the cluster. You can expand it later with:

  • Alertmanager for notifications

  • Persistent volumes for long-term storage

  • Authentication for Grafana access


💬 Final Thoughts

As a DevOps engineer, I learned that good observability saves time and stress. When you can see everything, you can fix problems faster.

Even if English is not your first language — let your dashboards speak for you! 😄