Setting up a high availability (HA) ELK (Elasticsearch, Logstash, Kibana) stack on a Kubernetes cluster with autoscaling involves several steps. Here’s an outline of how you can achieve this:

1. Prepare the Kubernetes Cluster

  • Ensure your Kubernetes cluster has multiple nodes to support high availability. You’ll need nodes with sufficient CPU, memory, and disk space for Elasticsearch and Logstash components.
  • Install necessary tools like kubectl and helm for deploying and managing the cluster.

2. Deploy Elasticsearch

Elasticsearch is the core of the ELK stack and requires careful configuration to be scalable and highly available.

Steps:

  • Storage Class: Set up a storage class and persistent volume claims (PVCs) for Elasticsearch, and make sure the underlying storage is replicated across zones to avoid data loss (see the StorageClass sketch after this list).
  • Helm Installation: Use Helm to deploy Elasticsearch.
    ```bash
    helm repo add elastic https://helm.elastic.co
    helm repo update
    helm install elasticsearch elastic/elasticsearch --set replicas=3
    ```

    This installs a 3-node Elasticsearch cluster, making it highly available by distributing data across the nodes.

  • Elasticsearch Configuration:
    • Cluster Mode: Set Elasticsearch to run in cluster mode, with each node assigned a specific role (master, data, ingest).
    • Autoscaling: Use the Kubernetes Horizontal Pod Autoscaler (HPA) to scale Elasticsearch pods automatically based on resource usage. Note that the elastic chart deploys Elasticsearch as a StatefulSet (named elasticsearch-master by default), not a Deployment:
      ```bash
      kubectl autoscale statefulset elasticsearch-master --cpu-percent=80 --min=3 --max=10
      ```

      This scales the number of Elasticsearch pods up or down with CPU usage. Scale data nodes down with care, so shards can relocate before a node is removed.

  • Persistence: Make sure to use persistent volumes for the data nodes so that data is not lost if pods are rescheduled.
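
As a concrete example of the storage class mentioned above, here is a minimal sketch assuming GKE with regional persistent disks; the provisioner and parameters are cloud-specific assumptions, so substitute the equivalents for your CSI driver:

```yaml
# Hypothetical StorageClass for zone-replicated Elasticsearch volumes on GKE.
# Regional persistent disks replicate each volume across two zones.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: elasticsearch-storage
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer   # bind in the zone where the pod is scheduled
allowVolumeExpansion: true
```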

3. Deploy Logstash

Logstash is used to ingest, transform, and forward logs to Elasticsearch.

Steps:

  • Helm Installation: Use Helm to deploy Logstash.
    ```bash
    helm install logstash elastic/logstash
    ```
  • Logstash Configuration:
    • Configure Logstash to read input from your desired sources (e.g., Filebeat, syslog); see the pipeline sketch after this list.
    • Autoscaling: As with Elasticsearch, use the HPA to scale Logstash based on resource usage. The elastic chart also runs Logstash as a StatefulSet (named logstash-logstash by default):
      ```bash
      kubectl autoscale statefulset logstash-logstash --cpu-percent=80 --min=2 --max=5
      ```
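
As referenced above, a minimal pipeline can be supplied through the chart's logstashPipeline value. This sketch accepts Beats input and forwards to Elasticsearch; the host assumes the elasticsearch-master service created by the chart in step 2, and the index pattern is an illustrative choice:

```yaml
# values.yaml fragment: a simple Beats -> Elasticsearch pipeline.
logstashPipeline:
  logstash.conf: |
    input {
      beats {
        port => 5044
      }
    }
    output {
      elasticsearch {
        hosts => ["http://elasticsearch-master:9200"]
        index => "logs-%{+YYYY.MM.dd}"
      }
    }
```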

4. Deploy Kibana

Kibana provides the front end for visualizing data stored in Elasticsearch.

Steps:

  • Helm Installation: Deploy Kibana using Helm.
    ```bash
    helm install kibana elastic/kibana
    ```
  • Kibana Configuration:
    • Configure Kibana to connect to the Elasticsearch cluster (with the elastic chart, via the elasticsearchHosts value).
    • Service Type: You may want to expose Kibana as a LoadBalancer or through an Ingress controller if you need external access; a minimal Ingress sketch follows this list.
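
Here is a sketch of the Ingress option, assuming an NGINX ingress controller is installed; the hostname is a placeholder, and kibana-kibana is the default service name the elastic chart creates:

```yaml
# Hypothetical Ingress exposing Kibana through an NGINX ingress controller.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kibana
spec:
  ingressClassName: nginx
  rules:
    - host: kibana.example.com        # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kibana-kibana   # default service name from the elastic chart
                port:
                  number: 5601
```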

5. Set Up Autoscaling

For each component of the ELK stack (Elasticsearch, Logstash, Kibana), autoscaling can be set up based on CPU, memory usage, or other metrics.

  • Use Kubernetes’ Horizontal Pod Autoscaler (HPA).
  • Consider Elasticsearch’s own autoscaling API (supported through the ECK operator) for adjusting node counts based on data size and query load.
  • Enable Elasticsearch shard rebalancing to ensure even distribution of data across nodes as the cluster scales up or down.

6. Monitor and Tune the Cluster

  • Monitoring Tools: Use Prometheus and Grafana to monitor the performance of your ELK stack, including resource usage and scaling behavior; see the ServiceMonitor sketch after this list.
  • Adjust Resources: Tune the resource requests/limits for Elasticsearch, Logstash, and Kibana to ensure efficient autoscaling.
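
If you run the Prometheus Operator (e.g., via kube-prometheus-stack), a ServiceMonitor can pick up Elasticsearch metrics. This sketch assumes a prometheus-elasticsearch-exporter deployed with an app=elasticsearch-exporter service label and a metrics port named http; all of those names are assumptions to adapt to your setup:

```yaml
# Hypothetical ServiceMonitor scraping an Elasticsearch exporter.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: elasticsearch-exporter
spec:
  selector:
    matchLabels:
      app: elasticsearch-exporter   # assumed exporter service label
  endpoints:
    - port: http                    # assumed metrics port name
      interval: 30s
```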

7. Handle Failures

  • Multi-Zone Setup: If you’re running in a cloud environment, distribute your Kubernetes nodes across multiple availability zones to prevent outages in case one zone goes down.
  • StatefulSet for Elasticsearch: Elasticsearch should be deployed as a StatefulSet, which provides stable network IDs, persistent storage, and automatic rescheduling of failed nodes.
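
To make the multi-zone bullet concrete, here is a sketch of a topology spread constraint for the Elasticsearch pod template; the label is an assumption based on the elastic chart's default pod labels, and if your chart version doesn't expose this field directly, equivalent podAntiAffinity settings achieve a similar spread:

```yaml
# Spread Elasticsearch pods evenly across availability zones.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: elasticsearch-master   # assumed default pod label from the chart
```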

8. Security and Authentication

  • TLS/SSL: Set up encryption for Elasticsearch and Kibana to secure communication.
  • Authentication: Use Elasticsearch’s native security or an external solution like OpenID Connect or SAML for user authentication.
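
As a sketch of what the TLS bullet translates to in the elastic chart's values, assuming you have already generated certificates with elasticsearch-certutil and stored them in a Kubernetes secret named elastic-certificates (the secret name and mount path are assumptions):

```yaml
# values.yaml fragment enabling security and transport-layer TLS.
esConfig:
  elasticsearch.yml: |
    xpack.security.enabled: true
    xpack.security.transport.ssl.enabled: true
    xpack.security.transport.ssl.verification_mode: certificate
    xpack.security.transport.ssl.keystore.path: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
    xpack.security.transport.ssl.truststore.path: /usr/share/elasticsearch/config/certs/elastic-certificates.p12
secretMounts:
  - name: elastic-certificates      # assumed pre-created secret holding the PKCS#12 bundle
    secretName: elastic-certificates
    path: /usr/share/elasticsearch/config/certs
```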

This architecture ensures high availability and scalability for your ELK stack in a Kubernetes environment.

Let’s dive into the specific parts of setting up high availability and autoscaling for your ELK stack on Kubernetes. We’ll guide you through writing Helm chart configurations for Elasticsearch, Logstash, and Kibana, and set up Horizontal Pod Autoscaler (HPA) policies for autoscaling.

1. Helm Chart for Elasticsearch (HA with Autoscaling)

Elasticsearch Helm Values

Create or update a custom values.yaml file for Elasticsearch with the following configurations:

```yaml
replicas: 3
minimumMasterNodes: 2

esConfig:
  elasticsearch.yml: |
    cluster.name: "elastic-cluster"
    network.host: 0.0.0.0
    cluster.routing.allocation.awareness.attributes: "rack_id"

# Node roles: run these pods as master-eligible data nodes.
roles:
  master: "true"
  data: "true"

persistence:
  enabled: true

volumeClaimTemplate:
  storageClassName: "your-storage-class"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi

resources:
  requests:
    cpu: "1"
    memory: "2Gi"
  limits:
    cpu: "2"
    memory: "4Gi"

# Note: the upstream elastic/elasticsearch chart does not template an HPA,
# so this block assumes a chart or wrapper that does; otherwise create the
# HPA separately as shown in section 4 below.
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 75
  targetMemoryUtilizationPercentage: 80
```

This configuration ensures:

  • Replicas: 3 Elasticsearch nodes (HA setup).
  • Persistent Storage: Each data node has a 50Gi persistent volume.
  • Node Roles: The pods are master-eligible data nodes (via the roles map).
  • Autoscaling: Horizontal Pod Autoscaler will scale the cluster based on CPU and memory usage.

Install Elasticsearch with Helm

```bash
helm repo add elastic https://helm.elastic.co
helm install elasticsearch -f values.yaml elastic/elasticsearch
```

This will deploy Elasticsearch with high availability, persistence, and autoscaling enabled.

2. Helm Chart for Logstash (HA with Autoscaling)

Logstash Helm Values

Create or update the values.yaml file for Logstash:

```yaml
replicas: 2

logstashConfig:
  logstash.yml: |
    pipeline.workers: 4
    pipeline.batch.size: 125
    queue.type: persisted

resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "1"
    memory: "2Gi"

# As with Elasticsearch, the HPA block assumes chart support; otherwise
# create the HPA separately (section 4 below).
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 75
  targetMemoryUtilizationPercentage: 80

persistence:
  enabled: true

volumeClaimTemplate:
  accessModes:
    - ReadWriteOnce
  storageClassName: "your-storage-class"
  resources:
    requests:
      storage: 20Gi
```

  • Replicas: 2 Logstash pods (can be scaled as needed).
  • Persistence: Logstash’s persisted queue lives on a persistent volume, so in-flight events survive pod restarts.
  • Autoscaling: HPA enabled based on CPU and memory usage.

Install Logstash with Helm

```bash
helm install logstash -f values.yaml elastic/logstash
```

3. Helm Chart for Kibana (HA with Autoscaling)

Kibana generally doesn’t require as much scaling, but here’s a simple autoscaling setup.

Kibana Helm Values

Create or update the values.yaml for Kibana:

```yaml
replicas: 2

# Point Kibana at the Elasticsearch service created in section 1.
elasticsearchHosts: "http://elasticsearch-master:9200"

resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "1"
    memory: "1Gi"

# HPA block assumes chart support; otherwise see section 4 below.
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 4
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 85

service:
  type: LoadBalancer
```

This configuration provides:

  • Replicas: 2 Kibana pods for HA.
  • Autoscaling: Scales between 2-4 replicas based on resource usage.
  • Load Balancer: Exposes Kibana using a LoadBalancer service for external access.

Install Kibana with Helm

```bash
helm install kibana -f values.yaml elastic/kibana
```

4. Setting Up Horizontal Pod Autoscaler (HPA)

If your charts don’t template HPAs for you, create them directly. Two caveats: kubectl autoscale only supports a CPU target, so the memory targets from the values files above require an autoscaling/v2 manifest (a sketch follows the commands below), and you must target the right workload type, since the elastic charts run Elasticsearch and Logstash as StatefulSets and Kibana as a Deployment. Here’s how:

HPA for Elasticsearch

```bash
kubectl autoscale statefulset elasticsearch-master --cpu-percent=75 --min=3 --max=10
```

HPA for Logstash

```bash
kubectl autoscale statefulset logstash-logstash --cpu-percent=75 --min=2 --max=5
```

HPA for Kibana

```bash
kubectl autoscale deployment kibana-kibana --cpu-percent=80 --min=2 --max=4
```
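
For CPU plus memory scaling, here is a minimal autoscaling/v2 manifest targeting the Elasticsearch StatefulSet; the target name matches the chart default used above:

```yaml
# HPA with CPU and memory targets; kubectl autoscale cannot express the
# memory metric, so it is declared as a manifest instead.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: elasticsearch
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: elasticsearch-master   # default StatefulSet name from the elastic chart
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

Apply it with kubectl apply -f, and adapt the same shape for Logstash and Kibana.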

5. Monitoring and Tuning

To monitor the health of the ELK stack:

  • Install Prometheus and Grafana to monitor resource usage and autoscaling behavior.
  • Add Elasticsearch and Logstash metrics to Grafana dashboards to visualize query performance and pipeline throughput.

With this setup, your ELK cluster should be scalable, highly available, and ready to handle varying loads automatically.