Easy Deploy Blog
Scaling Applications on Amazon EKS: A Complete Guide

March 20, 2024
12 min read
Farhaan Patel

Scaling applications effectively on Amazon EKS requires understanding multiple scaling mechanisms and how they work together. This guide covers horizontal pod scaling, vertical pod scaling, and cluster autoscaling strategies.

Understanding EKS Scaling Components

Horizontal Pod Autoscaler (HPA)

Automatically scales the number of pods based on CPU, memory, or custom metrics.
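
Under the hood, the HPA controller periodically recomputes the replica count from the ratio of observed to target metric values. A minimal sketch of that core formula (simplified: the real controller also averages per-pod metrics, skips unready pods, and applies the policies configured under spec.behavior):

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Simplified HPA formula: desired = ceil(current * observed / target).

    The real controller also averages per-pod metrics, handles missing or
    unready pods, and applies scale-up/scale-down policies.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:  # within tolerance: no change
        return current_replicas
    return math.ceil(current_replicas * ratio)

# 4 pods at 90% average CPU against a 70% target -> scale to 6
print(desired_replicas(4, 90, 70))
```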

Vertical Pod Autoscaler (VPA)

Adjusts CPU and memory requests/limits for containers based on usage patterns.

Cluster Autoscaler

Automatically adjusts the number of nodes in your cluster based on pod scheduling requirements.

Setting Up Horizontal Pod Autoscaler

Basic HPA Configuration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
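
With selectPolicy: Max, each scale-up step is bounded by whichever policy permits the larger change. A small sketch of how the two policies above combine (illustrative, not the controller's actual code):

```python
import math

def scale_up_limit(current_replicas, percent_per_period=100, pods_per_period=4):
    """Upper bound on replicas after one period when selectPolicy is Max:
    the larger of the percentage-based and fixed-pod-count increases wins."""
    by_percent = math.ceil(current_replicas * percent_per_period / 100)
    return current_replicas + max(by_percent, pods_per_period)

print(scale_up_limit(10))  # the 100% policy wins: 10 + 10 = 20
print(scale_up_limit(3))   # the 4-pod policy wins: 3 + 4 = 7
```

At small replica counts the fixed-pod policy dominates, so the deployment can still react quickly; at large counts the percentage policy takes over.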

Custom Metrics HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 100
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
  - type: External
    external:
      metric:
        name: sqs_messages_visible
        selector:
          matchLabels:
            queue_name: processing-queue
      target:
        type: AverageValue
        averageValue: "10"
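
Pods and External metrics are not served by the built-in metrics-server; they require a custom or external metrics adapter (for example prometheus-adapter for requests_per_second, or KEDA or a CloudWatch adapter for the SQS metric). A sketch of a prometheus-adapter rule that could expose requests_per_second — the metric and label names here are illustrative:

```yaml
# prometheus-adapter rule (illustrative): derives requests_per_second
# from a counter named http_requests_total
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```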

Implementing Vertical Pod Autoscaler

VPA Installation

# Install the VPA components (the project ships an install script,
# not a single release manifest)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Verify VPA installation
kubectl get pods -n kube-system | grep vpa

VPA Configuration

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: web-app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits

VPA Recommendations Only

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: monitoring-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: monitoring-app
  updatePolicy:
    updateMode: "Off"  # Only provide recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: monitoring
      controlledResources: ["cpu", "memory"]
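
In Off mode the VPA still writes its recommendations to the object's status, which you can read with kubectl describe vpa monitoring-vpa. The shape looks roughly like this (values here are illustrative):

```yaml
# Illustrative status written by the recommender; lowerBound and
# upperBound bracket the target recommendation
status:
  recommendation:
    containerRecommendations:
    - containerName: monitoring
      lowerBound:
        cpu: 100m
        memory: 256Mi
      target:
        cpu: 250m
        memory: 512Mi
      upperBound:
        cpu: "1"
        memory: 1Gi
```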

Cluster Autoscaler Setup

IAM Role and Policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": "*"
    }
  ]
}
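
The policy is typically attached to the controller's service account via IAM Roles for Service Accounts (IRSA). One way to wire this up is an eksctl ClusterConfig — a sketch, with cluster and region names assumed:

```yaml
# eksctl config (illustrative names); apply with:
#   eksctl create iamserviceaccount --config-file=cluster.yaml --approve
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
iam:
  withOIDC: true
  serviceAccounts:
  - metadata:
      name: cluster-autoscaler
      namespace: kube-system
    wellKnownPolicies:
      autoScaler: true  # attaches a policy equivalent to the one above
```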

Cluster Autoscaler Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.2  # match the minor version to your cluster's Kubernetes version
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false
        env:
        - name: AWS_REGION
          value: us-west-2
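
The --node-group-auto-discovery flag above only finds Auto Scaling groups carrying both tags (the tag values themselves are ignored). EKS managed node groups add them automatically; for self-managed groups, set them on each ASG:

```yaml
# Tags required on each ASG for auto-discovery ("my-cluster" must match
# the cluster name used in the flag)
k8s.io/cluster-autoscaler/enabled: "true"
k8s.io/cluster-autoscaler/my-cluster: "owned"
```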

Advanced Scaling Strategies

Multi-Metric Scaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multi-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-service
  minReplicas: 5
  maxReplicas: 200
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "500"
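
When several metrics are configured, the HPA computes a desired replica count for each and scales to the largest. A quick sketch with the three targets above (observed values are illustrative):

```python
import math

def desired_from_metrics(current_replicas, metrics):
    """metrics: list of (observed, target) pairs. The HPA evaluates each
    metric independently and uses the maximum result (tolerance omitted)."""
    return max(math.ceil(current_replicas * obs / tgt) for obs, tgt in metrics)

# 5 replicas: CPU 90% vs 60% target, memory 65% vs 70%, 600 rps vs 500 rps
print(desired_from_metrics(5, [(90, 60), (65, 70), (600, 500)]))  # CPU dominates: 8
```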

Predictive Scaling with Scheduled HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: scheduled-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ecommerce-app
  minReplicas: 10  # Higher baseline for anticipated traffic
  maxReplicas: 500
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # Lower threshold during high-traffic periods
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30  # Faster scale-up
      policies:
      - type: Percent
        value: 200
        periodSeconds: 30
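
The HPA above raises the baseline permanently. To change it on a schedule, ahead of a known traffic window, a common pattern is a CronJob that patches minReplicas before the window opens (a sketch; the schedule, service account name, and its RBAC are assumptions):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-before-peak
spec:
  schedule: "0 8 * * 1-5"  # 08:00 UTC on weekdays, ahead of the peak
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: hpa-patcher  # needs RBAC to patch HPAs
          restartPolicy: OnFailure
          containers:
          - name: patch
            image: bitnami/kubectl:latest
            command:
            - kubectl
            - patch
            - hpa
            - scheduled-hpa
            - --type=merge
            - -p
            - '{"spec":{"minReplicas":10}}'
```

A mirror job after the window can patch minReplicas back down to the normal baseline.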

Monitoring and Observability

Scaling Metrics Dashboard

apiVersion: v1
kind: ConfigMap
metadata:
  name: scaling-dashboard
data:
  dashboard.json: |
    {
      "dashboard": {
        "title": "EKS Scaling Metrics",
        "panels": [
          {
            "title": "HPA Replicas",
            "targets": [
              {
                "expr": "kube_horizontalpodautoscaler_status_current_replicas"
              }
            ]
          },
          {
            "title": "Node Count",
            "targets": [
              {
                "expr": "kube_node_info"
              }
            ]
          },
          {
            "title": "Pod CPU Usage",
            "targets": [
              {
                "expr": "rate(container_cpu_usage_seconds_total[5m])"
              }
            ]
          }
        ]
      }
    }

Scaling Alerts

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: scaling-alerts
spec:
  groups:
  - name: scaling
    rules:
    - alert: HPAMaxReplicasReached
      expr: kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_spec_max_replicas
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "HPA {{ $labels.horizontalpodautoscaler }} has reached maximum replicas"

    - alert: ClusterAutoscalerErrors
      expr: increase(cluster_autoscaler_errors_total[5m]) > 0
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "Cluster Autoscaler experiencing errors"

Cost Optimization Strategies

Right-sizing with VPA

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: cost-optimized-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: background-worker
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: worker
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 500m
        memory: 1Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits

Spot Instance Integration

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-worker-pool
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]
      - key: node.kubernetes.io/instance-type
        operator: In
        values: ["m5.large", "m5.xlarge", "m4.large"]
      nodeClassRef:
        name: spot-worker  # AWS-specific settings live in an EC2NodeClass
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
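
In Karpenter's v1beta1 API a NodePool delegates AWS-specific settings (AMI family, instance role, subnets, security groups) to an EC2NodeClass it references. A minimal companion sketch, with the role and discovery-tag names assumed:

```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: spot-worker
spec:
  amiFamily: AL2
  role: KarpenterNodeRole-my-cluster  # instance role created at install time
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: my-cluster
```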

Combining HPA for replica counts, VPA for right-sizing requests, and Cluster Autoscaler or Karpenter for node capacity, backed by the monitoring and alerts above, keeps your EKS applications responsive under load while controlling cost.