Scaling applications effectively on Amazon EKS requires understanding multiple scaling mechanisms and how they work together. This guide covers horizontal pod scaling, vertical pod scaling, and cluster autoscaling strategies.
Understanding EKS Scaling Components
Horizontal Pod Autoscaler (HPA)
Automatically scales the number of pods based on CPU, memory, or custom metrics.
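To make the scaling rule concrete, here is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation: the desired count is the current count multiplied by the ratio of the observed metric to its target, rounded up. The function name is illustrative, and the 0.1 tolerance is the controller's documented default, which may be configured differently in a given cluster.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Replica count suggested by the HPA formula for a single metric."""
    ratio = current_value / target_value
    # Inside the tolerance band the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(desired_replicas(4, 90, 70))  # 4 pods at 90% CPU against a 70% target -> 6
print(desired_replicas(4, 72, 70))  # within tolerance of the target -> 4
```

The result is still bounded by minReplicas and maxReplicas, and by any behavior policies on the HPA object.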
Vertical Pod Autoscaler (VPA)
Adjusts CPU and memory requests/limits for containers based on usage patterns.
Cluster Autoscaler
Automatically adjusts the number of nodes in your cluster based on pod scheduling requirements.
Setting Up Horizontal Pod Autoscaler
Basic HPA Configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
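The behavior section above caps how far the replica count can move in one period. The sketch below works through that arithmetic for the values in this manifest. It is a simplification: it ignores scale events that already happened inside the period, and the exact rounding in the real controller may differ. The function names are illustrative.

```python
def scale_up_limit(current: int, percent: int, pods: int) -> int:
    """Upper bound for one period with a Percent and a Pods policy (selectPolicy: Max)."""
    by_percent = -(-current * (100 + percent) // 100)  # ceiling division
    by_pods = current + pods
    # selectPolicy: Max picks whichever policy allows the larger change.
    return max(by_percent, by_pods)


def scale_down_limit(current: int, percent: int) -> int:
    """Lower bound for one period with a single Percent policy."""
    return current * (100 - percent) // 100


print(scale_up_limit(3, percent=100, pods=4))   # 7: adding 4 pods beats doubling 3
print(scale_up_limit(10, percent=100, pods=4))  # 20: doubling beats adding 4 pods
print(scale_down_limit(50, percent=10))         # 45: at most 10% removed per 60s
```

With these settings a small deployment grows by at least four pods every 15 seconds, a large one can double, and scale-down proceeds in 10% steps after the 300-second stabilization window.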
Custom Metrics HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 100
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
  - type: External
    external:
      metric:
        name: sqs_messages_visible
        selector:
          matchLabels:
            queue_name: processing-queue
      target:
        type: AverageValue
        averageValue: "10"
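For an AverageValue target, the total metric value is divided across the pods, so the replica count that satisfies the target is roughly the total divided by the per-pod target, rounded up. The sketch below applies that to the SQS target of 10 messages per pod from the manifest above. The queue depths are made-up inputs for illustration, and the controller's tolerance band is left out.

```python
import math


def replicas_for_average_value(total_metric: float, target_average: float) -> int:
    """Replicas needed so that total_metric / replicas stays at or below the target."""
    return math.ceil(total_metric / target_average)


def clamp(replicas: int, min_replicas: int, max_replicas: int) -> int:
    """Apply the HPA minReplicas / maxReplicas bounds."""
    return max(min_replicas, min(replicas, max_replicas))


# Hypothetical queue depths, with minReplicas=3 and maxReplicas=100 as in the manifest.
for visible_messages in (5, 250, 2500):
    wanted = replicas_for_average_value(visible_messages, 10)
    print(visible_messages, "->", clamp(wanted, 3, 100))
```

Pods and External metrics are only available when a metrics adapter exposes them through the custom or external metrics API; the metric names here come from the manifest and depend on that adapter's configuration.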
Implementing Vertical Pod Autoscaler
VPA Installation
# Install the VPA CRDs and controllers from the kubernetes/autoscaler repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
# Verify VPA installation
kubectl get pods -n kube-system | grep vpa
VPA Configuration
# Note: avoid pairing Auto mode with an HPA that scales the same
# Deployment on CPU or memory; the two controllers will work against each other.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: web-app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits
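The resource policy above bounds what VPA may set. As a rough sketch of the arithmetic, assuming the recommendation is clamped to minAllowed and maxAllowed and that RequestsAndLimits keeps the container's original limit-to-request ratio, CPU values in millicores would be adjusted as follows. The function names and the example workload values are illustrative, not part of the VPA API.

```python
def clamp(value: int, low: int, high: int) -> int:
    """Keep value inside the [low, high] range."""
    return max(low, min(value, high))


def apply_cpu_recommendation(recommended_m: int, request_m: int, limit_m: int,
                             min_allowed_m: int, max_allowed_m: int) -> tuple[int, int]:
    """Return (new_request, new_limit) in millicores."""
    new_request = clamp(recommended_m, min_allowed_m, max_allowed_m)
    # Assumption: the limit is scaled to preserve the original limit/request ratio.
    new_limit = new_request * limit_m // request_m
    return new_request, new_limit


# Hypothetical container with request 500m and limit 1000m; bounds 100m..2000m as above.
print(apply_cpu_recommendation(3000, 500, 1000, 100, 2000))  # (2000, 4000)
print(apply_cpu_recommendation(50, 500, 1000, 100, 2000))    # (100, 200)
```

Note that in this sketch the scaled limit can exceed maxAllowed, because the bound is applied to the request rather than the limit.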
VPA Recommendations Only
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: monitoring-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: monitoring-app
  updatePolicy:
    updateMode: "Off" # Only provide recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: monitoring
      controlledResources: ["cpu", "memory"]
Cluster Autoscaler Setup
IAM Role and Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": "*"
    }
  ]
}
Cluster Autoscaler Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      # k8s.gcr.io is frozen; images are served from registry.k8s.io
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.2
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false
        env:
        - name: AWS_REGION
          value: us-west-2
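The --node-group-auto-discovery flag above tells Cluster Autoscaler to manage only Auto Scaling groups that carry every listed tag key. The sketch below mimics that matching rule so you can reason about which groups would be picked up. It handles tag keys only (not key=value forms), and the helper names and sample tags are hypothetical.

```python
def parse_discovery_spec(spec: str) -> list[str]:
    """Extract the tag keys from an 'asg:tag=key1,key2' auto-discovery value."""
    prefix = "asg:tag="
    if not spec.startswith(prefix):
        raise ValueError(f"unsupported discovery spec: {spec!r}")
    return [key for key in spec[len(prefix):].split(",") if key]


def is_discovered(asg_tags: dict[str, str], spec: str) -> bool:
    """True when the ASG carries every tag key named in the spec."""
    return all(key in asg_tags for key in parse_discovery_spec(spec))


SPEC = "asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster"

# Hypothetical tag sets for two Auto Scaling groups.
tagged = {"k8s.io/cluster-autoscaler/enabled": "true",
          "k8s.io/cluster-autoscaler/my-cluster": "owned"}
untagged = {"k8s.io/cluster-autoscaler/enabled": "true"}

print(is_discovered(tagged, SPEC))    # True
print(is_discovered(untagged, SPEC))  # False: the cluster-specific tag is missing
```

A node group missing either tag is silently ignored, which is a common reason for pending pods that never trigger a scale-up.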
Advanced Scaling Strategies
Multi-Metric Scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multi-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-service
  minReplicas: 5
  maxReplicas: 200
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "500"
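When several metrics are listed, the HPA evaluates each one separately and uses the largest proposed replica count. The sketch below shows that selection for the three targets in this manifest. The observed values are invented for illustration, and the tolerance band and behavior policies are left out to keep the example short.

```python
import math


def desired_for_metric(current_replicas: int, current_value: float,
                       target_value: float) -> int:
    """Replica count proposed by one metric."""
    return math.ceil(current_replicas * current_value / target_value)


def desired_replicas(current_replicas: int, metrics: dict[str, tuple[float, float]],
                     min_replicas: int, max_replicas: int) -> int:
    """Take the largest per-metric proposal, then apply the min/max bounds."""
    proposals = [desired_for_metric(current_replicas, current, target)
                 for current, target in metrics.values()]
    return max(min_replicas, min(max(proposals), max_replicas))


# Hypothetical observations as (current, target) pairs for 10 running pods.
observed = {
    "cpu": (45, 60),                         # proposes 8
    "memory": (77, 70),                      # proposes 11
    "http_requests_per_second": (800, 500),  # proposes 16
}
print(desired_replicas(10, observed, min_replicas=5, max_replicas=200))  # 16
```

Because the maximum wins, one noisy metric can hold the replica count high even when the others would allow a scale-down.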
Predictive Scaling with Scheduled HPA
# Note: the HPA has no built-in schedule. Apply this manifest ahead of an
# anticipated peak and restore the normal settings afterwards.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: scheduled-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ecommerce-app
  minReplicas: 10 # Higher baseline for anticipated traffic
  maxReplicas: 500
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50 # Lower target so scale-out starts earlier
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30 # Short window to ignore momentary spikes (the scale-up default is 0)
      policies:
      - type: Percent
        value: 200
        periodSeconds: 30
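Since the schedule itself has to live outside the HPA, one option is a small job that decides which minReplicas value should be in force at a given time and then patches the HPA. The sketch below covers only the decision step. The peak windows and the off-peak baseline of 3 are hypothetical; only the peak value of 10 comes from the manifest above. Applying the value (for example with kubectl patch from a CronJob) is left out.

```python
from datetime import time

# Hypothetical peak windows as (start, end, minReplicas); adjust to your traffic pattern.
PEAK_WINDOWS = [
    (time(8, 0), time(11, 0), 10),
    (time(17, 0), time(21, 0), 10),
]
OFF_PEAK_MIN_REPLICAS = 3  # assumed baseline outside the peak windows


def min_replicas_at(now: time, windows=PEAK_WINDOWS,
                    default: int = OFF_PEAK_MIN_REPLICAS) -> int:
    """Return the minReplicas value that should apply at the given time of day."""
    for start, end, replicas in windows:
        if start <= now < end:  # the end of each window is exclusive
            return replicas
    return default


print(min_replicas_at(time(9, 30)))  # inside the morning window -> 10
print(min_replicas_at(time(12, 0)))  # between windows -> 3
```

Raising minReplicas shortly before the window opens gives new pods and nodes time to become ready before traffic arrives.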
Monitoring and Observability
Scaling Metrics Dashboard
apiVersion: v1
kind: ConfigMap
metadata:
  name: scaling-dashboard
data:
  dashboard.json: |
    {
      "dashboard": {
        "title": "EKS Scaling Metrics",
        "panels": [
          {
            "title": "HPA Replicas",
            "targets": [
              {
                "expr": "kube_horizontalpodautoscaler_status_current_replicas"
              }
            ]
          },
          {
            "title": "Node Count",
            "targets": [
              {
                "expr": "count(kube_node_info)"
              }
            ]
          },
          {
            "title": "Pod CPU Usage",
            "targets": [
              {
                "expr": "sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))"
              }
            ]
          }
        ]
      }
    }
Scaling Alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: scaling-alerts
spec:
  groups:
  - name: scaling
    rules:
    - alert: HPAMaxReplicasReached
      expr: kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_spec_max_replicas
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "HPA {{ $labels.horizontalpodautoscaler }} has reached maximum replicas"
    - alert: ClusterAutoscalerErrors
      expr: increase(cluster_autoscaler_errors_total[5m]) > 0
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "Cluster Autoscaler experiencing errors"
Cost Optimization Strategies
Right-sizing with VPA
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: cost-optimized-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: background-worker
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: worker
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 500m
        memory: 1Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits
Spot Instance Integration
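# Karpenter NodePool (requires Karpenter to be installed in the cluster)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-worker-pool
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default # Assumption: an EC2NodeClass named "default" exists
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]
      - key: node.kubernetes.io/instance-type
        operator: In
        values: ["m5.large", "m5.xlarge", "m4.large"]
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s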
Used together, HPA, VPA, and node-level autoscaling let EKS workloads follow demand while keeping pod requests and node capacity close to actual usage, which is where most of the cost savings come from.