May 21, 2025

How to Create an EKS Cluster with Terraform

These are the steps I take to set up an EKS cluster with Terraform. The guide covers the full stack: VPC networking, the EKS control plane, node groups, IAM and RBAC, autoscaling, load balancers, ingress controllers, TLS with cert-manager, persistent storage with EBS and EFS, and secrets management.

Each part builds on the previous one, so it's meant to be followed in order. Everything is done with Terraform, and I'll explain why each piece is configured the way it is.

Prerequisites

An AWS account with permissions to create VPCs, EKS clusters, IAM roles, and related resources
Terraform v1.3+ installed locally
AWS CLI v2.7.0+ installed and configured
kubectl v1.24.0+
Basic familiarity with Kubernetes concepts (pods, deployments, services)

Goals

By the end of this guide, you'll have:

A production-style VPC with public and private subnets across multiple availability zones
An EKS cluster with managed node groups
IAM users and roles mapped to Kubernetes RBAC
Horizontal Pod Autoscaler and Cluster Autoscaler configured
AWS Load Balancer Controller with NLB and ALB support
NGINX Ingress Controller with TLS via cert-manager
Persistent storage using EBS and EFS CSI drivers
AWS Secrets Manager integrated with your pods

Note: AWS EKS clusters cost $0.10 per hour for the control plane, plus EC2 costs for worker nodes. Make sure to tear everything down when you're done testing.

Part 1: VPC and Networking

Starting from scratch with Terraform. The pattern is: define locals for env name, region, EKS cluster name and version, then build everything on top of that.
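A minimal sketch of that pattern. Every value here (environment name, region, zones, cluster name, Kubernetes version, provider version constraint) is an illustrative placeholder I picked for the example, not something the setup requires:

```hcl
# Shared values that the rest of the configuration builds on.
# All values are placeholders; substitute your own.
locals {
  env         = "staging"
  region      = "us-east-1"
  zone1       = "us-east-1a"
  zone2       = "us-east-1b"
  eks_name    = "demo"
  eks_version = "1.30"
}

terraform {
  required_version = ">= 1.3"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed constraint
    }
  }
}

provider "aws" {
  region = local.region
}
```

Keeping these in locals means a second environment is mostly a matter of changing a handful of strings.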

What gets created:

1 VPC
2 private subnets (for worker nodes) + 2 public subnets (for load balancers), each pair spread across 2 AZs
Internet Gateway attached to the VPC
NAT Gateway placed in a public subnet (allocate an Elastic IP for it manually so it's static, useful if clients need to whitelist an IP)
Public route table with a default route to the Internet Gateway
Private route table with a default route to the NAT Gateway
Route table associations for all 4 subnets
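The list above could be sketched roughly like this. The resource names and the CIDR range are my own placeholders, and the subnets (aws_subnet.public_zone1, aws_subnet.private_zone1, and so on) are assumed to be defined separately:

```hcl
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16" # placeholder range
  enable_dns_support   = true
  enable_dns_hostnames = true
}

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id
}

# Static Elastic IP for the NAT Gateway (AWS provider v5 syntax).
resource "aws_eip" "nat" {
  domain = "vpc"
}

# The NAT Gateway lives in a public subnet.
resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public_zone1.id

  depends_on = [aws_internet_gateway.igw]
}

# Private subnets reach the internet through the NAT Gateway.
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat.id
  }
}

# Public subnets route straight to the Internet Gateway.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
}

# One association per subnet; only two of the four are shown.
resource "aws_route_table_association" "private_zone1" {
  subnet_id      = aws_subnet.private_zone1.id
  route_table_id = aws_route_table.private.id
}

resource "aws_route_table_association" "public_zone1" {
  subnet_id      = aws_subnet.public_zone1.id
  route_table_id = aws_route_table.public.id
}
```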

Subnet Tags EKS Needs

EKS uses specific tags on subnets to know where to place load balancers:

Private subnets: kubernetes.io/role/internal-elb = 1 (for internal load balancers)
Public subnets: kubernetes.io/role/elb = 1 (for public-facing load balancers)
Both: kubernetes.io/cluster/<cluster-name> = owned (or shared if multiple clusters use the same subnets)

EKS requires subnets in at least 2 AZs. It creates cross-account ENIs in these subnets to connect workers to the control plane.
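As a sketch, one private and one public subnet with these tags might look like the following. It assumes a VPC resource named aws_vpc.main and the locals env, eks_name, and zone1 described in Part 1; the CIDR blocks are placeholders. The cluster tag key has the form kubernetes.io/cluster/<cluster-name>:

```hcl
resource "aws_subnet" "private_zone1" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.0.0/19" # placeholder
  availability_zone = local.zone1

  tags = {
    "Name"                                                 = "${local.env}-private-${local.zone1}"
    "kubernetes.io/role/internal-elb"                      = "1"
    "kubernetes.io/cluster/${local.env}-${local.eks_name}" = "owned"
  }
}

resource "aws_subnet" "public_zone1" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.64.0/19" # placeholder
  availability_zone       = local.zone1
  map_public_ip_on_launch = true

  tags = {
    "Name"                                                 = "${local.env}-public-${local.zone1}"
    "kubernetes.io/role/elb"                               = "1"
    "kubernetes.io/cluster/${local.env}-${local.eks_name}" = "owned"
  }
}
```

The second AZ gets the same pair of resources with a different zone and CIDR block.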

NAT Gateway Considerations

One NAT Gateway is usually fine. Running one NAT Gateway per AZ is something people do, but in practice it's rarely worth the cost unless you need to survive an AZ outage or have massive cross-AZ data transfer concerns.

Note: In production, use remote state in S3, not local. Use IAM roles with short-lived tokens instead of access keys.

Part 2: EKS Cluster and Node Groups

Kubernetes Control Plane Components (Good to Remember)

Before diving into the Terraform config, a quick refresher on what the control plane is actually running:

etcd stores all cluster state. Back it up on self-managed clusters; on EKS, AWS runs and manages it for you.
scheduler assigns pods to nodes based on CPU/memory requests.
controller manager runs the reconciliation loop, keeping current state equal to desired state.
cloud controller manager handles cloud-specific logic (nodes, routes, load balancers). The in-tree cloud provider code is the legacy part, now mostly deprecated in favor of external controllers.
API server is stateless, handles auth/authz, and is the entry point for kubectl.

On the worker side, each node runs kubelet (runs containers per Pod spec) and kube-proxy (network proxy for services).

IAM for EKS

The EKS control plane needs an IAM role with AmazonEKSClusterPolicy attached. The trust policy should be set to eks.amazonaws.com.

Worker nodes need a separate IAM role (trust: ec2.amazonaws.com) with three policies:

AmazonEKSWorkerNodePolicy
AmazonEKS_CNI_Policy, which manages secondary IPs for pods (native AWS IPs, no overlay network like Flannel needed)
AmazonEC2ContainerRegistryReadOnly to pull images from ECR
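A sketch of both roles. The resource names and role names are hypothetical, and it assumes the env and eks_name locals from Part 1:

```hcl
# Role assumed by the EKS control plane.
resource "aws_iam_role" "eks" {
  name = "${local.env}-${local.eks_name}-eks-cluster"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "eks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "eks" {
  role       = aws_iam_role.eks.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}

# Role assumed by the worker node EC2 instances.
resource "aws_iam_role" "nodes" {
  name = "${local.env}-${local.eks_name}-eks-nodes"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# Attach the three managed policies listed above.
resource "aws_iam_role_policy_attachment" "nodes" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
  ])

  role       = aws_iam_role.nodes.name
  policy_arn = each.value
}
```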

EKS Cluster Terraform Notes

A few things to keep in mind when writing the cluster resource:

Use access_config with authentication_mode = "API". This is much easier than managing the old aws-auth ConfigMap.
Explicitly grant the Terraform user cluster admin permissions with bootstrap_cluster_creator_admin_permissions = true. You'll need this if you're deploying Helm charts via Terraform.
Nodes go in private subnets only, no public IPs needed.
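Put together, the cluster resource could look something like this. It assumes the hypothetical names aws_iam_role.eks, aws_iam_role_policy_attachment.eks, and two private subnets defined elsewhere, plus the locals from Part 1:

```hcl
resource "aws_eks_cluster" "eks" {
  name     = "${local.env}-${local.eks_name}"
  version  = local.eks_version
  role_arn = aws_iam_role.eks.arn

  vpc_config {
    # Private subnets only; nodes get no public IPs.
    subnet_ids = [
      aws_subnet.private_zone1.id,
      aws_subnet.private_zone2.id,
    ]
  }

  access_config {
    # Use EKS access entries instead of the aws-auth ConfigMap.
    authentication_mode = "API"
    # Gives the identity running Terraform cluster admin rights.
    bootstrap_cluster_creator_admin_permissions = true
  }

  # The role needs its policy before the cluster can be created.
  depends_on = [aws_iam_role_policy_attachment.eks]
}
```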

Node Group Config

I use EKS-managed node groups (not self-managed or Fargate) because they're the easiest to upgrade.

For capacity_type, choose between ON_DEMAND and SPOT. Spot is cheaper but instances can be reclaimed anytime, so it's best suited for batch or streaming jobs that support savepoints.

Define min_size, max_size, and desired_size. After initial creation, the Cluster Autoscaler will manage desired_size, so add it to ignore_changes in Terraform to avoid conflicts:

lifecycle {
  ignore_changes = [scaling_config[0].desired_size]
}

Use custom labels (role = general) instead of relying on built-in node group labels. This makes it easier when you need to migrate apps between node groups.
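Here is how those pieces might fit into a single node group resource. The instance type and scaling numbers are placeholders, and the referenced cluster, role, and subnet names are assumed to exist elsewhere in the configuration:

```hcl
resource "aws_eks_node_group" "general" {
  cluster_name    = aws_eks_cluster.eks.name
  version         = local.eks_version
  node_group_name = "general"
  node_role_arn   = aws_iam_role.nodes.arn

  subnet_ids = [
    aws_subnet.private_zone1.id,
    aws_subnet.private_zone2.id,
  ]

  capacity_type  = "ON_DEMAND"  # or "SPOT"
  instance_types = ["t3.large"] # placeholder

  scaling_config {
    desired_size = 1
    max_size     = 10
    min_size     = 0
  }

  update_config {
    max_unavailable = 1
  }

  # Custom label used by nodeSelector / affinity rules.
  labels = {
    role = "general"
  }

  depends_on = [aws_iam_role_policy_attachment.nodes]

  # The Cluster Autoscaler owns desired_size after creation.
  lifecycle {
    ignore_changes = [scaling_config[0].desired_size]
  }
}
```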

Note on spreading across AZs: Not always worth it. Cross-AZ data transfer costs can be brutal at scale (e.g., Kafka consumers in different AZs from brokers). Many companies stick to single-AZ clusters for the savings.

Configuring kubectl

After the cluster is created, configure kubectl to interact with it:

aws eks --region <region> update-kubeconfig --name <cluster-name>

Verify connectivity:

kubectl cluster-info
kubectl get nodes

You should see all your worker nodes in Ready status.

Part 3: IAM Users, Roles and RBAC

The flow for setting up access is:

Create Kubernetes RBAC roles or cluster roles
Create ClusterRoleBindings to custom RBAC groups
Map AWS IAM users or roles to those RBAC groups via the EKS API (not the deprecated aws-auth ConfigMap)

Viewer (Read-Only) Example

First, create a ClusterRole called viewer with get/list/watch permissions on common resources:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: viewer
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps", "secrets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["get", "list", "watch"]

Then bind it to a custom group:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: viewer-binding
subjects:
  - kind: Group
    name: my-viewer
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: viewer
  apiGroup: rbac.authorization.k8s.io

On the AWS side, create an IAM user with a minimal EKS policy (eks:DescribeCluster is enough to update kubeconfig and connect), then create an EKS access entry binding that IAM user to the my-viewer group.
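A sketch of the AWS side for the viewer. The user name and policy name are hypothetical, and aws_eks_cluster.eks is assumed to be the cluster resource:

```hcl
resource "aws_iam_user" "developer" {
  name = "developer" # hypothetical user
}

# Just enough to run "aws eks update-kubeconfig".
resource "aws_iam_policy" "developer_eks" {
  name = "AmazonEKSDeveloperPolicy" # hypothetical name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["eks:DescribeCluster", "eks:ListClusters"]
      Resource = "*"
    }]
  })
}

resource "aws_iam_user_policy_attachment" "developer_eks" {
  user       = aws_iam_user.developer.name
  policy_arn = aws_iam_policy.developer_eks.arn
}

# Maps the IAM user to the Kubernetes group bound to the viewer ClusterRole.
resource "aws_eks_access_entry" "developer" {
  cluster_name      = aws_eks_cluster.eks.name
  principal_arn     = aws_iam_user.developer.arn
  kubernetes_groups = ["my-viewer"]
}
```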

Admin Example (Preferred Pattern: Use Roles, Not Users)

For admin access, bind the built-in cluster-admin role to a custom group. You can't use system: groups in EKS API access entries, so create your own:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-binding
subjects:
  - kind: Group
    name: my-admin
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io

Then set up an IAM role eks-admin with a trust policy allowing specific users or accounts to assume it. The manager IAM user gets a policy allowing sts:AssumeRole on the eks-admin role. Finally, create an EKS access entry binding the IAM role to the my-admin group.

For the local AWS CLI, configure a profile using role_arn and source_profile so it auto-assumes the role.
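The role-based admin setup might be sketched as follows. Trusting the whole account in the role's trust policy is my simplification for the example; narrow it to specific principals if you prefer. The manager user and policy names are hypothetical:

```hcl
data "aws_caller_identity" "current" {}

resource "aws_iam_role" "eks_admin" {
  name = "${local.env}-${local.eks_name}-eks-admin"

  # Any principal in this account may assume the role,
  # provided its own IAM policy also allows sts:AssumeRole.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root" }
    }]
  })
}

# The role itself needs to describe the cluster to build a kubeconfig.
resource "aws_iam_role_policy" "eks_admin" {
  name = "eks-describe"
  role = aws_iam_role.eks_admin.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["eks:DescribeCluster", "eks:ListClusters"]
      Resource = "*"
    }]
  })
}

resource "aws_iam_user" "manager" {
  name = "manager"
}

# Lets the manager user assume the eks-admin role.
resource "aws_iam_user_policy" "manager_assume_eks_admin" {
  name = "AssumeEksAdmin"
  user = aws_iam_user.manager.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "sts:AssumeRole"
      Resource = aws_iam_role.eks_admin.arn
    }]
  })
}

# Maps the IAM role to the group bound to cluster-admin.
resource "aws_eks_access_entry" "eks_admin" {
  cluster_name      = aws_eks_cluster.eks.name
  principal_arn     = aws_iam_role.eks_admin.arn
  kubernetes_groups = ["my-admin"]
}
```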

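Note: Delete IAM access keys manually from the console before running terraform destroy. By default, Terraform can't delete users that have access keys created outside Terraform (setting force_destroy = true on the aws_iam_user resource is the alternative).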

Part 4: HPA and Metrics Server

HPA (Horizontal Pod Autoscaler) scales pods based on CPU or memory usage. Two things are required for it to work:

resources.requests must be defined on the deployment. HPA uses requests, not limits, to calculate utilization percentage.
Metrics Server must be installed in the cluster.

Important HPA Gotcha with GitOps

Don't put replicas in the deployment manifest when using HPA with GitOps tools like ArgoCD or FluxCD. What happens is a race condition: GitOps keeps setting replicas back to whatever is in the manifest, while HPA keeps trying to scale up. Remove the replicas field entirely and let HPA manage it.

Installing Metrics Server

Deploy via Helm. It's basically set-and-forget:

helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm install metrics-server metrics-server/metrics-server -n kube-system

Verify it's working:

kubectl top pods
kubectl top nodes
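Since everything else here goes through Terraform, the same install can be expressed as a helm_release. This sketch assumes the Helm provider is already configured against the cluster and that a node group resource named aws_eks_node_group.general exists; I've left the chart version unpinned, though pinning one is a good idea:

```hcl
resource "helm_release" "metrics_server" {
  name       = "metrics-server"
  repository = "https://kubernetes-sigs.github.io/metrics-server/"
  chart      = "metrics-server"
  namespace  = "kube-system"

  # Wait for nodes so the pod has somewhere to be scheduled.
  depends_on = [aws_eks_node_group.general]
}
```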

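HPA targets the deployment by name, not by labels. Scale-down happens after approximately 5 minutes of low load by default. That window comes from the kube-controller-manager flag --horizontal-pod-autoscaler-downscale-stabilization, which you can't change on EKS because AWS manages the control plane; set behavior.scaleDown.stabilizationWindowSeconds on the HPA itself instead.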
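For illustration, an HPA defined through the Terraform Kubernetes provider could look like this. The deployment name myapp, the namespace, the replica bounds, and the 80% CPU target are all made up for the example, and it assumes the Kubernetes provider is configured against the cluster:

```hcl
resource "kubernetes_horizontal_pod_autoscaler_v2" "myapp" {
  metadata {
    name      = "myapp"
    namespace = "default"
  }

  spec {
    min_replicas = 1
    max_replicas = 5

    # Targets the deployment by name.
    scale_target_ref {
      api_version = "apps/v1"
      kind        = "Deployment"
      name        = "myapp"
    }

    # Utilization is measured against resources.requests.
    metric {
      type = "Resource"
      resource {
        name = "cpu"
        target {
          type                = "Utilization"
          average_utilization = 80
        }
      }
    }

    # Per-HPA scale-down window (300s matches the default).
    behavior {
      scale_down {
        stabilization_window_seconds = 300
        select_policy                = "Max"
        policy {
          type           = "Percent"
          value          = 100
          period_seconds = 15
        }
      }
    }
  }
}
```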

Part 5: EKS Pod Identities and Cluster Autoscaler

Pod Identities vs OIDC

The old way to give pods AWS permissions was through OpenID Connect: create an OIDC provider, create an IAM role, then annotate the service account with the role ARN. It works, but the annotation is easy to forget and the trust policy is tightly coupled to namespace and service account names.

The new way is Pod Identities:

Install the eks-pod-identity-agent addon as a DaemonSet
Create an IAM role with trust to pods.eks.amazonaws.com (same trust policy works for all apps, no need to specify namespace or service account)
Use the aws_eks_pod_identity_association resource to bind the role to a specific namespace and service account

No annotation on the service account needed anymore. It's cleaner and less error-prone.
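The three steps might look like this in Terraform. The application name myapp, its namespace, and the role name are hypothetical; aws_eks_cluster.eks is assumed to be the cluster resource:

```hcl
# Step 1: the agent addon.
resource "aws_eks_addon" "pod_identity" {
  cluster_name = aws_eks_cluster.eks.name
  addon_name   = "eks-pod-identity-agent"
}

# Step 2: a role trusted by the Pod Identity service.
# The trust policy is identical for every application.
resource "aws_iam_role" "myapp" {
  name = "${aws_eks_cluster.eks.name}-myapp"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = ["sts:AssumeRole", "sts:TagSession"]
      Principal = { Service = "pods.eks.amazonaws.com" }
    }]
  })
}

# Step 3: bind the role to one namespace + service account.
resource "aws_eks_pod_identity_association" "myapp" {
  cluster_name    = aws_eks_cluster.eks.name
  namespace       = "default"
  service_account = "myapp"
  role_arn        = aws_iam_role.myapp.arn
}
```

Permissions policies are then attached to the role as usual; the namespace and service account scoping lives in the association, not in the trust policy.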

Cluster Autoscaler

The Cluster Autoscaler adjusts the desired_size of the EC2 Auto Scaling Group based on pending pods. When pods can't be scheduled because there aren't enough resources, the autoscaler adds nodes. When nodes are underutilized, it removes them.

It needs IAM permissions to modify ASGs. Use pod identities for this.

If pods stay pending and autoscaling doesn't trigger, check the autoscaler logs first:

kubectl logs -l app.kubernetes.io/name=cluster-autoscaler -n kube-system

Also make sure desired_size is in ignore_changes in your Terraform config. Otherwise Terraform and the autoscaler fight each other on every terraform apply.
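A sketch of the autoscaler wired up with pod identities. The action list follows the permissions the Cluster Autoscaler documents for AWS as far as I know, so compare it against the current upstream docs. The release name, service account name, and the local.region value are assumptions, and the Helm provider is assumed to be configured:

```hcl
resource "aws_iam_role" "cluster_autoscaler" {
  name = "${aws_eks_cluster.eks.name}-cluster-autoscaler"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = ["sts:AssumeRole", "sts:TagSession"]
      Principal = { Service = "pods.eks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "cluster_autoscaler" {
  name = "cluster-autoscaler"
  role = aws_iam_role.cluster_autoscaler.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Read-only discovery of ASGs and instance types.
        Effect = "Allow"
        Action = [
          "autoscaling:DescribeAutoScalingGroups",
          "autoscaling:DescribeAutoScalingInstances",
          "autoscaling:DescribeLaunchConfigurations",
          "autoscaling:DescribeScalingActivities",
          "autoscaling:DescribeTags",
          "ec2:DescribeImages",
          "ec2:DescribeInstanceTypes",
          "ec2:DescribeLaunchTemplateVersions",
          "ec2:GetInstanceTypesFromInstanceRequirements",
          "eks:DescribeNodegroup",
        ]
        Resource = "*"
      },
      {
        # The actions that actually add and remove nodes.
        Effect = "Allow"
        Action = [
          "autoscaling:SetDesiredCapacity",
          "autoscaling:TerminateInstanceInAutoScalingGroup",
        ]
        Resource = "*"
      },
    ]
  })
}

resource "aws_eks_pod_identity_association" "cluster_autoscaler" {
  cluster_name    = aws_eks_cluster.eks.name
  namespace       = "kube-system"
  service_account = "cluster-autoscaler"
  role_arn        = aws_iam_role.cluster_autoscaler.arn
}

resource "helm_release" "cluster_autoscaler" {
  name       = "autoscaler"
  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"
  namespace  = "kube-system"

  values = [yamlencode({
    autoDiscovery = { clusterName = aws_eks_cluster.eks.name }
    awsRegion     = local.region
    rbac          = { serviceAccount = { name = "cluster-autoscaler" } }
  })]
}
```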

Part 6: AWS Load Balancer Controller

Why Install It

The legacy in-tree cloud controller creates classic load balancers using NodePorts. All workers get added to the target group, which means a 500 node limit and an extra network hop to reach the pod.

The AWS Load Balancer Controller creates NLB (layer 4) or ALB (layer 7) with IP mode, where pod IPs are added directly to the target group. No NodePort, no extra hop, no node limit.

Key Annotations for Services

When creating a Kubernetes Service of type LoadBalancer, use these annotations to tell AWS LBC what to create:

metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"

aws-load-balancer-type: "external" tells Kubernetes to use AWS LBC instead of the in-tree controller
nlb-target-type: "ip" enables IP mode (not instance mode)
scheme controls whether the LB is internet-facing or internal

IAM permissions via pod identities, same pattern as the autoscaler.
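Installing the controller itself could be sketched like this. The policy file path is a hypothetical local copy of the IAM policy JSON published in the controller's repository, and the resource names, cluster reference, and VPC reference are assumptions:

```hcl
resource "aws_iam_role" "aws_lbc" {
  name = "${aws_eks_cluster.eks.name}-aws-lbc"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = ["sts:AssumeRole", "sts:TagSession"]
      Principal = { Service = "pods.eks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_policy" "aws_lbc" {
  name = "AWSLoadBalancerController"
  # Hypothetical path: save the upstream policy JSON here first.
  policy = file("${path.module}/iam/AWSLoadBalancerController.json")
}

resource "aws_iam_role_policy_attachment" "aws_lbc" {
  role       = aws_iam_role.aws_lbc.name
  policy_arn = aws_iam_policy.aws_lbc.arn
}

resource "aws_eks_pod_identity_association" "aws_lbc" {
  cluster_name    = aws_eks_cluster.eks.name
  namespace       = "kube-system"
  service_account = "aws-load-balancer-controller"
  role_arn        = aws_iam_role.aws_lbc.arn
}

resource "helm_release" "aws_lbc" {
  name       = "aws-load-balancer-controller"
  repository = "https://aws.github.io/eks-charts"
  chart      = "aws-load-balancer-controller"
  namespace  = "kube-system"

  values = [yamlencode({
    clusterName    = aws_eks_cluster.eks.name
    vpcId          = aws_vpc.main.id
    serviceAccount = { name = "aws-load-balancer-controller" }
  })]

  depends_on = [aws_eks_pod_identity_association.aws_lbc]
}
```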

ALB Ingress (via AWS LBC)

The controller can also create an ALB per Ingress resource:

TLS gets terminated at the ALB using a certificate from ACM, no cert stored in Kubernetes needed
Set ingressClassName: alb
Use the alb.ingress.kubernetes.io/ssl-redirect annotation for the HTTPS redirect (older v1-style setups used an actions.ssl-redirect annotation plus an extra rule instead)
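An ALB-backed Ingress written with the Terraform Kubernetes provider might look like this. The app name, host, port, and the acm_certificate_arn variable are placeholders, and the redirect uses the alb.ingress.kubernetes.io/ssl-redirect annotation, which as far as I know requires controller v2.2 or later:

```hcl
variable "acm_certificate_arn" {
  description = "ARN of an existing ACM certificate (placeholder input)"
  type        = string
}

resource "kubernetes_ingress_v1" "myapp" {
  metadata {
    name      = "myapp"
    namespace = "default"

    annotations = {
      "alb.ingress.kubernetes.io/scheme"          = "internet-facing"
      "alb.ingress.kubernetes.io/target-type"     = "ip"
      "alb.ingress.kubernetes.io/certificate-arn" = var.acm_certificate_arn
      # Renders as [{"HTTP":80},{"HTTPS":443}]
      "alb.ingress.kubernetes.io/listen-ports" = jsonencode([{ HTTP = 80 }, { HTTPS = 443 }])
      # Redirect port 80 traffic to 443.
      "alb.ingress.kubernetes.io/ssl-redirect" = "443"
    }
  }

  spec {
    ingress_class_name = "alb"

    rule {
      host = "myapp.example.com"

      http {
        path {
          path      = "/"
          path_type = "Prefix"

          backend {
            service {
              name = "myapp"
              port {
                number = 8080
              }
            }
          }
        }
      }
    }
  }
}
```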

To test without DNS:

curl -H "Host: myapp.example.com" http://<alb-dns>/

Note: One ALB per app gets expensive at scale. The recommended pattern is to use AWS LBC to create an NLB for NGINX Ingress, then use NGINX for routing. That way you share a single load balancer across all your apps.

Part 7: NGINX Ingress Controller

The architecture is: NLB (created by AWS LBC in IP mode) → NGINX pods → app pods.

NGINX handles:

Layer 7 routing (host, path, HTTP verbs)
TLS termination (cert + key stored in Kubernetes secret)
Prometheus metrics from a single place for all apps
Custom TCP/UDP service routing (share one LB for multiple services)

Two Ingress Controllers Pattern

A common setup is to run two separate NGINX installations:

external-nginx with an internet-facing NLB for public apps
internal-nginx with an internal NLB for dashboards like Grafana and Prometheus, paired with Private Route53 and a Client VPN

This keeps your internal tools off the public internet without extra complexity.
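A sketch of the external installation. The release name, namespace, and controllerValue are my own choices for the example; the internal one would be a second release with its own class name, its own controllerValue, and the scheme annotation set to internal:

```hcl
resource "helm_release" "external_nginx" {
  name             = "external"
  repository       = "https://kubernetes.github.io/ingress-nginx"
  chart            = "ingress-nginx"
  namespace        = "ingress"
  create_namespace = true

  values = [yamlencode({
    controller = {
      # Distinct class and controller value so two installations
      # don't claim each other's Ingress resources.
      ingressClassResource = {
        name            = "external-nginx"
        controllerValue = "k8s.io/external-ingress-nginx"
      }
      service = {
        annotations = {
          "service.beta.kubernetes.io/aws-load-balancer-type"            = "external"
          "service.beta.kubernetes.io/aws-load-balancer-nlb-target-type" = "ip"
          "service.beta.kubernetes.io/aws-load-balancer-scheme"          = "internet-facing"
        }
      }
    }
  })]
}
```

Ingress resources then pick an installation with ingressClassName: external-nginx or the internal equivalent.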

cert-manager for TLS

cert-manager automates Let's Encrypt certificates. Certs are valid for 90 days, and by default cert-manager renews them around day 60, which leaves about 30 days of margin before expiry.

There are two challenge types:

HTTP-01 is easier to configure, but you need live DNS pointing to the ingress first.
DNS-01 is preferred in production because it works before DNS cutover. It needs IAM permissions for Route53 to create TXT records.

When debugging certificate issues, follow the chain: Certificate → CertificateRequest → Order → Challenge. Run kubectl describe on each in sequence to find where the error is.

Create a ClusterIssuer with your email. It becomes the contact address on the ACME account, so any notices about your certificates come from the CA rather than from cert-manager itself.
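An HTTP-01 ClusterIssuer could be declared through Terraform roughly like this. The issuer name, email, secret name, and ingress class are placeholders, and it assumes cert-manager and its CRDs are already installed in the cluster:

```hcl
resource "kubernetes_manifest" "letsencrypt_http01" {
  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "ClusterIssuer"
    metadata   = { name = "letsencrypt-http01" }

    spec = {
      acme = {
        # Production endpoint. While testing, the staging endpoint
        # https://acme-staging-v02.api.letsencrypt.org/directory
        # avoids hitting rate limits.
        server = "https://acme-v02.api.letsencrypt.org/directory"
        email  = "you@example.com" # placeholder

        # Secret where the ACME account key is stored.
        privateKeySecretRef = { name = "letsencrypt-http01-account-key" }

        solvers = [{
          http01 = {
            ingress = { ingressClassName = "external-nginx" }
          }
        }]
      }
    }
  }
}
```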

Part 8: EBS CSI Driver

You need the EBS CSI Driver to run StatefulSets that use EBS volumes. Without it, pods stay pending with a "driver missing" error.

Key things to know:

Access mode is ReadWriteOnce, meaning one pod per volume (technically one node).
The default storage class is gp2. I recommend creating a custom gp3 class with allowVolumeExpansion: true:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

For volume expansion, edit the PVC manually and the CSI driver handles it. You can't change a StatefulSet's volume claim template (it's immutable), but in emergencies you can edit each PVC directly.

Deploy as a managed EKS addon:

aws eks create-addon --cluster-name <cluster-name> --addon-name aws-ebs-csi-driver \
  --service-account-role-arn <role-arn>

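For IAM, attach the AmazonEBSCSIDriverPolicy to the driver's role, plus an optional KMS policy if you encrypt your volumes. Note that --service-account-role-arn in the command above is the OIDC way of passing the role; with pod identities, create a pod identity association for the driver's service account in kube-system instead.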
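The Terraform equivalent with pod identities might look like this. It assumes the eks-pod-identity-agent addon is installed and that the driver's controller runs as the ebs-csi-controller-sa service account, which is the addon's default as far as I know:

```hcl
resource "aws_iam_role" "ebs_csi" {
  name = "${aws_eks_cluster.eks.name}-ebs-csi-driver"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = ["sts:AssumeRole", "sts:TagSession"]
      Principal = { Service = "pods.eks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "ebs_csi" {
  role       = aws_iam_role.ebs_csi.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
}

resource "aws_eks_pod_identity_association" "ebs_csi" {
  cluster_name    = aws_eks_cluster.eks.name
  namespace       = "kube-system"
  service_account = "ebs-csi-controller-sa"
  role_arn        = aws_iam_role.ebs_csi.arn
}

resource "aws_eks_addon" "ebs_csi" {
  cluster_name = aws_eks_cluster.eks.name
  addon_name   = "aws-ebs-csi-driver"

  # Create the association first so the controller starts with credentials.
  depends_on = [aws_eks_pod_identity_association.ebs_csi]
}
```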

Part 9: EFS CSI Driver

Use EFS when you need ReadWriteMany, which means multiple pods reading and writing the same volume at once. This is common for shared config files, CMS uploads, or ML model directories.

EFS is elastic, so there's no capacity planning needed. It auto-scales. But it's significantly more expensive than EBS, so only use it when you actually need shared access.

Setup Steps

Create an EFS file system
Create mount targets in each private subnet (one per AZ)
Allow traffic from the EKS security group to EFS on NFS port 2049
Create a storage class using the EFS provisioner
Create PVCs as needed. PVCs don't actually respect the size you put (EFS ignores it), but Kubernetes requires a value, so just put something like 5Gi
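The first four steps could be sketched as below. Reusing the cluster security group on the mount targets is a shortcut I'm assuming here: that group allows traffic between its members, which covers NFS from managed nodes without a separate rule. Resource names are hypothetical, and the Kubernetes provider is assumed to be configured:

```hcl
resource "aws_efs_file_system" "eks" {
  creation_token   = "eks"
  performance_mode = "generalPurpose"
  throughput_mode  = "bursting"
  encrypted        = true
}

# One mount target per private subnet / AZ.
resource "aws_efs_mount_target" "zone1" {
  file_system_id  = aws_efs_file_system.eks.id
  subnet_id       = aws_subnet.private_zone1.id
  security_groups = [aws_eks_cluster.eks.vpc_config[0].cluster_security_group_id]
}

resource "aws_efs_mount_target" "zone2" {
  file_system_id  = aws_efs_file_system.eks.id
  subnet_id       = aws_subnet.private_zone2.id
  security_groups = [aws_eks_cluster.eks.vpc_config[0].cluster_security_group_id]
}

# Dynamic provisioning via EFS access points.
resource "kubernetes_storage_class_v1" "efs" {
  metadata {
    name = "efs"
  }

  storage_provisioner = "efs.csi.aws.com"

  parameters = {
    provisioningMode = "efs-ap"
    fileSystemId     = aws_efs_file_system.eks.id
    directoryPerms   = "700"
  }

  depends_on = [
    aws_efs_mount_target.zone1,
    aws_efs_mount_target.zone2,
  ]
}
```

PVCs then reference the efs storage class with accessModes: ReadWriteMany.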

Note: When I set this up, the EFS CSI driver didn't support Pod Identities, so I used the OIDC provider with an annotated service account. Check the driver's current documentation, since newer releases may have changed this.

Check driver logs on first deploy. EFS config errors tend to be subtle: wrong security group, wrong subnet, or mount target not ready yet.

Part 10: Secrets Manager Integration

This setup mounts AWS Secrets Manager secrets into pods as files or environment variables.

Components

Two Helm charts are needed:

secrets-store-csi-driver (generic, not AWS-specific). Install with syncSecret.enabled: true if you need env var support.
secrets-store-csi-driver-provider-aws (the AWS-specific provider).

How It Works

Store a secret in Secrets Manager as key-value pairs (JSON blob)
Create a SecretProviderClass with JMESPath expressions to parse individual keys from the JSON
Add a secretObjects section to the SecretProviderClass so the driver syncs those parsed keys into a Kubernetes Secret (needed if you want env var mounting)
Annotate the service account with the IAM role ARN
Mount as a volume in the deployment (required, even if you only care about env vars)
Reference the Kubernetes Secret as envFrom or individual env entries
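To make the SecretProviderClass step concrete, here is a sketch via kubernetes_manifest. The secret name myapp-secret, its username and password keys, and the MY_ prefixed key names are all invented for the example; it assumes both Helm charts are installed so the CRD exists:

```hcl
resource "kubernetes_manifest" "myapp_secrets" {
  manifest = {
    apiVersion = "secrets-store.csi.x-k8s.io/v1"
    kind       = "SecretProviderClass"
    metadata = {
      name      = "myapp-secrets"
      namespace = "default"
    }

    spec = {
      provider = "aws"

      parameters = {
        # The provider expects "objects" as a YAML string.
        objects = yamlencode([{
          objectName = "myapp-secret" # name in Secrets Manager
          objectType = "secretsmanager"
          # Pull individual keys out of the JSON blob.
          jmesPath = [
            { path = "username", objectAlias = "username" },
            { path = "password", objectAlias = "password" },
          ]
        }])
      }

      # Sync the parsed keys into a Kubernetes Secret for env vars.
      secretObjects = [{
        secretName = "myapp-k8s-secret"
        type       = "Opaque"
        data = [
          { objectName = "username", key = "MY_USERNAME" },
          { objectName = "password", key = "MY_PASSWORD" },
        ]
      }]
    }
  }
}
```

The synced Secret only appears once a pod actually mounts the CSI volume, which is why the volume mount is required even for env-var-only use.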

The IAM role should be per-application, not per-cluster. Grant access only to the specific secret ARN:

{
  "Effect": "Allow",
  "Action": [
    "secretsmanager:GetSecretValue",
    "secretsmanager:DescribeSecret"
  ],
  "Resource": "arn:aws:secretsmanager:<region>:<account>:secret:<secret-name>-*"
}

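Don't use a bare wildcard ("*") for the resource in production. The trailing -* in the ARN above is fine: it only matches the random suffix Secrets Manager appends to every secret ARN.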

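In this setup the component uses OIDC rather than pod identities, so the trust policy specifies the namespace and service account. Check the AWS provider's current documentation, since pod identity support may be available in newer releases.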

To verify everything is mounted correctly:

kubectl exec -it <pod> -- cat /mnt/secrets/username
kubectl exec -it <pod> -- env | grep MY_

General Things to Remember

Always check addon versions with aws eks describe-addon-versions, and chart versions with helm search repo <chart> --versions
Use remote Terraform state in S3, not local
Use IAM roles with short-lived tokens wherever possible, not access keys
Delete access keys manually before terraform destroy
Quick RBAC checks: kubectl auth can-i get pods and kubectl auth can-i '*' '*'
In large multi-team clusters, use namespace resource quotas to prevent one team from starving others
Don't expose internal dashboards (Grafana, Prometheus) with internet-facing load balancers

Conclusion

Here's a summary of what each part covers and the key tool or pattern used:

Part	Topic	Key Component
1	VPC and Networking	VPC, subnets, NAT Gateway, subnet tags
2	EKS Cluster and Node Groups	EKS managed node groups, IAM roles
3	IAM Users, Roles and RBAC	EKS access entries, ClusterRoleBindings
4	HPA and Metrics Server	Horizontal Pod Autoscaler, Metrics Server
5	Pod Identities and Cluster Autoscaler	Pod identity agent, ASG scaling
6	AWS Load Balancer Controller	NLB/ALB in IP mode
7	NGINX Ingress Controller	NGINX + cert-manager + Let's Encrypt
8	EBS CSI Driver	ReadWriteOnce, gp3 storage class
9	EFS CSI Driver	ReadWriteMany, elastic shared storage
10	Secrets Manager Integration	Secrets Store CSI Driver + AWS provider

The full Terraform code for each part is available in the companion repository. Each part builds on the previous one, so the state file grows incrementally as you add components.