Amazon EKS Deep Dive — Part 2 — Creating a cluster
Over the past few years, Kubernetes has revolutionized application development and become the standard for running containerized workloads. Amazon Web Services, the biggest cloud player, has facilitated this change by creating Amazon Elastic Kubernetes Service — a managed service that simplifies the lifecycle of Kubernetes clusters.
This series describes the essential parts required to create an EKS cluster and manage your container environment successfully.
Previous part:
Amazon EKS Deep Dive — Part 1 — Six components of an effective Cloud-Native platform
Container Orchestration — a cornerstone of a cloud-native platform
This article will guide you through the foundational layer of any cloud-native platform — the container orchestrator itself. We will describe different ways of deploying Amazon EKS and provide examples.
Before we begin, I will briefly explain why we need container orchestration, where Kubernetes fits in, and how Amazon Elastic Kubernetes Service helps.
Why do we need container orchestration?
Let’s do a mental experiment to understand the benefits of container orchestration:
Imagine we have a virtual machine with Docker installed on it.
The first question we should ask is how to handle container failures and make sure the application keeps running without manual intervention. The solution might be a monitoring tool plus a script that restarts failed containers.
Now, let’s say we want to extend our setup and add one more Docker host for high availability. At this point, we need to balance containers between the two nodes, so we update our deployment script.
The next day, a new version of the application is released, and we need to update our deployment with the new version of the container image. The workflow consists of:
- Starting a new container;
- Making sure it is healthy;
- Redirecting traffic to the new container;
- Shutting down the old container.
These are not the only challenges that go hand in hand with running a containerized application. For each new problem, we would have to extend our script.
The more edge cases we consider, the more complicated the script becomes. Luckily, Google’s SREs have already gone through this journey and created a tool that solves all these challenges for us.
It is called Kubernetes.
The benefit of Amazon EKS
A Kubernetes environment consists of a control plane and a worker node pool. In turn, the control plane consists of multiple components (etcd, kube-scheduler, kube-apiserver, kube-controller-manager, cloud-controller-manager), and it takes a lot of effort to manage this complexity.
Amazon EKS is a managed service that simplifies provisioning and operating Kubernetes clusters. With Amazon EKS, we get an on-demand Kubernetes control plane managed by AWS, and we are only responsible for the worker nodes.
There are several ways to provision Amazon EKS; let’s discuss some of them below.
Deploy EKS manually
Point-and-click in 2020 isn’t an option; face the fact and continue to the next options.
eksctl — The command-line interface for Amazon EKS
eksctl is a simple CLI tool for creating clusters on EKS. It is written in Golang and uses CloudFormation under the hood. It’s a good option to familiarize yourself with Amazon EKS or create a standalone environment for experimenting.
A basic cluster can be created with a single command:
eksctl create cluster
You can also customize the installation using a YAML config file:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: basic-cluster
  region: eu-north-1

nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 10
  - name: ng-2
    instanceType: m5.xlarge
    desiredCapacity: 2
Use the following command to create an Amazon EKS cluster from a YAML file:
eksctl create cluster -f cluster.yaml
More examples of how to use eksctl can be found on the official website.
Deploying an EKS cluster using Terraform
The only disadvantage of the previous method is that eksctl hides too many details and may lack flexibility in certain cases. If we want maximum control, we have to assemble the cluster on our own using, for example, Terraform.
Below is the list of steps required to build an Amazon EKS cluster using Terraform.
1. Configure the AWS provider and initialize the remote backend. Nothing extraordinary at this point, just boilerplate configuration that is common to any Terraform project.
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 3.0"
}
}
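# Note: the S3 bucket and the DynamoDB lock table referenced below must
# already exist before running `terraform init`.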
backend "s3" {
bucket = "vlad.terraform"
key = "tfstate/"
region = "us-east-1"
dynamodb_table = "terraform"
}
} provider "aws" {
region = "us-east-1"
}
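The snippets below reference var.cluster_name, which is not shown in the excerpts above. A minimal definition, typically kept in a variables.tf, could look like the sketch below (the default value is just an example, not taken from the original template):

variable "cluster_name" {
  description = "Name of the EKS cluster, reused for resource names and subnet tags."
  type        = string
  default     = "eks-deep-dive" # example value; pick your own
}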
2. Create a VPC. We will need at least two public subnets with an Internet gateway and two private subnets behind a NAT gateway. To simplify the process, we will use the existing VPC module from the Terraform Registry.
data "aws_availability_zones" "available" {}
module "vpc" {
source = "github.com/terraform-aws-modules/terraform-aws-vpc?ref=v2.62.0"
name = "eks-deep-dive"
cidr = "10.0.0.0/16"
azs = data.aws_availability_zones.available.names
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]
enable_nat_gateway = true
single_nat_gateway = true
enable_dns_hostnames = true
public_subnet_tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
}
private_subnet_tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
}
}
We are using a data source to fetch the list of availability zones.
More info about data sources:
https://www.terraform.io/docs/configuration/data-sources.html
Subnets are divided into six /24 CIDR blocks.
This layout is good for a demo because of its simplicity, but it is unlikely to fit a production use case.
To save on cost, we have created only one NAT gateway. In production scenarios, enable high availability by specifying the following:
single_nat_gateway = false
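For reference, a production-leaning variant of the same VPC module call could run one NAT gateway per availability zone. The one_nat_gateway_per_az input is documented by the module, but verify it against the module version you pin; this is only a sketch of the relevant arguments:

enable_nat_gateway     = true
single_nat_gateway     = false
one_nat_gateway_per_az = true # one NAT gateway in each AZ used by the VPC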
3. Create security groups.
resource "aws_security_group" "cluster" {
name_prefix = var.cluster_name
description = "EKS cluster security group."
vpc_id = module.vpc.vpc_id
tags = {
"Name" = "${var.cluster_name}-eks_cluster_sg"
}
}
resource "aws_security_group" "workers" {
name_prefix = var.cluster_name
description = "Security group for all nodes in the cluster."
vpc_id = module.vpc.vpc_id
tags = {
"Name" = "${var.cluster_name}-eks_worker_sg"
"kubernetes.io/cluster/${var.cluster_name}" = "owned"
}
}
resource "aws_security_group_rule" "cluster_egress_internet" {
description = "Allow cluster egress access to the Internet."
protocol = "-1"
security_group_id = aws_security_group.cluster.id
cidr_blocks = ["0.0.0.0/0"]
from_port = 0
to_port = 0
type = "egress"
}
resource "aws_security_group_rule" "cluster_https_worker_ingress" {
description = "Allow pods to communicate with the EKS cluster API."
protocol = "tcp"
security_group_id = aws_security_group.cluster.id
source_security_group_id = aws_security_group.workers.id
from_port = 443
to_port = 443
type = "ingress"
}
resource "aws_security_group_rule" "workers_egress_internet" {
description = "Allow nodes all egress to the Internet."
protocol = "-1"
security_group_id = aws_security_group.workers.id
cidr_blocks = ["0.0.0.0/0"]
from_port = 0
to_port = 0
type = "egress"
}
resource "aws_security_group_rule" "workers_ingress_self" {
description = "Allow node to communicate with each other."
protocol = "-1"
security_group_id = aws_security_group.workers.id
source_security_group_id = aws_security_group.workers.id
from_port = 0
to_port = 65535
type = "ingress"
}
resource "aws_security_group_rule" "workers_ingress_cluster" {
description = "Allow workers pods to receive communication from the cluster control plane."
protocol = "tcp"
security_group_id = aws_security_group.workers.id
source_security_group_id = aws_security_group.cluster.id
from_port = 1025
to_port = 65535
type = "ingress"
}
resource "aws_security_group_rule" "workers_ingress_cluster_kubelet" {
description = "Allow workers Kubelets to receive communication from the cluster control plane."
protocol = "tcp"
security_group_id = aws_security_group.workers.id
source_security_group_id = aws_security_group.cluster.id
from_port = 10250
to_port = 10250
type = "ingress"
}
resource "aws_security_group_rule" "workers_ingress_cluster_https" {
description = "Allow pods running extension API servers on port 443 to receive communication from cluster control plane."
protocol = "tcp"
security_group_id = aws_security_group.workers.id
source_security_group_id = aws_security_group.cluster.id
from_port = 443
to_port = 443
type = "ingress"
}
resource "aws_security_group_rule" "workers_ingress_cluster_primary" {
description = "Allow pods running on workers to receive communication from cluster primary security group (e.g. Fargate pods)."
protocol = "all"
security_group_id = aws_security_group.workers.id
source_security_group_id = aws_security_group.cluster.id
from_port = 0
to_port = 65535
type = "ingress"
}
resource "aws_security_group_rule" "cluster_primary_ingress_workers" {
description = "Allow pods running on workers to send communication to cluster primary security group (e.g. Fargate pods)."
protocol = "all"
security_group_id = aws_security_group.cluster.id
source_security_group_id = aws_security_group.workers.id
from_port = 0
to_port = 65535
type = "ingress"
}
4. Create IAM roles.
resource "aws_iam_role" "cluster" {
name_prefix = var.cluster_name
assume_role_policy = jsonencode({
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "eks.amazonaws.com"
}
}]
Version = "2012-10-17"
})
}
data "aws_partition" "current" {}
locals {
policy_arn_prefix = "arn:${data.aws_partition.current.partition}:iam::aws:policy"
}
resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSClusterPolicy" {
policy_arn = "${local.policy_arn_prefix}/AmazonEKSClusterPolicy"
role = aws_iam_role.cluster.name
}
resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSServicePolicy" {
policy_arn = "${local.policy_arn_prefix}/AmazonEKSServicePolicy"
role = aws_iam_role.cluster.name
}
resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSVPCResourceControllerPolicy" {
policy_arn = "${local.policy_arn_prefix}/AmazonEKSVPCResourceController"
role = aws_iam_role.cluster.name
}
resource "aws_iam_role" "workers" {
name = "${var.cluster_name}-workers"
assume_role_policy = jsonencode({
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}]
Version = "2012-10-17"
})
}
resource "aws_iam_role_policy_attachment" "workers-AmazonEKSWorkerNodePolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
role = aws_iam_role.workers.name
}
resource "aws_iam_role_policy_attachment" "workers-AmazonEKS_CNI_Policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
role = aws_iam_role.workers.name
}
resource "aws_iam_role_policy_attachment" "workers-AmazonEC2ContainerRegistryReadOnly" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
role = aws_iam_role.workers.name
}
5. Create a control plane and a worker node pool.
resource "aws_eks_cluster" "this" {
name = var.cluster_name
role_arn = aws_iam_role.cluster.arn
vpc_config {
security_group_ids = compact([aws_security_group.cluster.id])
subnet_ids = module.vpc.private_subnets
endpoint_public_access = true
endpoint_private_access = true
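# Restrict public access to the API endpoint to a known CIDR;
# replace the example IP below with your own address range.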
public_access_cidrs = ["73.126.49.102/32"]
}
depends_on = [
aws_security_group_rule.cluster_egress_internet,
aws_security_group_rule.cluster_https_worker_ingress,
aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy,
aws_iam_role_policy_attachment.cluster_AmazonEKSServicePolicy,
aws_iam_role_policy_attachment.cluster_AmazonEKSVPCResourceControllerPolicy,
]
}
resource "aws_eks_node_group" "workers" {
cluster_name = aws_eks_cluster.this.name
node_group_name = "${var.cluster_name}-workers"
node_role_arn = aws_iam_role.workers.arn
subnet_ids = module.vpc.private_subnets
scaling_config {
desired_size = 3
max_size = 3
min_size = 3
}
# Ensure that IAM Role permissions are created before and deleted after EKS Node Group handling.
# Otherwise, EKS will not be able to properly delete EC2 Instances and Elastic Network Interfaces.
depends_on = [
aws_iam_role_policy_attachment.workers-AmazonEKSWorkerNodePolicy,
aws_iam_role_policy_attachment.workers-AmazonEKS_CNI_Policy,
aws_iam_role_policy_attachment.workers-AmazonEC2ContainerRegistryReadOnly,
]
}
At this point, our cluster is ready, and we can connect to it.
To update your kubeconfig, use the following command:
aws eks update-kubeconfig --name ${your_cluster_name}
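Optionally, you can have Terraform surface the values needed for that command. A small sketch of an outputs.tf (the output names below are my own choice):

output "cluster_name" {
  description = "EKS cluster name to pass to `aws eks update-kubeconfig --name ...`."
  value       = aws_eks_cluster.this.name
}

output "cluster_endpoint" {
  description = "URL of the Kubernetes API server."
  value       = aws_eks_cluster.this.endpoint
}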
The full Terraform template is available on GitHub.
Simplifying deployment using the terraform-aws-eks module
We can avoid codifying roles, permissions, and security groups ourselves by using the terraform-aws-eks module, created and maintained by the open-source community.
Instead of almost 200 lines of code, we can shrink our deployment manifest down to 70 lines. The code below creates roles, security groups, a control plane, and a worker node group:
module "eks" {
source = "github.com/terraform-aws-modules/terraform-aws-eks?ref=v12.2.0"
cluster_name = var.cluster_name
subnets = module.vpc.private_subnets
vpc_id = module.vpc.vpc_id
worker_groups = [
{
name = "worker-group-1"
instance_type = "t2.medium"
asg_desired_capacity = 3
tags = [
{
"key" = "k8s.io/cluster-autoscaler/${var.cluster_name}",
"value" = "owned",
"propagate_at_launch" = "true"
}
]
}
]
}
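One caveat: this module version manages the aws-auth ConfigMap for you, so its examples also configure a kubernetes provider against the freshly created cluster. A minimal sketch of that wiring, assuming the module exposes a cluster_id output (check the documented outputs of the version you pin):

data "aws_eks_cluster" "this" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "this" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  # Authenticate against the cluster created by the module.
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.this.token
}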
You got this
In this article, we have covered different ways to create an EKS cluster. Use whichever option works best for your use case. The next article will cover the log management part of the observability stack in a cloud-native platform.
See a bigger picture in a previous post: Amazon EKS Deep Dive — Part 1 — Six components of an effective Cloud-Native platform.
If you want to share your opinion, connect with me on LinkedIn.