Amazon EKS Deep Dive — Part 2 — Creating a cluster
Over the past few years, Kubernetes has revolutionized application development and become the standard for running containerized workloads. Amazon Web Services, the biggest cloud player, has facilitated this change by creating Amazon Elastic Kubernetes Service — a managed service that simplifies the lifecycle of Kubernetes clusters.
This series describes the essential parts required to create an EKS cluster and manage your container environment successfully.
Previous part:
Amazon EKS Deep Dive — Part 1 — Six components of an effective Cloud-Native platform
Container Orchestration — a cornerstone of a cloud-native platform
This article will guide you through the foundational layer of any cloud-native platform — the container orchestrator itself. We will describe different ways of deploying Amazon EKS and provide examples.
Before we begin, I will briefly explain why we need container orchestration, where Kubernetes fits in, and how Amazon Elastic Kubernetes Service helps.
Why do we need container orchestration?
Let’s do a mental experiment to understand the benefits of container orchestration:
Imagine we have a virtual machine with Docker installed on it.
The first question we should ask is how to handle container failures and make sure the application keeps running without manual intervention. The solution might be a monitoring tool plus a script that restarts failed containers.
Now, let’s say we want to extend our setup and add one more Docker host for high availability. At this point, we need to balance containers between the two nodes, so we update our deployment script.
The next day, a new version of the application is released, and we need to update our deployment with the new version of the container image. The workflow consists of:
- Starting a new container;
- Making sure it is healthy;
- Redirecting traffic to the new container;
- Shutting down the old container.
These are not the only challenges that go hand in hand with running a containerized application. For each new problem, we would have to extend our script.
The more edge cases we consider, the more complicated the script becomes. Luckily, Google’s SREs have already gone through this journey and created a tool that solves all these challenges for us.
It is called Kubernetes.
The benefit of Amazon EKS
A Kubernetes environment consists of a control plane and a worker node pool. In turn, the control plane consists of multiple components (etcd, kube-scheduler, kube-apiserver, kube-controller-manager, cloud-controller-manager), and it takes a lot of effort to manage this complexity.
Amazon EKS is a managed service that simplifies provisioning and operating Kubernetes clusters. With Amazon EKS, we get an on-demand Kubernetes control plane managed by AWS, and we are only responsible for the worker nodes.
There are several ways to provision Amazon EKS; let’s discuss some of them below.
Deploy EKS manually
Point-and-click in 2020 isn’t an option; face the fact and continue to the next options.
eksctl — The command-line interface for Amazon EKS
eksctl is a simple CLI tool for creating clusters on EKS. It is written in Golang and uses CloudFormation under the hood. It’s a good option to familiarize yourself with Amazon EKS or create a standalone environment for experimenting.
A basic cluster can be created with a single command:
eksctl create cluster
You can also customize the installation using a YAML config file:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: basic-cluster
  region: eu-north-1

nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 10
  - name: ng-2
    instanceType: m5.xlarge
    desiredCapacity: 2
Use the following command to create an Amazon EKS cluster from a YAML file:
eksctl create cluster -f cluster.yaml
More examples of how to use eksctl can be found on the official website.
Deploying an EKS cluster using Terraform
The only disadvantage of the previous method is that eksctl hides too many details and may lack flexibility in certain cases. If we want maximum control, we have to assemble the cluster on our own using, for example, Terraform.
Below is the list of steps required to build an Amazon EKS cluster using Terraform.
1. Configure the AWS provider and initialize the remote backend. Nothing extraordinary at this point, just boilerplate configuration that is common to any Terraform project.
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 3.0"
}
}
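# Note: the S3 bucket and the DynamoDB lock table referenced below must
# already exist before running `terraform init`.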
backend "s3" {
bucket = "vlad.terraform"
key = "tfstate/"
region = "us-east-1"
dynamodb_table = "terraform"
}
} provider "aws" {
region = "us-east-1"
}
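The snippets below reference var.cluster_name, which is not shown in the excerpts above. A minimal definition, typically kept in a variables.tf, could look like the sketch below (the default value is just an example, not taken from the original template):

variable "cluster_name" {
  description = "Name of the EKS cluster, reused for resource names and subnet tags."
  type        = string
  default     = "eks-deep-dive" # example value; pick your own
}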
2. Create a VPC. We will need at least two public subnets with an Internet gateway and two private subnets behind a NAT gateway. To simplify the process, we will use the existing VPC module from the Terraform Registry.
data "aws_availability_zones" "available" {}
module "vpc" {
source = "github.com/terraform-aws-modules/terraform-aws-vpc?ref=v2.62.0"
name = "eks-deep-dive"
cidr = "10.0.0.0/16"
azs = data.aws_availability_zones.available.names
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]
enable_nat_gateway = true
single_nat_gateway = true
enable_dns_hostnames = true
public_subnet_tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
}
private_subnet_tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
}
}
We are using a data source to fetch the list of availability zones.
More info about data sources:
https://www.terraform.io/docs/configuration/data-sources.html
Subnets are divided into six /24 CIDR blocks.
This layout is good for a demo because of its simplicity, but it is unlikely to fit a production use case.
To save on cost, we have created only one NAT gateway. In production scenarios, enable high availability by specifying the following:
single_nat_gateway = false
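For reference, a production-leaning variant of the same VPC module call could run one NAT gateway per availability zone. The one_nat_gateway_per_az input is documented by the module, but verify it against the module version you pin; this is only a sketch of the relevant arguments:

enable_nat_gateway     = true
single_nat_gateway     = false
one_nat_gateway_per_az = true # one NAT gateway in each AZ used by the VPC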
3. Create security groups.
resource "aws_security_group" "cluster" {
name_prefix = var.cluster_name
description = "EKS cluster security group."
vpc_id = module.vpc.vpc_id
tags = {
"Name" = "${var.cluster_name}-eks_cluster_sg"
}
}
resource "aws_security_group" "workers" {
name_prefix = var.cluster_name
description = "Security group for all nodes in the cluster."
vpc_id = module.vpc.vpc_id
tags = {
"Name" = "${var.cluster_name}-eks_worker_sg"
"kubernetes.io/cluster/${var.cluster_name}" = "owned"
}
}
resource "aws_security_group_rule" "cluster_egress_internet" {
description = "Allow cluster egress access to the Internet."
protocol = "-1"
security_group_id = aws_security_group.cluster.id
cidr_blocks = ["0.0.0.0/0"]
from_port = 0
to_port = 0
type = "egress"
}
resource "aws_security_group_rule" "cluster_https_worker_ingress" {
description = "Allow pods to communicate with the EKS cluster API."
protocol = "tcp"
security_group_id = aws_security_group.cluster.id
source_security_group_id = aws_security_group.workers.id
from_port = 443
to_port = 443
type = "ingress"
}
resource "aws_security_group_rule" "workers_egress_internet" {
description = "Allow nodes all egress to the Internet."
protocol = "-1"
security_group_id = aws_security_group.workers.id
cidr_blocks = ["0.0.0.0/0"]
from_port = 0
to_port = 0
type = "egress"
}
resource "aws_security_group_rule" "workers_ingress_self" {
description = "Allow node to communicate with each other."
protocol = "-1"
security_group_id = aws_security_group.workers.id
source_security_group_id = aws_security_group.workers.id
from_port = 0
to_port = 65535
type = "ingress"
}
resource "aws_security_group_rule" "workers_ingress_cluster" {
description = "Allow workers pods to receive communication from the cluster control plane."
protocol = "tcp"
security_group_id = aws_security_group.workers.id
source_security_group_id = aws_security_group.cluster.id
from_port = 1025
to_port = 65535
type = "ingress"
}
resource "aws_security_group_rule" "workers_ingress_cluster_kubelet" {
description = "Allow workers Kubelets to receive communication from the cluster control plane."
protocol = "tcp"
security_group_id = aws_security_group.workers.id
source_security_group_id = aws_security_group.cluster.id
from_port = 10250
to_port = 10250
type = "ingress"
}
resource "aws_security_group_rule" "workers_ingress_cluster_https" {
description = "Allow pods running extension API servers on port 443 to receive communication from cluster control plane."
protocol = "tcp"
security_group_id = aws_security_group.workers.id
source_security_group_id = aws_security_group.cluster.id
from_port = 443
to_port = 443
type = "ingress"
}
resource "aws_security_group_rule" "workers_ingress_cluster_primary" {
description = "Allow pods running on workers to receive communication from cluster primary security group (e.g. Fargate pods)."
protocol = "all"
security_group_id = aws_security_group.workers.id
source_security_group_id = aws_security_group.cluster.id
from_port = 0
to_port = 65535
type = "ingress"
}
resource "aws_security_group_rule" "cluster_primary_ingress_workers" {
description = "Allow pods running on workers to send communication to cluster primary security group (e.g. Fargate pods)."
protocol = "all"
security_group_id = aws_security_group.cluster.id
source_security_group_id = aws_security_group.workers.id
from_port = 0
to_port = 65535
type = "ingress"
}
4. Create IAM roles.
resource "aws_iam_role" "cluster" {
name_prefix = var.cluster_name
assume_role_policy = jsonencode({
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "eks.amazonaws.com"
}
}]
Version = "2012-10-17"
})
}
data "aws_partition" "current" {}
locals {
policy_arn_prefix = "arn:${data.aws_partition.current.partition}:iam::aws:policy"
}
resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSClusterPolicy" {
policy_arn = "${local.policy_arn_prefix}/AmazonEKSClusterPolicy"
role = aws_iam_role.cluster.name
}
resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSServicePolicy" {
policy_arn = "${local.policy_arn_prefix}/AmazonEKSServicePolicy"
role = aws_iam_role.cluster.name
}
resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSVPCResourceControllerPolicy" {
policy_arn = "${local.policy_arn_prefix}/AmazonEKSVPCResourceController"
role = aws_iam_role.cluster.name
}
resource "aws_iam_role" "workers" {
name = "${var.cluster_name}-workers"
assume_role_policy = jsonencode({
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}]
Version = "2012-10-17"
})
}
resource "aws_iam_role_policy_attachment" "workers-AmazonEKSWorkerNodePolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
role = aws_iam_role.workers.name
}
resource "aws_iam_role_policy_attachment" "workers-AmazonEKS_CNI_Policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
role = aws_iam_role.workers.name
}
resource "aws_iam_role_policy_attachment" "workers-AmazonEC2ContainerRegistryReadOnly" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
role = aws_iam_role.workers.name
}
5. Create a control plane and a worker node pool.
resource "aws_eks_cluster" "this" {
name = var.cluster_name
role_arn = aws_iam_role.cluster.arn
vpc_config {
security_group_ids = compact([aws_security_group.cluster.id])
subnet_ids = module.vpc.private_subnets
endpoint_public_access = true
endpoint_private_access = true
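# Restrict public access to the API endpoint to a known CIDR;
# replace the example IP below with your own address range.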
public_access_cidrs = ["73.126.49.102/32"]
}
depends_on = [
aws_security_group_rule.cluster_egress_internet,
aws_security_group_rule.cluster_https_worker_ingress,
aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy,
aws_iam_role_policy_attachment.cluster_AmazonEKSServicePolicy,
aws_iam_role_policy_attachment.cluster_AmazonEKSVPCResourceControllerPolicy,
]
}
resource "aws_eks_node_group" "workers" {
cluster_name = aws_eks_cluster.this.name
node_group_name = "${var.cluster_name}-workers"
node_role_arn = aws_iam_role.workers.arn
subnet_ids = module.vpc.private_subnets
scaling_config {
desired_size = 3
max_size = 3
min_size = 3
}
# Ensure that IAM Role permissions are created before and deleted after EKS Node Group handling.
# Otherwise, EKS will not be able to properly delete EC2 Instances and Elastic Network Interfaces.
depends_on = [
aws_iam_role_policy_attachment.workers-AmazonEKSWorkerNodePolicy,
aws_iam_role_policy_attachment.workers-AmazonEKS_CNI_Policy,
aws_iam_role_policy_attachment.workers-AmazonEC2ContainerRegistryReadOnly,
]
}
At this point, our cluster is ready, and we can connect to it.
To update your kubeconfig, use the following command:
aws eks update-kubeconfig --name ${your_cluster_name}
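Optionally, you can have Terraform surface the values needed for that command. A small sketch of an outputs.tf (the output names below are my own choice):

output "cluster_name" {
  description = "EKS cluster name to pass to `aws eks update-kubeconfig --name ...`."
  value       = aws_eks_cluster.this.name
}

output "cluster_endpoint" {
  description = "URL of the Kubernetes API server."
  value       = aws_eks_cluster.this.endpoint
}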
The full Terraform template is available on GitHub.
Simplifying deployment using the terraform-aws-eks module
We can avoid codifying roles, permissions, and security groups ourselves by using the terraform-aws-eks module, created and maintained by the open-source community.
Instead of almost 200 lines of code, we can shrink our deployment manifest down to 70 lines. The code below creates roles, security groups, a control plane, and a worker node group:
module "eks" {
source = "github.com/terraform-aws-modules/terraform-aws-eks?ref=v12.2.0"
cluster_name = var.cluster_name
subnets = module.vpc.private_subnets
vpc_id = module.vpc.vpc_id
worker_groups = [
{
name = "worker-group-1"
instance_type = "t2.medium"
asg_desired_capacity = 3
tags = [
{
"key" = "k8s.io/cluster-autoscaler/${var.cluster_name}",
"value" = "owned",
"propagate_at_launch" = "true"
}
]
}
]
}
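One caveat: this module version manages the aws-auth ConfigMap for you, so its examples also configure a kubernetes provider against the freshly created cluster. A minimal sketch of that wiring, assuming the module exposes a cluster_id output (check the documented outputs of the version you pin):

data "aws_eks_cluster" "this" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "this" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  # Authenticate against the cluster created by the module.
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.this.token
}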
You got this
In this article, we have covered different ways to create an EKS cluster. Use whichever option works best for your use case. The next article will cover the log management part of the observability stack in a cloud-native platform.
See a bigger picture in a previous post: Amazon EKS Deep Dive — Part 1 — Six components of an effective Cloud-Native platform.
If you want to share your opinion, connect with me on LinkedIn.