Creating an EC2-based ECS cluster with Terraform

Shubham Soni
10 min read · Jun 9, 2023


In this article, we will see how to set up a secure EC2-based ECS cluster with the help of IaC (Terraform).

Terraform

Terraform is an open-source infrastructure-as-code (IaC) tool that deploys infrastructure easily using a declarative language (HCL). It lets you define each part of your infrastructure once, and provides rollbacks as well as versioning. By storing the state file remotely, multiple team members can work on the same project.
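For example, remote state can be configured with an S3 backend (the bucket and key names below are placeholders, not part of this project):

```hcl
# Hypothetical remote-state configuration; bucket/key are placeholders
terraform {
  backend "s3" {
    bucket = "my-terraform-state-bucket"
    key    = "ecs-cluster/terraform.tfstate"
    region = "us-east-1"
  }
}
```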

Amazon ECS

If you are already familiar with orchestration tools such as Kubernetes and Docker Swarm, AWS offers its own similar service, Elastic Container Service (ECS), with some additional advantages. ECS integrates several AWS services, such as EC2, CloudWatch, and Auto Scaling groups, and uses Docker as the container runtime.

The diagram below shows the high-level architecture of our cluster.

Here we set up the VPC according to our needs; the Auto Scaling group ensures the desired number of instances is always up, and the ECS service maintains the containers running the application inside those instances.

The components to be provisioned in the infrastructure:

  1. VPC (subnets, internet gateway, NAT gateway, route tables)
  2. Security groups (filter traffic)
  3. Launch configuration (metadata for the instances to be launched)
  4. Auto Scaling group (keeps the desired number of instances up)
  5. Application Load Balancer (distributes traffic)
  6. IAM roles (permissions for interacting with other services)
  7. ECS cluster
  8. ECS task definition
  9. ECS service

Let’s start building these one by one with the help of Terraform.
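The resources below reference several input variables. The article does not show its variables.tf, so the following is a hypothetical sketch with illustrative defaults:

```hcl
# variables.tf — hypothetical definitions for the variables used below;
# values are examples only, adjust them for your account and region.
variable "region" { default = "us-east-1" }
variable "vpccidr" { default = "10.0.0.0/16" }
variable "tenancy" { default = "default" }
variable "azs" { default = ["us-east-1a", "us-east-1b"] }
variable "prvcidr" { default = ["10.0.1.0/24", "10.0.2.0/24"] }
variable "pubcidr" { default = ["10.0.101.0/24", "10.0.102.0/24"] }
variable "clusterName" { default = "ecs-cluster" }
variable "key_name" { default = "my-keypair" }         # placeholder key pair
variable "ecs_ami" { default = "ami-xxxxxxxx" }        # latest ECS-optimized AMI for your region
variable "desired_capacity_asg" { default = 2 }
variable "min_size_asg" { default = 1 }
variable "max_size_asg" { default = 3 }
variable "count_container" { default = 2 }
variable "service_arn" { default = "" }                # ARN of the ECS service IAM role
```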

Provider

Before moving ahead, let's set up the provider. If the AWS CLI is already configured with credentials, simply provide the profile and region.

provider "aws" {
  profile = "profile name"
  region  = var.region
}

VPC

Amazon Virtual Private Cloud (VPC) gives you complete control over your virtual networking environment, including resource placement, connectivity, and security.

resource "aws_vpc" "this" {
  cidr_block       = var.vpccidr
  instance_tenancy = var.tenancy
  tags = {
    "Name" = "Ecs-Vpc"
  }
}

resource "aws_subnet" "private" {
  count             = length(var.azs)
  availability_zone = element(var.azs, count.index)
  vpc_id            = aws_vpc.this.id
  cidr_block        = element(var.prvcidr, count.index)
  tags = {
    "Name" = "private-${count.index + 1}"
  }
}

resource "aws_subnet" "public" {
  count                   = length(var.azs)
  availability_zone       = element(var.azs, count.index)
  vpc_id                  = aws_vpc.this.id
  cidr_block              = element(var.pubcidr, count.index)
  map_public_ip_on_launch = true

  tags = {
    "Name" = "public-${count.index + 1}"
  }
}


resource "aws_internet_gateway" "gw" {
  depends_on = [aws_vpc.this]
  vpc_id     = aws_vpc.this.id

  tags = {
    "Name" = "Ecs-gw"
  }
}

resource "aws_route_table" "public" {
  depends_on = [aws_internet_gateway.gw]
  vpc_id     = aws_vpc.this.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.gw.id
  }

  tags = {
    "Name" = "pub-routes"
  }
}


resource "aws_route_table" "private" {
  vpc_id = aws_vpc.this.id
  tags = {
    "Name" = "private"
  }
}

resource "aws_route_table_association" "public" {
  count          = length(aws_subnet.public.*.id)
  subnet_id      = element(aws_subnet.public.*.id, count.index)
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count          = length(aws_subnet.private.*.id)
  subnet_id      = element(aws_subnet.private.*.id, count.index)
  route_table_id = aws_route_table.private.id
}


resource "aws_eip" "ecs_eip" {
  vpc = true # deprecated in AWS provider v5+; use `domain = "vpc"` there
}

resource "aws_nat_gateway" "ecs_nat" {
  depends_on    = [aws_subnet.private]
  allocation_id = aws_eip.ecs_eip.id
  subnet_id     = aws_subnet.public.0.id
}

resource "aws_route" "nat_route" {
  route_table_id         = aws_route_table.private.id
  nat_gateway_id         = aws_nat_gateway.ecs_nat.id
  destination_cidr_block = "0.0.0.0/0"
}

~ This will create a VPC in your account.

~ Creates the desired number of private subnets within the given CIDR ranges.

~ Creates the desired number of public subnets within the given CIDR ranges.

~ Creates the internet gateway.

~ Creates a public route table that gives the public subnets both local and internet connectivity.

~ Creates a private route table that restricts internet connectivity; all private subnets are associated with it, and its routes are defined below it.

~ Creates an Elastic IP for the NAT gateway.

~ Creates the NAT gateway (allows instance-generated traffic out to the internet while blocking inbound connections from outside, so private instances can communicate securely with the ECS service).

~ Creates the default route through the NAT gateway.

Security Groups

Security groups act as a firewall that filters traffic according to rules. Here we create security groups for our instances and for our load balancer so we can control the flow of traffic.

resource "aws_security_group" "ecs_ec2_sg" {
  name        = "ecs_ec2_sg"
  description = "Allow inbound traffic to the ECS instances"
  vpc_id      = aws_vpc.this.id

  ingress {
    description     = "All traffic from the load balancer"
    from_port       = 0
    to_port         = 0
    protocol        = "-1"
    security_groups = [aws_security_group.ecs_alb_sg.id]
  }

  ingress {
    description = "SSH"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "ecs_alb_sg" {
  name        = "ecs_alb_sg"
  description = "Allow inbound traffic to the load balancer"
  vpc_id      = aws_vpc.this.id

  ingress {
    description = "HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

~ The instance ingress rule allows the load balancer to reach any port on the instances, because each container gets a different dynamic host port, so we can't know in advance which port a new container will use.

~ Opens port 22 to everyone for SSH; the best practice is to restrict it to the fixed IP of your bastion host.

~ All traffic generated by the instances can go out.

~ The load balancer ingress rule allows all internet traffic to reach port 80 of the load balancer.

~ All traffic from the load balancer can go out.

Launch Configuration

A launch configuration is an instance configuration template that an Auto Scaling group uses to launch EC2 instances. It includes the ID of the Amazon Machine Image (AMI), the instance type, a key pair, one or more security groups, and a block device mapping.

Here we use the latest ECS-optimized AMI, which comes with the ECS agent pre-installed; we just have to adjust its config file to our needs. The file userdata.sh writes that config file.
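The original userdata.sh is not shown. Assuming it only registers the instance with the cluster, a minimal version could look like this (`${clustername}` and `${region}` are the variables filled in by templatefile()):

```sh
#!/bin/bash
# Hypothetical userdata.sh sketch: rendered by templatefile(), so
# ${clustername} and ${region} are replaced before the instance boots.
cat <<EOF >> /etc/ecs/ecs.config
ECS_CLUSTER=${clustername}
AWS_DEFAULT_REGION=${region}
EOF
```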

resource "aws_launch_configuration" "ecs_launch_config" {
  image_id             = var.ecs_ami
  iam_instance_profile = aws_iam_instance_profile.ecs_role.name
  security_groups      = [aws_security_group.ecs_ec2_sg.id]
  user_data = templatefile("userdata.sh",
    {
      clustername = var.clusterName
      region      = var.region
  })
  instance_type = "t2.micro"
  key_name      = var.key_name
}

~ Here we specify the details of the instances, such as the AMI, instance type, key pair, user data, and role, so instances can be launched directly from the launch configuration.

Autoscaling Group

An Auto Scaling group contains a collection of Amazon EC2 instances that are treated as a logical grouping for the purposes of automatic scaling and management. An Auto Scaling group also enables you to use Amazon EC2 Auto Scaling features such as health check replacements and scaling policies. Both maintaining the number of instances in an Auto Scaling group and automatic scaling are the core functionality of the Amazon EC2 Auto Scaling service. An ASG needs a launch configuration or launch template to launch instances.

resource "aws_autoscaling_group" "ecs_sg" {
  name                 = "asg"
  vpc_zone_identifier  = aws_subnet.private.*.id
  launch_configuration = aws_launch_configuration.ecs_launch_config.name

  desired_capacity          = var.desired_capacity_asg
  min_size                  = var.min_size_asg
  max_size                  = var.max_size_asg
  health_check_grace_period = 150
  health_check_type         = "EC2"
}

~ Here it is mandatory to pass vpc_zone_identifier and launch_configuration; to learn more about the other attributes used here, refer to the official docs.

Load Balancer

Elastic Load Balancing automatically distributes your incoming traffic across multiple targets, such as EC2 instances, containers, and IP addresses, in one or more Availability Zones. It monitors the health of its registered targets, and routes traffic only to the healthy targets. Elastic Load Balancing scales your load balancer as your incoming traffic changes over time. It can automatically scale to the vast majority of workloads.

resource "aws_lb_target_group" "autoscaleTG" {
  name     = "AutoscaleTG"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.this.id
}

## application load balancer
resource "aws_lb" "ecs_lb" {
  name                       = "ecslb"
  internal                   = false
  load_balancer_type         = "application"
  security_groups            = [aws_security_group.ecs_alb_sg.id]
  subnets                    = aws_subnet.public.*.id
  enable_deletion_protection = false
}

## listener for load balancer
resource "aws_lb_listener" "front_end" {
  load_balancer_arn = aws_lb.ecs_lb.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.autoscaleTG.arn
  }
}

resource "aws_lb_listener_rule" "test" {
  listener_arn = aws_lb_listener.front_end.arn
  priority     = 100

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.autoscaleTG.arn
  }

  condition {
    path_pattern {
      values = ["/home/*"]
    }
  }
}

output "alb_dnsname" {
  value = aws_lb.ecs_lb.dns_name
}

~ This creates a target group that holds the targets the load is distributed across; it will be attached to the ECS service.

~ Then it creates the load balancer with the essential attributes: type, security group ID, and (public) subnets.

~ It also creates the front-end listener rule where the load balancer listens for traffic.

~ Outputs the DNS name of the load balancer.
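Note that the target group relies on the default health-check settings. If you want explicit control, you can add a health_check block inside aws_lb_target_group.autoscaleTG; the values below are illustrative assumptions, not the author's configuration:

```hcl
  # inside resource "aws_lb_target_group" "autoscaleTG"
  health_check {
    path                = "/"   # assumed health endpoint
    interval            = 30    # seconds between checks
    healthy_threshold   = 2
    unhealthy_threshold = 3
    matcher             = "200" # HTTP status considered healthy
  }
```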

IAM Roles

An IAM role is an IAM identity that you can create in your account that has specific permissions. An IAM role is similar to an IAM user, in that it is an AWS identity with permission policies that determine what the identity can and cannot do in AWS. However, instead of being uniquely associated with one person, a role is intended to be assumable by anyone who needs it. Also, a role does not have standard long-term credentials such as a password or access keys associated with it. Instead, when you assume a role, it provides you with temporary security credentials for your role session.

data "aws_iam_policy_document" "ecs_task-assume-role-policy" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "ecsTaskExecution_role" {
  name               = "ecsTaskExecution_role"
  assume_role_policy = data.aws_iam_policy_document.ecs_task-assume-role-policy.json
}

resource "aws_iam_role_policy" "ecsTaskExecution_policy" {
  name   = "ecsTaskExecPolicy"
  role   = aws_iam_role.ecsTaskExecution_role.id
  policy = file("policies/ecsTaskExecutionPolicy.json")
}

#ecs role for Ec2 instance
data "aws_iam_policy_document" "instance-assume-role-policy" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "ecsinstance_role" {
  name               = "ecsinstance_role"
  assume_role_policy = data.aws_iam_policy_document.instance-assume-role-policy.json
}

resource "aws_iam_role_policy" "ecspolicy" {
  name   = "ecspolicy"
  role   = aws_iam_role.ecsinstance_role.id
  policy = file("policies/ecsInstancePolicy.json")
}

resource "aws_iam_instance_profile" "ecs_role" {
  name = "ecs_role"
  role = aws_iam_role.ecsinstance_role.name
}

~ These create roles for the EC2 instances and the ECS tasks so they can communicate with the ECS service.
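The local JSON policy files (policies/ecsTaskExecutionPolicy.json and policies/ecsInstancePolicy.json) are not shown in the article. As an alternative sketch, not the author's exact setup, you could attach the equivalent AWS-managed policies instead of maintaining local files:

```hcl
# Alternative: attach AWS-managed policies rather than local JSON files
resource "aws_iam_role_policy_attachment" "task_exec_managed" {
  role       = aws_iam_role.ecsTaskExecution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

resource "aws_iam_role_policy_attachment" "instance_managed" {
  role       = aws_iam_role.ecsinstance_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}
```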

ECS cluster

An Amazon ECS cluster is a logical grouping of tasks or services. Tasks and services are run on infrastructure that is registered to a cluster. Here we first create an empty cluster and then add the compute, task definition, and service. For the compute we pass the cluster name into the ecs.config file on the EC2 instances; each instance then connects to the cluster automatically, provided it has permission to do so.

resource "aws_ecs_cluster" "ecs_cluster" {
  name = var.clusterName

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

~ An empty cluster will be created with Container Insights enabled.

Task Definition

A task definition is required to run Docker containers in Amazon ECS. The following are some of the parameters you can specify in a task definition:

  • The Docker image to use with each container in your task
  • How much CPU and memory to use with each task or each container within a task
  • The launch type to use, which determines the infrastructure on which your tasks are hosted
  • The Docker networking mode to use for the containers in your task
  • The logging configuration to use for your tasks
  • Whether the task should continue to run if the container finishes or fails
  • The command the container should run when it is started
  • Any data volumes that should be used with the containers in the task
  • The IAM role that your tasks should use
resource "aws_ecs_task_definition" "service" {
  family                   = "vimal13"
  network_mode             = "bridge"
  requires_compatibilities = ["EC2"]
  task_role_arn            = aws_iam_role.ecsTaskExecution_role.arn
  execution_role_arn       = aws_iam_role.ecsTaskExecution_role.arn
  container_definitions = jsonencode([
    {
      name      = "first"
      image     = "vimal13/apache-webserver-php"
      cpu       = 200
      memory    = 200
      essential = true
      portMappings = [
        {
          containerPort = 80
          hostPort      = 0
        }
      ]
    },
  ])
}

~ Here I am passing my own customized image, which helps verify that load is successfully distributed among the containers.

We can pass more attributes there, such as volumes and commands; refer to the docs for more info.

ECS Service

An Amazon ECS service allows you to run and maintain a specified number of instances of a task definition simultaneously in an Amazon ECS cluster. If any of your tasks fail or stop for any reason, the Amazon ECS service scheduler launches another instance of your task definition to replace it, maintaining the desired number of tasks in the service.

In addition to maintaining the desired number of tasks, we run our service behind the load balancer, which distributes traffic across the tasks associated with the service.

resource "aws_ecs_service" "worker" {
  name            = "worker"
  cluster         = aws_ecs_cluster.ecs_cluster.id
  task_definition = aws_ecs_task_definition.service.arn
  desired_count   = var.count_container
  iam_role        = var.service_arn
  load_balancer {
    target_group_arn = aws_lb_target_group.autoscaleTG.arn
    container_name   = "first"
    container_port   = 80
  }
}

~ The service IAM role ARN passed via var.service_arn can be found in your IAM console.

~ Pass the target group ARN here so that the containers are registered in the load balancer's target group.

IMP** Do not also attach the target group ARN to the Auto Scaling group, otherwise your traffic will not be passed to the containers.

Deploying the code

Before deploying, we first have to install the Terraform plugins and create a working directory where all data about the infrastructure is saved as key-value pairs in the state file. This state can live anywhere (for example, in S3), but here we initialize it locally. All of this is done with a single command:

$terraform init

Before applying any changes, it is good practice to create a plan, which gives a preview of what is going to be implemented.

$terraform plan

If everything looks OK, apply the changes with:

$terraform apply

To delete the infrastructure, use:

$terraform destroy

If we want to make any changes to the instances' configuration, we can do so by launching a bastion host in the public subnet.

Hope you liked my work and learned something new. Thank you!
