Knowledge Base

Why does the ECS Deploy Runner have separate IAM Roles per task/container?

Answer

I noticed that the ECS Deploy Runner provisions a different IAM Role for each container (e.g., `terraform-applier`, `terraform-planner`, and `ami-builder` each get a different IAM Role). Why is that? Why not have a single IAM Role that has all the permissions necessary for every task?

---

<ins datetime="2022-09-21T16:18:56Z"> <p><a href="https://support.gruntwork.io/hc/requests/109280">Tracked in ticket #109280</a></p> </ins>

As a rule of thumb, we follow the [Principle of Least Privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege) for our IAM Policies. This means that we grant each task only the permissions it needs and nothing more. This leads to a stronger security posture, since you never know which permission might be used in an exploit.

That said, there is a practical reason for separating the IAM Roles for the containers in the ECS Deploy Runner suite that goes beyond least privilege: the threat model of CI/CD. Here is an example that walks through why the permissions need to be separated.

Imagine an org that has set up CI/CD for Terraform in their infra-live repo, like the Ref Arch. In this pipeline, we invoke `terraform plan` on every PR/branch. This means that the permissions of the `terraform-planner` container are available to arbitrary, unreviewed code (e.g., one could create a new branch and use that container to run `terraform plan` against their code without code review).

Now suppose that we unified the permissions for running `plan` and `apply` into a single `Terraform Runner` IAM policy, which includes both apply and destroy permissions. In this scenario, a single compromised account with write access can exploit these permissions to do real damage. This is because Terraform has the [external data source](https://registry.terraform.io/providers/hashicorp/external/latest/docs/data-sources/data_source) escape hatch, which lets you run arbitrary code at the plan stage. A single user with just write access can therefore do all sorts of damage to the prod infra, like destroying the RDS DB and all of its backups during `plan`, if the task has the IAM permissions to do so.

If the plan action only has read access, an attacker can still do other kinds of damage (like extracting secrets), but at the very least they won't be able to deploy new infrastructure or destroy existing infrastructure.

This is the primary reason why we separate the permissions across the different containers: the level of access each container task needs is different because of how it is used in the various CI/CD pipelines.
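
To make the escape hatch described above concrete, here is a minimal sketch of how an `external` data source runs a program during `terraform plan`. The names `plan_time_command` and `plan_time_result` are hypothetical and chosen only for illustration, and the command is deliberately harmless. This is not code from the ECS Deploy Runner or the Ref Arch.

```hcl
# Hypothetical illustration only: shows that a data source can execute
# an arbitrary program while Terraform is computing the plan.
terraform {
  required_providers {
    external = {
      source = "hashicorp/external"
    }
  }
}

data "external" "plan_time_command" {
  # The program is executed when the data source is read, which happens
  # during `terraform plan`. It runs with whatever credentials the
  # planning task has. It must print a JSON object of string values.
  program = ["sh", "-c", "echo '{\"ran_during\": \"plan\"}'"]
}

output "plan_time_result" {
  # Surfaces the JSON object returned by the program above.
  value = data.external.plan_time_command.result
}
```

Anyone who can push a branch containing a block like this can swap the harmless `echo` for any command available in the container. If the planning task's IAM Role only allows read access, such a command could still read data, but it could not create or destroy infrastructure.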