How do I upgrade my Gruntwork modules and Terraform version?
Hi, we are currently planning our TF version update from `0.12.17` to `0.12.31` and later to `0.13`/`0.14`/... The first step is `0.12.17` -> `0.12.31`, to get new AWS provider releases.

All our modules are in a monorepo, as Gruntwork suggests. Some of them are completely self-written; others wrap Gruntwork modules, which only have a `required_version = ">= 0.12"` constraint. In our `modules-repo`, we enforce the usage of `0.12.17` in every module. We currently have about 500 HCL files in our `live-repo`, which reference about 70 different Git tags from our `modules-repo`. We don’t use CI to apply infrastructure code.

To deal with the 70 different Git tags, we are currently discussing two options:

* Create patch versions which support `0.12.31` for the 70 different referenced Git tags and update the references in the `live-repo`. Simply changing those 70 different module references to the latest would not work, because we have breaking changes in our modules.
* Update all references to the latest version, which supports `0.12.31`, and adopt all breaking changes in the `live-repo`. This seems like a big undertaking.

We are debating whether it is necessary to update all HCLs in a “big bang”.

* Manually applying 500 HCLs isn’t fun, so is it a problem to have some infrastructure resources on `0.12.17` and some on newer versions?
* We discovered that the order in which we update the resources matters, because of Terragrunt dependencies. Example: HCLs A, B, and C have dependencies on X. In this case, X can be updated to the TF version of min(A,B,C). If we updated X, we would be unable to apply the dependents A, B, and C with the old TF version, because TF cannot read a remote state written by a newer TF version. `terragrunt graph-dependencies` helped us to discover dependents.

So, we are currently facing the following questions:

* How should we deal with the Git versions of our modules?
* Do we need to update all of our infrastructure in a big bang? What risks do you see in not doing that?
* Is it correct that, in order to migrate to `0.13`, you have to apply all TF code with a version `>= 0.12.26` first? My experiments showed that TF `0.13.7` lets me apply TF code whose remote state version is `< 0.12.26`.
* We are currently pinning the TF version in our modules. Another idea is to move this to the `live-repo`, so we could use different TF versions with the same module version, i.e. both `0.12.17` and `0.12.31`. It is possible to pin versions in the `live-repo` via the `.terraform-version` file (also for every HCL separately), but this seems to only work when `tfenv` is installed. What do you think about that?
In general, we recommend biting the bullet and updating everything at once. While there are features in Terragrunt to support an incremental update, we have seen that this tends to incentivize teams to stay on the older versions longer, until they end up on them almost permanently. This causes operational overhead, because it becomes unclear which modules and components have been updated and why.

With that said, it’s understandable that this is a big undertaking, so in this scenario you might want to take the update in phases. The tricky bit is that many of the strategies for incrementally updating your infrastructure will require changes to your Terraform code, so depending on your risk tolerance, it may not be feasible.

Given that, here is a strategy you can take to incrementally update your modules.

1. Plan your migration path by mapping out your leaf modules, that is, the modules that no other module depends on. You want to start at the leaf because that is where you can control how the module reads in the upstream dependencies. As you tested, newer module state won’t be readable by older TF versions until `0.12.26`, so updating an upstream dependency can break downstream modules.
1. Configure a root `.terraform-version` in your live repo that maps to the old version. As you mentioned, this only works with Terraform version managers, but this is one of the only ways you will be able to have mixed Terraform projects. Note that there are several tools that honor this file, so you have some flexibility on which version manager to use: `tfenv`, `asdf-vm`, `tfswitch`.
    1. Alternatively, you can manage the versions manually with the `terraform_binary` setting https://terragrunt.gruntwork.io/docs/reference/config-blocks-and-attributes/#terraform_binary . This is harder to manage consistently across all operating environments, but can be done without an additional tool.
1. Once the live repo is ready to handle mixed Terraform versions, you can start to update the modules. For each leaf module, do the following:
    1. Update the required `terraform` version in the module.
    1. Update the Terraform code to ensure it works with the target version you are updating to. AFAIK, there should be very few changes you need going from `0.12` to `1.0`.
    1. Refactor the module dependencies so that instead of reading them from remote state using `terraform_remote_state`, the module takes them as variable inputs.
    1. In the `terragrunt.hcl`, handle the dependencies by using `dependency` blocks so that Terragrunt pulls the updates. This is the magic that bypasses the state version compatibility issues. `dependency` blocks are a feature where Terragrunt calls `terraform output` to read the data out of the state and pulls it into the context. You can then pass it through to the module as inputs using `dependency.NAME.outputs.OUTPUT`, e.g., `dependency.vpc.outputs.vpc_id` (assuming the dependency points to the vpc module). See https://terragrunt.gruntwork.io/docs/features/execute-terraform-commands-on-multiple-modules-at-once/#passing-outputs-between-modules for more details.
    1. Add a `.terraform-version` file alongside the `terragrunt.hcl` for the leaf module to point to the newer Terraform version.
    1. Once this is in place, Terragrunt should do the following: (i) use Terraform version `0.12.17` to read out the outputs from module dependencies, (ii) pass the `dependency` outputs as inputs to the module, and (iii) use the newer version when applying the current module. This effectively allows a newer Terraform version to read the state of the older one.
        1. It might already be the case that newer TF versions can read older TF versions' state. There might have been restrictions before (e.g., trying to read `0.12.17` state from `1.0`), but maybe these were addressed in recent patches.
1. At this point, you will have updated one or a few of the modules while still keeping most of the other modules untouched. You can now recursively work your way up the dependency chain until all the modules have been updated.

Caveat on following dependency ordering: if you are already using Terragrunt `dependency` blocks, you actually don't need to start at the leaf modules. You can start anywhere, because the feature allows you to read the state of modules at newer versions. This means you don't have to map out or follow the dependency chain, which simplifies the process. But if you aren't using Terragrunt `dependency` blocks, you end up having to update the leaf nodes anyway to adopt `dependency` blocks, and at that point you might as well update the Terraform version, so the advantages are slimmer.

Hope that makes sense!
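
To make the `dependency` step above more concrete, here is a minimal sketch of what a leaf module's `terragrunt.hcl` might look like after the refactor. It is an illustration under assumptions, not your actual configuration: the module source URL, the Git tag, the `../vpc` path, and the `vpc_id` input name are all hypothetical placeholders.

```hcl
# terragrunt.hcl for one leaf module in the live repo (hypothetical layout).

terraform {
  # Hypothetical source and tag; point this at the modules-repo tag that
  # supports the newer Terraform version.
  source = "git::git@github.com:example-org/modules-repo.git//app?ref=v1.2.3"
}

# Pull in the shared root configuration (remote state settings, etc.).
include {
  path = find_in_parent_folders()
}

# Terragrunt reads the outputs of the module at ../vpc (assumed path)
# instead of the module itself reading them via terraform_remote_state.
dependency "vpc" {
  config_path = "../vpc"
}

# Pass the dependency outputs to the module as ordinary input variables.
inputs = {
  vpc_id = dependency.vpc.outputs.vpc_id
}
```

With this in place, the module itself would declare a matching `vpc_id` variable in place of its `terraform_remote_state` lookup, and the `.terraform-version` file next to this `terragrunt.hcl` would name the newer Terraform version.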