Control Tower Multi-Account Factory Async
This OpenTofu/Terraform module provisions multiple AWS accounts using AWS Control Tower Account Factory. Under the hood, it leverages the control-tower-account-factory-async module for account creation. It also includes a separate mechanism to detect and remediate drifted or outdated AWS Service Catalog products asynchronously, outside of OpenTofu/Terraform, using an EventBridge rule, SQS, Lambda, and AWS Step Functions.
Background and Justification
The standard synchronous approach to provisioning or updating AWS accounts via Control Tower can lead to lengthy OpenTofu/Terraform runs, especially when Control Tower APIs are slow or when updating a large number of accounts. More importantly, certain types of "drift" caused by Control Tower changes are difficult to reconcile using OpenTofu/Terraform alone.
This module takes an asynchronous approach by deploying infrastructure (EventBridge, SQS, Lambda, and AWS Step Functions) that monitors for certain API calls. When relevant API calls are made (UpdateProvisioningArtifact and UpgradeProduct), the Lambda is triggered to complete the update process independently of OpenTofu/Terraform.
This leads to:
- Faster OpenTofu/Terraform applies
- More scalable update workflows
- Improved handling of state drift
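As a rough illustration of the monitoring described above, an EventBridge rule that matches these two API calls as recorded by CloudTrail might look like the following. This is a minimal sketch, not the module's actual code: the resource names, the rule name, and the `ingest_lambda_arn` variable are hypothetical, and the Lambda permission that lets EventBridge invoke the function is omitted for brevity.

```hcl
# Hypothetical input: ARN of the Lambda that should receive matching events.
variable "ingest_lambda_arn" {
  type = string
}

# Matches Service Catalog API calls delivered to EventBridge via CloudTrail.
resource "aws_cloudwatch_event_rule" "service_catalog_updates" {
  name        = "account-factory-product-updates" # hypothetical name
  description = "Service Catalog product update calls recorded by CloudTrail"

  event_pattern = jsonencode({
    source        = ["aws.servicecatalog"]
    "detail-type" = ["AWS API Call via CloudTrail"]
    detail = {
      eventSource = ["servicecatalog.amazonaws.com"]
      eventName   = ["UpdateProvisioningArtifact", "UpgradeProduct"]
    }
  })
}

# Routes matching events to the ingest Lambda.
resource "aws_cloudwatch_event_target" "ingest_lambda" {
  rule = aws_cloudwatch_event_rule.service_catalog_updates.name
  arn  = var.ingest_lambda_arn
}
```

A pattern like this only fires if a CloudTrail trail is recording management events in the same account and region as the rule.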
Why remediate provisioned_product_id drift?
In AWS Control Tower, certain changes made via the AWS Console or APIs result in drift from the OpenTofu/Terraform state. For example:
- Moving an account to a new Organizational Unit (OU)
- Updating the Account Factory product version
- Modifying Service Catalog configurations
These actions change the provisioned_product_id, which causes OpenTofu/Terraform to report drift. Left unresolved, this drift leaves your infrastructure state inconsistent. Because the AWS provider applies every change to your Service Catalog provisioned products through the provisioned_product_id, that value must be up to date before you can make further changes.
By queuing and applying these updates asynchronously:
- The drift is remediated safely and automatically
- The Control Tower changes are reflected in your environment
- You avoid the overhead and risk of long-running OpenTofu/Terraform changes
How It Works
- EventBridge Rule: Listens via CloudTrail for UpdateProvisioningArtifact or UpgradeProduct API calls.
- Ingest Lambda: Triggered by the EventBridge rule, it finds all provisioned products that need an update and queues them in an SQS FIFO queue. This ensures order and prevents race conditions.
- SQS Queue: Serves as a reliable, asynchronous work queue. It is configured as a FIFO queue with content-based deduplication and a Dead-Letter Queue (DLQ).
- Worker Lambda: Triggered by messages from the SQS queue, it applies the necessary provisioned_product_id updates by calling UpdateProvisionedProduct. It then initiates an AWS Step Functions state machine to track the update.
- AWS Step Functions state machine: Periodically checks the status of the Service Catalog update record to verify that it has succeeded or failed, ensuring the update is fully completed.
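The queue and the status-polling steps above can be sketched in HCL as follows. This is an illustrative sketch under stated assumptions rather than the module's implementation: all resource names are hypothetical, the `sfn_role_arn` variable is hypothetical, the state machine is assumed to receive the Service Catalog record ID in a `record_id` input field, and it is assumed to call DescribeRecord through the Step Functions AWS SDK integration (the module may use a Lambda instead).

```hcl
# Hypothetical input: IAM role the state machine runs as.
variable "sfn_role_arn" {
  type = string
}

# The dead-letter queue for a FIFO queue must itself be FIFO.
resource "aws_sqs_queue" "updates_dlq" {
  name       = "account-factory-updates-dlq.fifo"
  fifo_queue = true
}

resource "aws_sqs_queue" "updates" {
  name                        = "account-factory-updates.fifo"
  fifo_queue                  = true
  content_based_deduplication = true

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.updates_dlq.arn
    maxReceiveCount     = 5 # mirrors the sqs_max_receive_count default
  })
}

# Wait, check the update record, and loop until it succeeds or fails.
resource "aws_sfn_state_machine" "track_update" {
  name     = "track-provisioned-product-update"
  role_arn = var.sfn_role_arn

  definition = jsonencode({
    StartAt = "Wait"
    States = {
      Wait = {
        Type    = "Wait"
        Seconds = 30 # mirrors the sfn_wait_time_seconds default
        Next    = "DescribeRecord"
      }
      DescribeRecord = {
        Type       = "Task"
        Resource   = "arn:aws:states:::aws-sdk:servicecatalog:describeRecord"
        Parameters = { "Id.$" = "$.record_id" }
        ResultPath = "$.result"
        Next       = "CheckStatus"
      }
      CheckStatus = {
        Type = "Choice"
        Choices = [
          { Variable = "$.result.RecordDetail.Status", StringEquals = "SUCCEEDED", Next = "Succeeded" },
          { Variable = "$.result.RecordDetail.Status", StringEquals = "FAILED", Next = "Failed" },
        ]
        Default = "Wait" # any other status: keep polling
      }
      Succeeded = { Type = "Succeed" }
      Failed    = { Type = "Fail", Error = "UpdateFailed" }
    }
  })
}
```

Looping back to the Wait state keeps each execution cheap while an update is in flight, since no compute runs between status checks.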
Controlling Concurrency
AWS Service Catalog currently enforces a hard limit of 5 concurrent account-related operations, which includes provisioning, updating, and enrolling accounts. Exceeding this limit may result in throttling errors or failed updates.
To respect this limitation and offer flexibility, this module provides a configurable variable, lambda_worker_max_concurrent_operations, that governs how many updates will be performed in parallel. While the upper limit is 5 (per AWS constraints), setting it lower may be preferred in environments where other Service Catalog actions must occur concurrently (such as provisioning new accounts). This ensures that background remediation work does not block critical operations or trigger rate limiting.
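For example, to reserve two of the five Service Catalog slots for other work such as new account creation, the variable could be lowered to 3. The folder path below is a placeholder and would need to be replaced with a real absolute path.

```hcl
module "control_tower_multi_account_factory_async" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-control-tower.git//modules/landingzone/control-tower-multi-account-factory-async?ref=v1.0.0"

  # Placeholder path: point this at your own account request folder.
  account_requests_folder = "/path/to/account-requests"

  # Run at most 3 background updates at once, leaving 2 of the 5
  # Service Catalog slots free for other operations.
  lambda_worker_max_concurrent_operations = 3
}
```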
Sample Usage
The first example below uses Terraform; the second uses Terragrunt.
# ------------------------------------------------------------------------------------------------------
# DEPLOY GRUNTWORK'S CONTROL-TOWER-MULTI-ACCOUNT-FACTORY-ASYNC MODULE
# ------------------------------------------------------------------------------------------------------
module "control_tower_multi_account_factory_async" {

  source = "git::git@github.com:gruntwork-io/terraform-aws-control-tower.git//modules/landingzone/control-tower-multi-account-factory-async?ref=v1.0.0"

  # ----------------------------------------------------------------------------------------------------
  # REQUIRED VARIABLES
  # ----------------------------------------------------------------------------------------------------

  # The absolute path to the folder to look for new account request files. Each
  # file should be named account-<NAME>.yml, where NAME is the name of an
  # account to create. Within the YAML file, you must define the following
  # fields: account_email (Account email, must be globally unique across all AWS
  # Accounts), sso_user_first_name (The first name of the user who will be
  # granted admin access to this new account through AWS SSO),
  # sso_user_last_name (The last name of the user who will be granted admin
  # access to this new account through AWS SSO), sso_user_email (The email
  # address of the user who will be granted admin access to this new account
  # through AWS SSO), organizational_unit_name (The name of the organizational
  # unit or OU in which this account should be created; must be one of the OUs
  # enrolled in Control Tower).
  account_requests_folder = <string>

  # ----------------------------------------------------------------------------------------------------
  # OPTIONAL VARIABLES
  # ----------------------------------------------------------------------------------------------------

  # If specified, this is assumed to be the absolute file path of a YAML file
  # where the details of the new accounts created by this module will be written
  # (if the file already exists, the module will merge its data into the file).
  # The expected format of this YAML file is that the keys are the account names
  # and the values are objects with the following keys: id (the account ID),
  # email (the root user email address for the account).
  accounts_yaml_path = null

  # The amount of time allowed for the create operation to take before being
  # considered to have failed.
  # https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/servicecatalog_provisioned_product#timeouts
  create_operation_timeout = "60m"

  # The amount of time allowed for the delete operation to take before being
  # considered to have failed.
  # https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/servicecatalog_provisioned_product#timeouts
  delete_operation_timeout = "60m"

  # If set to true, this module will look for the specified organizational unit
  # (OU) recursively under the root of the organization. If set to false, it
  # will only look for the OU directly under the root. This is useful if you
  # have nested OUs and want to create accounts in a child OU.
  discover_ous_recursively = false

  # KMS key for encrypting ingest lambda log group
  lambda_ingest_kms_key_id = null

  # Number of days to retain logs for ingest lambda functions
  lambda_ingest_log_retention_in_days = 30

  # Sets the memory_size in MB for the ingest lambda function used for async
  # provisioning_artifact_id updates.
  lambda_ingest_memory_size = 256

  # Sets the timeout in seconds for the ingest lambda function used for async
  # provisioning_artifact_id updates.
  lambda_ingest_timeout = 900

  # KMS key for encrypting worker lambda log group
  lambda_worker_kms_key_id = null

  # Number of days to retain logs for worker lambda functions
  lambda_worker_log_retention_in_days = 30

  # Service Catalog supports a maximum of 5 account updates currently. This
  # variable controls the maximum concurrent operations that the worker lambda
  # can initiate. It should not exceed 5 due to AWS Service Catalog limits, but
  # some users may want to set it lower than 5 to provide enough overhead for
  # other actions such as new account creation. Default value is 4.
  lambda_worker_max_concurrent_operations = 4

  # Sets the memory_size in MB for the worker lambda function used for async
  # provisioning_artifact_id updates.
  lambda_worker_memory_size = 256

  # Sets the timeout in seconds for the worker lambda function used for async
  # provisioning_artifact_id updates.
  lambda_worker_timeout = 900

  # The name of your AWS Control Tower Account Factory Portfolio
  portfolio_name = "AWS Control Tower Account Factory Portfolio"

  # The amount of time allowed for the read operation to take before being
  # considered to have failed.
  # https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/servicecatalog_provisioned_product#timeouts
  read_operation_timeout = "20m"

  # The number of seconds for the Step Function 'Wait' state.
  sfn_wait_time_seconds = 30

  # Sets the number of times a consumer (worker lambda) can receive a message
  # from SQS before it is moved to a dead-letter queue
  sqs_max_receive_count = 5

  # The amount of time allowed for the update operation to take before being
  # considered to have failed.
  # https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/servicecatalog_provisioned_product#timeouts
  update_operation_timeout = "60m"

}
# ------------------------------------------------------------------------------------------------------
# DEPLOY GRUNTWORK'S CONTROL-TOWER-MULTI-ACCOUNT-FACTORY-ASYNC MODULE
# ------------------------------------------------------------------------------------------------------
terraform {
  source = "git::git@github.com:gruntwork-io/terraform-aws-control-tower.git//modules/landingzone/control-tower-multi-account-factory-async?ref=v1.0.0"
}

inputs = {

  # ----------------------------------------------------------------------------------------------------
  # REQUIRED VARIABLES
  # ----------------------------------------------------------------------------------------------------

  # The absolute path to the folder to look for new account request files. Each
  # file should be named account-<NAME>.yml, where NAME is the name of an
  # account to create. Within the YAML file, you must define the following
  # fields: account_email (Account email, must be globally unique across all AWS
  # Accounts), sso_user_first_name (The first name of the user who will be
  # granted admin access to this new account through AWS SSO),
  # sso_user_last_name (The last name of the user who will be granted admin
  # access to this new account through AWS SSO), sso_user_email (The email
  # address of the user who will be granted admin access to this new account
  # through AWS SSO), organizational_unit_name (The name of the organizational
  # unit or OU in which this account should be created; must be one of the OUs
  # enrolled in Control Tower).
  account_requests_folder = <string>

  # ----------------------------------------------------------------------------------------------------
  # OPTIONAL VARIABLES
  # ----------------------------------------------------------------------------------------------------

  # If specified, this is assumed to be the absolute file path of a YAML file
  # where the details of the new accounts created by this module will be written
  # (if the file already exists, the module will merge its data into the file).
  # The expected format of this YAML file is that the keys are the account names
  # and the values are objects with the following keys: id (the account ID),
  # email (the root user email address for the account).
  accounts_yaml_path = null

  # The amount of time allowed for the create operation to take before being
  # considered to have failed.
  # https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/servicecatalog_provisioned_product#timeouts
  create_operation_timeout = "60m"

  # The amount of time allowed for the delete operation to take before being
  # considered to have failed.
  # https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/servicecatalog_provisioned_product#timeouts
  delete_operation_timeout = "60m"

  # If set to true, this module will look for the specified organizational unit
  # (OU) recursively under the root of the organization. If set to false, it
  # will only look for the OU directly under the root. This is useful if you
  # have nested OUs and want to create accounts in a child OU.
  discover_ous_recursively = false

  # KMS key for encrypting ingest lambda log group
  lambda_ingest_kms_key_id = null

  # Number of days to retain logs for ingest lambda functions
  lambda_ingest_log_retention_in_days = 30

  # Sets the memory_size in MB for the ingest lambda function used for async
  # provisioning_artifact_id updates.
  lambda_ingest_memory_size = 256

  # Sets the timeout in seconds for the ingest lambda function used for async
  # provisioning_artifact_id updates.
  lambda_ingest_timeout = 900

  # KMS key for encrypting worker lambda log group
  lambda_worker_kms_key_id = null

  # Number of days to retain logs for worker lambda functions
  lambda_worker_log_retention_in_days = 30

  # Service Catalog supports a maximum of 5 account updates currently. This
  # variable controls the maximum concurrent operations that the worker lambda
  # can initiate. It should not exceed 5 due to AWS Service Catalog limits, but
  # some users may want to set it lower than 5 to provide enough overhead for
  # other actions such as new account creation. Default value is 4.
  lambda_worker_max_concurrent_operations = 4

  # Sets the memory_size in MB for the worker lambda function used for async
  # provisioning_artifact_id updates.
  lambda_worker_memory_size = 256

  # Sets the timeout in seconds for the worker lambda function used for async
  # provisioning_artifact_id updates.
  lambda_worker_timeout = 900

  # The name of your AWS Control Tower Account Factory Portfolio
  portfolio_name = "AWS Control Tower Account Factory Portfolio"

  # The amount of time allowed for the read operation to take before being
  # considered to have failed.
  # https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/servicecatalog_provisioned_product#timeouts
  read_operation_timeout = "20m"

  # The number of seconds for the Step Function 'Wait' state.
  sfn_wait_time_seconds = 30

  # Sets the number of times a consumer (worker lambda) can receive a message
  # from SQS before it is moved to a dead-letter queue
  sqs_max_receive_count = 5

  # The amount of time allowed for the update operation to take before being
  # considered to have failed.
  # https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/servicecatalog_provisioned_product#timeouts
  update_operation_timeout = "60m"

}
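To make the account request format from the account_requests_folder comments concrete, the snippet below inlines the contents of a hypothetical account-dev.yml file as a heredoc and parses it with yamldecode. The email addresses, user name, and OU name are made-up placeholder values; only the five field names come from the variable description.

```hcl
locals {
  # Contents of a hypothetical account-dev.yml request file.
  example_account_request_yaml = <<-YAML
    account_email: aws-dev@example.com
    sso_user_first_name: Jane
    sso_user_last_name: Doe
    sso_user_email: jane.doe@example.com
    organizational_unit_name: Workloads
  YAML

  # Parsed into an object with one attribute per YAML field.
  example_account_request = yamldecode(local.example_account_request_yaml)
}

output "example_account_email" {
  value = local.example_account_request.account_email
}
```

In practice each request lives in its own file in the requests folder; the heredoc is used here only to keep the example self-contained.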
Reference
Inputs
Required
account_requests_folder (string)
The absolute path to the folder to look for new account request files. Each file should be named account-<NAME>.yml, where NAME is the name of an account to create. Within the YAML file, you must define the following fields: account_email (Account email, must be globally unique across all AWS Accounts), sso_user_first_name (The first name of the user who will be granted admin access to this new account through AWS SSO), sso_user_last_name (The last name of the user who will be granted admin access to this new account through AWS SSO), sso_user_email (The email address of the user who will be granted admin access to this new account through AWS SSO), organizational_unit_name (The name of the organizational unit or OU in which this account should be created; must be one of the OUs enrolled in Control Tower).
Optional
accounts_yaml_path (string)
If specified, this is assumed to be the absolute file path of a YAML file where the details of the new accounts created by this module will be written (if the file already exists, the module will merge its data into the file). The expected format of this YAML file is that the keys are the account names and the values are objects with the following keys: id (the account ID), email (the root user email address for the account).
Default: null
create_operation_timeout (string)
The amount of time allowed for the create operation to take before being considered to have failed. https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/servicecatalog_provisioned_product#timeouts
Default: "60m"
delete_operation_timeout (string)
The amount of time allowed for the delete operation to take before being considered to have failed. https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/servicecatalog_provisioned_product#timeouts
Default: "60m"
discover_ous_recursively (bool)
If set to true, this module will look for the specified organizational unit (OU) recursively under the root of the organization. If set to false, it will only look for the OU directly under the root. This is useful if you have nested OUs and want to create accounts in a child OU.
Default: false
lambda_ingest_kms_key_id (string)
KMS key for encrypting ingest lambda log group
Default: null
lambda_ingest_log_retention_in_days (number)
Number of days to retain logs for ingest lambda functions
Default: 30
lambda_ingest_memory_size (number)
Sets the memory_size in MB for the ingest lambda function used for async provisioning_artifact_id updates.
Default: 256
lambda_ingest_timeout (number)
Sets the timeout in seconds for the ingest lambda function used for async provisioning_artifact_id updates.
Default: 900
lambda_worker_kms_key_id (string)
KMS key for encrypting worker lambda log group
Default: null
lambda_worker_log_retention_in_days (number)
Number of days to retain logs for worker lambda functions
Default: 30
lambda_worker_max_concurrent_operations (number)
Service Catalog supports a maximum of 5 account updates currently. This variable controls the maximum concurrent operations that the worker lambda can initiate. It should not exceed 5 due to AWS Service Catalog limits, but some users may want to set it lower than 5 to provide enough overhead for other actions such as new account creation. Default value is 4.
Default: 4
lambda_worker_memory_size (number)
Sets the memory_size in MB for the worker lambda function used for async provisioning_artifact_id updates.
Default: 256
lambda_worker_timeout (number)
Sets the timeout in seconds for the worker lambda function used for async provisioning_artifact_id updates.
Default: 900
portfolio_name (string)
The name of your AWS Control Tower Account Factory Portfolio
Default: "AWS Control Tower Account Factory Portfolio"
read_operation_timeout (string)
The amount of time allowed for the read operation to take before being considered to have failed. https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/servicecatalog_provisioned_product#timeouts
Default: "20m"
sfn_wait_time_seconds (number)
The number of seconds for the Step Function 'Wait' state.
Default: 30
sqs_max_receive_count (number)
Sets the number of times a consumer (worker lambda) can receive a message from SQS before it is moved to a dead-letter queue
Default: 5
update_operation_timeout (string)
The amount of time allowed for the update operation to take before being considered to have failed. https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/servicecatalog_provisioned_product#timeouts
Default: "60m"
Outputs
The data from all the AWS accounts created.