Service Catalog Version 2.9.1 (last updated in version 2.9.0)

Cost Management


Overview

This service deploys a unified AWS cost-control stack:

  • AWS Budgets — one or more threshold-based budgets (daily, monthly, etc.).
  • AWS Cost Anomaly Detection (CAD) — ML-based anomaly monitoring with subscriber notifications.
  • Notification fan-out — a single SNS topic receives both Budgets and CAD events, with optional Slack delivery via a Lambda notifier (webhook URL read from Secrets Manager at runtime) and optional direct email subscriptions.
  • Scheduled cloud-nuke (optional) — an ECS Fargate task that runs cloud-nuke on a configurable schedule, defaulting to --dry-run. Useful for ephemeral, sandbox, or developer accounts.

Learn

Under the hood, this service composes Gruntwork modules from terraform-aws-messaging (for the SNS topic) and terraform-aws-lambda (for the Slack notifier). If you are a subscriber and don't have access to those repos, email support@gruntwork.io.

Core concepts

  • AWS Budgets: cost thresholds evaluated on a daily, monthly, quarterly, or annual cadence. Each budget publishes to an SNS topic when its threshold is crossed. See the AWS Budgets documentation for the data model and the available notification_type and time_unit values.
  • AWS Cost Anomaly Detection: an ML-based monitor that observes spend patterns and emits an event when an anomaly is detected. See the Cost Anomaly Detection documentation.
  • cloud-nuke: a Gruntwork tool that deletes resources in an AWS account, intended for cleanup of ephemeral or sandbox accounts. See the cloud-nuke README for the resource matrix and the --config file format.

Important caveats

AWS Cost Anomaly Detection is global, but pinned to us-east-1

aws_ce_anomaly_monitor and aws_ce_anomaly_subscription are global resources that must be created via the us-east-1 endpoint. This module requires an aliased provider named aws.us_east_1 configured for us-east-1. If your default provider is already us-east-1, you can still alias it. See the example for the canonical configuration.
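
The provider wiring described here can be sketched as follows. This is a minimal illustration, assuming the module accepts the alias through Terraform's standard providers meta-argument; the default region and the name and account_name values are placeholders, and the example in the repository remains the canonical configuration.

```hcl
# Default provider for the module's regional resources.
# "eu-west-1" is a placeholder; use your own home region.
provider "aws" {
  region = "eu-west-1"
}

# Aliased provider used for the Cost Anomaly Detection resources.
# Keep the alias even if the default provider is already us-east-1.
provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

module "cost_management" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-service-catalog.git//modules/mgmt/cost-management?ref=v2.9.1"

  # Hand both provider configurations to the module.
  providers = {
    aws           = aws
    aws.us_east_1 = aws.us_east_1
  }

  name         = "cost-mgmt" # placeholder
  account_name = "sandbox"   # placeholder
}
```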

Only one DIMENSIONAL anomaly monitor per dimension per account

AWS allows at most one DIMENSIONAL aws_ce_anomaly_monitor per dimension (e.g., SERVICE) per account. If your account already has one (created by another tool, a prior deployment, or the AWS console), this module's apply will fail with ValidationException: Limit exceeded on dimensional spend monitor creation. Set enable_anomaly_detection = false and attach an aws_ce_anomaly_subscription to the pre-existing monitor out-of-band, or destroy the existing monitor first.
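
One way to attach a subscription to a pre-existing monitor is sketched below. It is an illustration rather than part of this module: the two input variables and the subscription name are hypothetical, and the threshold of 100 USD simply mirrors the module's default anomaly_threshold_amount. The SNS topic you point at must also allow the Cost Anomaly Detection service principal (costalerts.amazonaws.com) to publish to it.

```hcl
# Hypothetical inputs: the ARN of the monitor that already exists in the
# account, and the ARN of the SNS topic that should receive anomaly events.
variable "existing_monitor_arn" {
  type = string
}

variable "cost_alerts_topic_arn" {
  type = string
}

resource "aws_ce_anomaly_subscription" "existing_monitor" {
  # Cost Anomaly Detection resources are created via the us-east-1 endpoint.
  provider = aws.us_east_1

  name      = "cost-mgmt-anomaly-subscription" # placeholder name
  frequency = "IMMEDIATE"                      # the only frequency allowed for SNS subscribers

  monitor_arn_list = [var.existing_monitor_arn]

  subscriber {
    type    = "SNS"
    address = var.cost_alerts_topic_arn
  }

  # Notify when the total impact of an anomaly is at least 100 USD.
  threshold_expression {
    dimension {
      key           = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
      match_options = ["GREATER_THAN_OR_EQUAL"]
      values        = ["100"]
    }
  }
}
```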

cloud-nuke is destructive

When enable_scheduled_cloud_nuke = true:

  • The module defaults to cloud_nuke_dry_run = true. Dry-run mode logs what would be deleted without deleting.
  • To enable real deletions, you must set both cloud_nuke_dry_run = false and acknowledge_destructive_cloud_nuke = true. The module enforces this via a plan-time precondition.
  • The module does not ship an IAM policy granting cloud-nuke permissions to delete resources. You are expected to attach a separate policy to the task role whose ARN is exposed as the cloud_nuke_task_role_arn output. A reference policy snippet is published in the cloud-nuke README; have your security team review and trim it before attaching.
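
The points above can be sketched as follows. This is an illustration under stated assumptions, not a recommended configuration: the image URI, VPC ID, and subnet ID are placeholders, and the IAM statement is a deliberately narrow, hypothetical example rather than the reference policy from the cloud-nuke README. The role name is derived from the last path segment of the role ARN because aws_iam_role_policy expects a role name.

```hcl
module "cost_management" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-service-catalog.git//modules/mgmt/cost-management?ref=v2.9.1"

  providers = {
    aws           = aws
    aws.us_east_1 = aws.us_east_1
  }

  name         = "cost-mgmt" # placeholder
  account_name = "sandbox"   # placeholder

  enable_scheduled_cloud_nuke = true
  cloud_nuke_image            = "123456789012.dkr.ecr.us-east-1.amazonaws.com/cloud-nuke:latest" # placeholder
  vpc_id                      = "vpc-0123456789abcdef0"                                          # placeholder
  subnet_ids                  = ["subnet-0123456789abcdef0"]                                     # placeholder

  # Both settings are required for real deletions; setting only one of them
  # should fail the module's plan-time precondition.
  cloud_nuke_dry_run                 = false
  acknowledge_destructive_cloud_nuke = true
}

# Hypothetical, intentionally narrow permissions. Replace with a policy your
# security team has reviewed.
data "aws_iam_policy_document" "cloud_nuke" {
  statement {
    effect    = "Allow"
    actions   = ["ec2:DescribeInstances", "ec2:TerminateInstances"]
    resources = ["*"]
  }
}

resource "aws_iam_role_policy" "cloud_nuke" {
  name   = "cloud-nuke-permissions"
  role   = reverse(split("/", module.cost_management.cloud_nuke_task_role_arn))[0]
  policy = data.aws_iam_policy_document.cloud_nuke.json
}
```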

Deploy

See examples/for-learning-and-testing/mgmt/cost-management for a runnable example.

Sample Usage

main.tf

# ------------------------------------------------------------------------------------------------------
# DEPLOY GRUNTWORK'S COST-MANAGEMENT MODULE
# ------------------------------------------------------------------------------------------------------

module "cost_management" {

source = "git::git@github.com:gruntwork-io/terraform-aws-service-catalog.git//modules/mgmt/cost-management?ref=v2.9.1"

# ----------------------------------------------------------------------------------------------------
# REQUIRED VARIABLES
# ----------------------------------------------------------------------------------------------------

# Human-readable AWS account name (e.g., 'sandbox', 'prod-payments'). Included
# as a label in Slack notifications so a single channel can disambiguate
# alerts from multiple accounts.
account_name = <string>

# Name prefix applied to all resources created by this module (SNS topic,
# Budgets, CAD, Slack Lambda, cloud-nuke task).
name = <string>

# ----------------------------------------------------------------------------------------------------
# OPTIONAL VARIABLES
# ----------------------------------------------------------------------------------------------------

# Explicit acknowledgement that you intend to run cloud-nuke in destructive
# (non-dry-run) mode. To enable real deletions, this must be true AND
# var.cloud_nuke_dry_run must be false. Enforced by a plan-time precondition.
acknowledge_destructive_cloud_nuke = false

# Dimension to monitor for anomalies. One of SERVICE or LINKED_ACCOUNT.
anomaly_monitor_dimension = "SERVICE"

# Notification frequency for the anomaly subscription. AWS only allows
# IMMEDIATE for SNS subscribers; DAILY and WEEKLY require EMAIL subscribers
# and are rejected at apply time.
anomaly_subscription_frequency = "IMMEDIATE"

# Minimum total impact, in USD, above which a detected anomaly triggers a
# subscription notification.
anomaly_threshold_amount = 100

# AWS Budgets to create. Each entry produces one aws_budgets_budget that
# publishes to the module's SNS topic when the threshold is crossed. Fields:
# 'name' is appended to var.name; 'time_unit' must be DAILY, MONTHLY,
# QUARTERLY, or ANNUALLY; 'limit_amount' is the cap in USD;
# 'threshold_percent' is the percentage of limit_amount at which to alert;
# 'notification_type' must be ACTUAL or FORECASTED.
budgets = [
  {
    name              = "daily"
    time_unit         = "DAILY"
    limit_amount      = 75
    threshold_percent = 100
    notification_type = "ACTUAL"
  },
  {
    name              = "monthly-actual"
    time_unit         = "MONTHLY"
    limit_amount      = 800
    threshold_percent = 100
    notification_type = "ACTUAL"
  }
]

# Whether to assign a public IP to the cloud-nuke Fargate task ENI. Set to
# true when running in public subnets without a NAT gateway (typical for AWS
# default VPCs) so the task can reach AWS API endpoints and the image
# registry.
cloud_nuke_assign_public_ip = false

# Optional inline cloud-nuke config YAML. When non-null, the module writes the
# YAML to an S3 object and points cloud-nuke at it via --config. When null, no
# config file is passed and cloud-nuke uses its defaults. See
# https://github.com/gruntwork-io/cloud-nuke#config-file for the schema.
cloud_nuke_config_yaml = null

# Whether to run cloud-nuke in --dry-run mode, which logs what cloud-nuke
# would delete without actually deleting anything. See
# var.acknowledge_destructive_cloud_nuke for how to enable real deletions.
cloud_nuke_dry_run = true

# Container image reference for cloud-nuke (e.g. an ECR repository URI or a
# public image). Gruntwork does not currently publish a cloud-nuke OCI image,
# so operators must build and host their own — see the module README for a
# reference Dockerfile. Required when var.enable_scheduled_cloud_nuke is true.
cloud_nuke_image = null

# EventBridge Scheduler schedule expression for cloud-nuke runs. Accepts a
# cron(...) or rate(...) expression. The default is 07:00 UTC daily.
cloud_nuke_schedule_expression = "cron(0 7 * * ? *)"

# Security groups to attach to the cloud-nuke Fargate task ENI. Each must
# permit outbound HTTPS so the task can reach AWS API endpoints, S3 (for
# config), and ECR/GHCR (for the image pull). When empty, ECS attaches the VPC
# default security group, whose outbound rules may not permit these calls.
cloud_nuke_security_group_ids = []

# AWS regions cloud-nuke should operate against, passed as repeated --region
# flags. An empty list defers to cloud-nuke's default behavior, which is to
# operate on all regions enabled in the account.
cloud_nuke_target_regions = []

# Fargate task CPU units allocated to the cloud-nuke task. See
# https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html
# for valid CPU/memory combinations.
cloud_nuke_task_cpu = 512

# Memory, in MB, allocated to the cloud-nuke Fargate task. See
# https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html
# for valid CPU/memory combinations.
cloud_nuke_task_memory = 1024

# ARN of an existing ECS cluster to run the cloud-nuke task on. When null,
# this module creates a dedicated cluster named '<var.name>-cloud-nuke'. Only
# used when var.enable_scheduled_cloud_nuke is true.
ecs_cluster_arn = null

# Email addresses to subscribe to the cost alerts SNS topic. Each address
# receives an AWS confirmation email that must be acknowledged before delivery
# begins.
email_endpoints = []

# Whether to provision AWS Cost Anomaly Detection. Requires the aws.us_east_1
# aliased provider since CAD resources are global but only callable via
# us-east-1.
enable_anomaly_detection = true

# Whether to provision AWS Budgets alerts. When true, each entry in
# var.budgets becomes one aws_budgets_budget that publishes to the module's
# SNS topic.
enable_budget_alerts = true

# Whether to provision an EventBridge-scheduled Fargate task that runs
# cloud-nuke on var.cloud_nuke_schedule_expression. Off by default because
# cloud-nuke is destructive. When true, var.vpc_id and var.subnet_ids must
# also be set.
enable_scheduled_cloud_nuke = false

# Whether to deploy the Slack notifier Lambda and subscribe it to the cost
# alerts SNS topic. When true, var.slack_webhook_url_secrets_manager_arn must
# also be set.
enable_slack_notifications = false

# Memory, in MB, allocated to the Slack notifier Lambda.
lambda_memory_size = 128

# Maximum runtime, in seconds, of the Slack notifier Lambda before
# termination.
lambda_timeout = 30

# Retention, in days, for CloudWatch logs of the Slack notifier Lambda and the
# cloud-nuke Fargate task.
log_retention_days = 30

# ARN of a Secrets Manager secret whose SecretString is the Slack incoming
# webhook URL. Required when var.enable_slack_notifications is true. The
# Lambda reads this secret at runtime so the URL never appears in plan output
# or Terraform state.
slack_webhook_url_secrets_manager_arn = null

# Private subnet IDs the cloud-nuke Fargate task launches into. Subnets must
# have outbound internet access (e.g., via a NAT gateway) so the task can
# reach AWS API endpoints. Required when var.enable_scheduled_cloud_nuke is
# true.
subnet_ids = []

# VPC the cloud-nuke Fargate task runs in. Required when
# var.enable_scheduled_cloud_nuke is true.
vpc_id = null

}


Reference

Required

account_name string required

Human-readable AWS account name (e.g., 'sandbox', 'prod-payments'). Included as a label in Slack notifications so a single channel can disambiguate alerts from multiple accounts.

name string required

Name prefix applied to all resources created by this module (SNS topic, Budgets, CAD, Slack Lambda, cloud-nuke task).

Optional

acknowledge_destructive_cloud_nuke bool optional

Explicit acknowledgement that you intend to run cloud-nuke in destructive (non-dry-run) mode. To enable real deletions, this must be true AND cloud_nuke_dry_run must be false. Enforced by a plan-time precondition.

false

anomaly_monitor_dimension string optional

Dimension to monitor for anomalies. One of SERVICE or LINKED_ACCOUNT.

"SERVICE"

anomaly_subscription_frequency string optional

Notification frequency for the anomaly subscription. AWS only allows IMMEDIATE for SNS subscribers; DAILY and WEEKLY require EMAIL subscribers and are rejected at apply time.

"IMMEDIATE"

anomaly_threshold_amount number optional

Minimum total impact, in USD, above which a detected anomaly triggers a subscription notification.

100

budgets list(object(…)) optional

AWS Budgets to create. Each entry produces one aws_budgets_budget that publishes to the module's SNS topic when the threshold is crossed. Fields: 'name' is appended to name; 'time_unit' must be DAILY, MONTHLY, QUARTERLY, or ANNUALLY; 'limit_amount' is the cap in USD; 'threshold_percent' is the percentage of limit_amount at which to alert; 'notification_type' must be ACTUAL or FORECASTED.

list(object({
  name              = string
  time_unit         = string # DAILY | MONTHLY | QUARTERLY | ANNUALLY
  limit_amount      = number # USD
  threshold_percent = number
  notification_type = string # ACTUAL | FORECASTED
}))

[
  {
    limit_amount      = 75,
    name              = "daily",
    notification_type = "ACTUAL",
    threshold_percent = 100,
    time_unit         = "DAILY"
  },
  {
    limit_amount      = 800,
    name              = "monthly-actual",
    notification_type = "ACTUAL",
    threshold_percent = 100,
    time_unit         = "MONTHLY"
  }
]
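
As an illustration of the fields described above, the sketch below defines a custom budget list that keeps a monthly ACTUAL alert and adds a FORECASTED alert at 80% of the same limit. The local name cost_budgets, the budget names, and the amounts are hypothetical. Because this replaces the variable's default value, the default "daily" budget would no longer be created unless you list it again.

```hcl
locals {
  # Pass to the module as: budgets = local.cost_budgets
  cost_budgets = [
    {
      name              = "monthly-actual"
      time_unit         = "MONTHLY"
      limit_amount      = 800 # USD
      threshold_percent = 100 # alert once actual spend reaches 800 USD
      notification_type = "ACTUAL"
    },
    {
      name              = "monthly-forecast"
      time_unit         = "MONTHLY"
      limit_amount      = 800 # USD
      threshold_percent = 80  # alert once forecasted spend reaches 640 USD
      notification_type = "FORECASTED"
    }
  ]
}
```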

cloud_nuke_assign_public_ip bool optional

Whether to assign a public IP to the cloud-nuke Fargate task ENI. Set to true when running in public subnets without a NAT gateway (typical for AWS default VPCs) so the task can reach AWS API endpoints and the image registry.

false

cloud_nuke_config_yaml string optional

Optional inline cloud-nuke config YAML. When non-null, the module writes the YAML to an S3 object and points cloud-nuke at it via --config. When null, no config file is passed and cloud-nuke uses its defaults. See https://github.com/gruntwork-io/cloud-nuke#config-file for the schema.

null
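
A small sketch of an inline config, assuming the include/exclude and names_regex layout described in the cloud-nuke README. The bucket name pattern is hypothetical; verify the resource keys against the README for the cloud-nuke version baked into your image before relying on it.

```hcl
locals {
  # Pass to the module as: cloud_nuke_config_yaml = local.cloud_nuke_config_yaml
  # The indented heredoc (<<-) strips the common leading whitespace, so the
  # resulting string starts with "s3:" at column zero.
  cloud_nuke_config_yaml = <<-YAML
    s3:
      exclude:
        names_regex:
          - ^terraform-state-.*
  YAML
}
```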
cloud_nuke_dry_run bool optional

Whether to run cloud-nuke in --dry-run mode, which logs what cloud-nuke would delete without actually deleting anything. See acknowledge_destructive_cloud_nuke for how to enable real deletions.

true

cloud_nuke_image string optional

Container image reference for cloud-nuke (e.g. an ECR repository URI or a public image). Gruntwork does not currently publish a cloud-nuke OCI image, so operators must build and host their own — see the module README for a reference Dockerfile. Required when enable_scheduled_cloud_nuke is true.

null

cloud_nuke_schedule_expression string optional

EventBridge Scheduler schedule expression for cloud-nuke runs. Accepts a cron(...) or rate(...) expression. The default is 07:00 UTC daily.

"cron(0 7 * * ? *)"
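
For reference, EventBridge cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year), and one of the two day fields must be a question mark. The sketch below shows two alternative values; the local names are hypothetical.

```hcl
locals {
  # Weekdays only, at 07:00 UTC.
  cloud_nuke_weekday_schedule = "cron(0 7 ? * MON-FRI *)"

  # Every 12 hours, counted from when the schedule is created.
  cloud_nuke_rate_schedule = "rate(12 hours)"
}
```

Either value can be passed to the module as cloud_nuke_schedule_expression.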
cloud_nuke_security_group_ids list(string) optional

Security groups to attach to the cloud-nuke Fargate task ENI. Each must permit outbound HTTPS so the task can reach AWS API endpoints, S3 (for config), and ECR/GHCR (for the image pull). When empty, ECS attaches the VPC default security group, whose outbound rules may not permit these calls.

[]

cloud_nuke_target_regions list(string) optional

AWS regions cloud-nuke should operate against, passed as repeated --region flags. An empty list defers to cloud-nuke's default behavior, which is to operate on all regions enabled in the account.

[]

cloud_nuke_task_cpu number optional

Fargate task CPU units allocated to the cloud-nuke task. See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html for valid CPU/memory combinations.

512

cloud_nuke_task_memory number optional

Memory, in MB, allocated to the cloud-nuke Fargate task. See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html for valid CPU/memory combinations.

1024

ecs_cluster_arn string optional

ARN of an existing ECS cluster to run the cloud-nuke task on. When null, this module creates a dedicated cluster named '<name>-cloud-nuke'. Only used when enable_scheduled_cloud_nuke is true.

null

email_endpoints list(string) optional

Email addresses to subscribe to the cost alerts SNS topic. Each address receives an AWS confirmation email that must be acknowledged before delivery begins.

[]

enable_anomaly_detection bool optional

Whether to provision AWS Cost Anomaly Detection. Requires the aws.us_east_1 aliased provider since CAD resources are global but only callable via us-east-1.

true

enable_budget_alerts bool optional

Whether to provision AWS Budgets alerts. When true, each entry in budgets becomes one aws_budgets_budget that publishes to the module's SNS topic.

true

enable_scheduled_cloud_nuke bool optional

Whether to provision an EventBridge-scheduled Fargate task that runs cloud-nuke on cloud_nuke_schedule_expression. Off by default because cloud-nuke is destructive. When true, vpc_id and subnet_ids must also be set.

false

enable_slack_notifications bool optional

Whether to deploy the Slack notifier Lambda and subscribe it to the cost alerts SNS topic. When true, slack_webhook_url_secrets_manager_arn must also be set.

false

lambda_memory_size number optional

Memory, in MB, allocated to the Slack notifier Lambda.

128

lambda_timeout number optional

Maximum runtime, in seconds, of the Slack notifier Lambda before termination.

30

log_retention_days number optional

Retention, in days, for CloudWatch logs of the Slack notifier Lambda and the cloud-nuke Fargate task.

30

slack_webhook_url_secrets_manager_arn string optional

ARN of a Secrets Manager secret whose SecretString is the Slack incoming webhook URL. Required when enable_slack_notifications is true. The Lambda reads this secret at runtime so the URL never appears in plan output or Terraform state.

null
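
One way to supply this ARN without putting the webhook URL into Terraform is to create and populate the secret outside Terraform, then look it up by name. The sketch below assumes a secret named cost-alerts/slack-webhook-url already exists; that name, and the name and account_name values, are hypothetical.

```hcl
# Looks up secret metadata only. The secret value itself is never read by
# Terraform, so the webhook URL stays out of plan output and state.
data "aws_secretsmanager_secret" "slack_webhook" {
  name = "cost-alerts/slack-webhook-url" # hypothetical secret name
}

module "cost_management" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-service-catalog.git//modules/mgmt/cost-management?ref=v2.9.1"

  providers = {
    aws           = aws
    aws.us_east_1 = aws.us_east_1
  }

  name         = "cost-mgmt" # placeholder
  account_name = "sandbox"   # placeholder

  enable_slack_notifications            = true
  slack_webhook_url_secrets_manager_arn = data.aws_secretsmanager_secret.slack_webhook.arn
}
```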
subnet_ids list(string) optional

Private subnet IDs the cloud-nuke Fargate task launches into. Subnets must have outbound internet access (e.g., via a NAT gateway) so the task can reach AWS API endpoints. Required when enable_scheduled_cloud_nuke is true.

[]

vpc_id string optional

VPC the cloud-nuke Fargate task runs in. Required when enable_scheduled_cloud_nuke is true.

null
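
The networking inputs for the scheduled cloud-nuke task can be wired together as in the sketch below. It assumes the account's default VPC with public subnets and no NAT gateway, which is the case the cloud_nuke_assign_public_ip description calls out; the image URI, the security group name prefix, and the name and account_name values are placeholders. The task stays in dry-run mode because cloud_nuke_dry_run is left at its default.

```hcl
# Look up the account's default VPC and all of its subnets.
data "aws_vpc" "default" {
  default = true
}

data "aws_subnets" "default" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.default.id]
  }
}

# Dedicated security group: no inbound rules, outbound HTTPS only.
resource "aws_security_group" "cloud_nuke" {
  name_prefix = "cloud-nuke-"
  vpc_id      = data.aws_vpc.default.id

  egress {
    description = "HTTPS to AWS APIs, S3, and the image registry"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

module "cost_management" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-service-catalog.git//modules/mgmt/cost-management?ref=v2.9.1"

  providers = {
    aws           = aws
    aws.us_east_1 = aws.us_east_1
  }

  name         = "cost-mgmt" # placeholder
  account_name = "sandbox"   # placeholder

  enable_scheduled_cloud_nuke   = true
  cloud_nuke_image              = "123456789012.dkr.ecr.us-east-1.amazonaws.com/cloud-nuke:latest" # placeholder
  vpc_id                        = data.aws_vpc.default.id
  subnet_ids                    = data.aws_subnets.default.ids
  cloud_nuke_security_group_ids = [aws_security_group.cloud_nuke.id]
  cloud_nuke_assign_public_ip   = true # public subnets without a NAT gateway
}
```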