AWS Messaging 0.12.5 (last updated in version 0.12.4)

Kinesis Data Stream Module


This module creates a Kinesis Data Stream.

About Kinesis Data Stream

A Kinesis data stream is a set of shards. Each shard has a sequence of data records. Each data record has a sequence number that is assigned by Kinesis Data Streams.

  • data record: A data record is the unit of data stored in a Kinesis data stream. Data records are composed of a sequence number, a partition key, and a data blob, which is an immutable sequence of bytes.
  • shard: A shard is a uniquely identified sequence of data records in a stream. A stream is composed of one or more shards, each of which provides a fixed unit of capacity.
  • sequence number: Each data record has a sequence number that is unique per partition-key within its shard. Kinesis Data Streams assigns the sequence number after you write to the stream with client.putRecords or client.putRecord. Sequence numbers for the same partition key generally increase over time. The longer the time period between write requests, the larger the sequence numbers become.

Sharding / Partitioning in Kinesis Data Stream

Kinesis Data Streams achieves scalability by using shards. The data capacity of your stream is a function of the number of shards that you specify for the stream: the total capacity of the stream is the sum of the capacities of its shards.

How to Set Shard Size

You can configure the initial number of shards in two ways:

  • direct specification: specify number_of_shards directly
  • indirect specification: specify the average_data_size_in_kb, records_per_second and number_of_consumers variables and let the module calculate the initial number of shards.
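As a minimal sketch, the two approaches might look like the following. The module labels, stream names and input values are hypothetical and chosen only for illustration.

```hcl
# Direct specification: set the shard count explicitly.
module "kinesis_direct" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-messaging.git//modules/kinesis?ref=v0.12.5"

  name             = "example-direct-stream" # hypothetical stream name
  number_of_shards = 2                       # hypothetical value
}

# Indirect specification: describe the expected workload and let the
# module derive the initial shard count.
module "kinesis_calculated" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-messaging.git//modules/kinesis?ref=v0.12.5"

  name                    = "example-calculated-stream" # hypothetical stream name
  average_data_size_in_kb = 4                           # hypothetical: 4 KB per record
  records_per_second      = 500                         # hypothetical write rate
  number_of_consumers     = 3                           # hypothetical consumer count
}
```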

Note: the module calculates the initial number of shards as follows:

  1. Calculate the incoming write bandwidth in KB per second (incoming_write_bandwidth_in_KB), which is equal to average_data_size_in_kb multiplied by records_per_second.
  2. Calculate the outgoing read bandwidth in KB per second (outgoing_read_bandwidth_in_KB), which is equal to incoming_write_bandwidth_in_KB multiplied by number_of_consumers.
  3. Calculate the initial number of shards (number_of_shards) that the data stream needs using the following formula: number_of_shards = max(incoming_write_bandwidth_in_KB/1000, outgoing_read_bandwidth_in_KB/2000)

For more information, refer to the AWS FAQ on calculating the initial number of shards.
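The calculation can be reproduced with Terraform locals to sanity-check a sizing before applying it. This is only a sketch of the formula as stated in this document, not the module's actual source. The input values are hypothetical, and rounding up with ceil() is an assumption about how a fractional result would be handled.

```hcl
locals {
  # Hypothetical workload figures, for illustration only.
  average_data_size_in_kb = 4
  records_per_second      = 500
  number_of_consumers     = 3

  # Step 1: 4 KB * 500 records/sec = 2000 KB/sec of writes.
  incoming_write_bandwidth_in_kb = local.average_data_size_in_kb * local.records_per_second

  # Step 2: 2000 KB/sec * 3 consumers = 6000 KB/sec of reads.
  outgoing_read_bandwidth_in_kb = local.incoming_write_bandwidth_in_kb * local.number_of_consumers

  # Step 3: max(2000/1000, 6000/2000) = max(2, 3) = 3 shards.
  # ceil() is an assumption: it rounds a fractional result up to a whole shard.
  number_of_shards = ceil(max(
    local.incoming_write_bandwidth_in_kb / 1000,
    local.outgoing_read_bandwidth_in_kb / 2000
  ))
}

output "number_of_shards" {
  value = local.number_of_shards
}
```

With these inputs the read side dominates: three consumers each reading the full 2000 KB/sec write volume need more shard capacity than the writes themselves.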

How Data Partitioning Works

A partition key is used to group data by shard within a stream. Kinesis Data Streams segregates the data records belonging to a stream into multiple shards. It uses the partition key that is associated with each data record to determine which shard a given data record belongs to. When an application puts data into a stream, it must specify a partition key.

With a single shard, every record goes into that shard regardless of its partition key, so there is no way to apply custom partitioning logic.

How to Re-Shard a Stream

Changing the number of shards through this module destroys the old Kinesis data stream and re-creates it with the new configuration. To prevent this, consider using the UpdateShardCount API instead. Updating the shard count is an asynchronous operation. To update the shard count, Kinesis Data Streams performs splits or merges on individual shards. This can create short-lived shards in addition to the final shards, and these short-lived shards count towards the total shard limit for your account in the Region. For more information, see the AWS documentation for the UpdateShardCount API.

Limitations

Here are some limitations of Kinesis Data Streams that you might be interested in:

  • Data Payload Size: The maximum size of the data payload of a record before base64-encoding is up to 1 MB.
  • Retention Period: The maximum value of a stream's retention period is 8760 hours (365 days).
  • Shard Throughput: Each shard can support up to 1 MB/sec or 1,000 records/sec write throughput or up to 2 MB/sec or 2,000 records/sec read throughput.

You can find the latest and full list of limitations and quotas on this page: Quotas and Limits.

Encryption

Amazon Kinesis Data Streams can automatically encrypt sensitive data as it enters into a stream. Kinesis Data Streams uses AWS KMS master keys for encryption. With server-side encryption, your Kinesis stream producers and consumers don't need to manage master keys or cryptographic operations. Your data is automatically encrypted as it enters and leaves the Kinesis Data Streams service, so your data at rest is encrypted. For more information, see Data Protection in Amazon Kinesis Data Streams.

How to Enable Encryption

You can enable encryption in two ways:

  • default encryption: set encryption_type = "KMS". This will use the default AWS service key for Kinesis, aws/kinesis.
  • custom key encryption: If you need to use a Customer Managed Key (CMK), see the master key module as well as documentation on user-generated KMS master keys for further information on how to create them. You can specify one using kms_key_id = "alias/<my_cmk_alias>"
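The two options might be configured as in the sketch below. The module labels, stream names and the alias/my-example-cmk alias are hypothetical; substitute the alias of a key that exists in your account.

```hcl
# Default encryption: uses the AWS managed key for Kinesis (aws/kinesis).
module "kinesis_default_key" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-messaging.git//modules/kinesis?ref=v0.12.5"

  name             = "example-default-key-stream" # hypothetical stream name
  number_of_shards = 1
  encryption_type  = "KMS"
}

# Custom key encryption: points the stream at a Customer Managed Key.
module "kinesis_custom_key" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-messaging.git//modules/kinesis?ref=v0.12.5"

  name             = "example-custom-key-stream" # hypothetical stream name
  number_of_shards = 1
  encryption_type  = "KMS"
  kms_key_id       = "alias/my-example-cmk" # hypothetical CMK alias
}
```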

How to Change KMS Key

You can change the KMS key by reconfiguring the encryption with the kms_key_id and encryption_type variables.

Please note that changing the KMS key for a Kinesis Data Stream does not retroactively re-encrypt previously encrypted data in the stream with the new KMS key. Any data that was previously encrypted with the old KMS key will remain encrypted with that key. However, any new data added to the stream after the KMS key change will be encrypted with the new KMS key.

If you need to re-encrypt the previously encrypted data in the stream with the new KMS key, you will need to manually copy the data to a new stream that is configured to use the new KMS key for encryption. Alternatively, you can use AWS Lambda or other AWS services to read the data from the original stream, decrypt it using the old KMS key, and then re-encrypt it with the new KMS key before writing it to a new stream or another data store.

Replication

Amazon Kinesis Data Streams does not support replication out of the box. One way to implement replication is to use Lambda. You can find more information in this AWS article: Build highly available streams with Amazon Kinesis Data Streams

There is also a sample prototype from AWS that demonstrates continuous data capture (CDC) to replicate data across regions: https://github.com/aws-samples/aws-kinesis-data-streams-replicator

Sample Usage

main.tf

# ------------------------------------------------------------------------------------------------------
# DEPLOY GRUNTWORK'S KINESIS MODULE
# ------------------------------------------------------------------------------------------------------

module "kinesis" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-messaging.git//modules/kinesis?ref=v0.12.5"

  # ----------------------------------------------------------------------------------------------------
  # REQUIRED VARIABLES
  # ----------------------------------------------------------------------------------------------------

  # The name of the Kinesis stream.
  name = <string>

  # ----------------------------------------------------------------------------------------------------
  # OPTIONAL VARIABLES
  # ----------------------------------------------------------------------------------------------------

  # The average size of the data record written to the stream in kilobytes (KB),
  # rounded up to the nearest 1 KB
  average_data_size_in_kb = 0

  # The type of encryption to use (can be KMS or NONE). Defaults to using a KMS
  # key for encryption at rest.
  encryption_type = "KMS"

  # A boolean that indicates all registered consumers should be deregistered
  # from the stream so that the stream can be destroyed without error.
  enforce_consumer_deletion = false

  # ID of the key to use for KMS
  kms_key_id = "alias/aws/kinesis"

  # The number of Amazon Kinesis Streams applications that consume data
  # concurrently and independently from the stream, that is, the consumers
  number_of_consumers = 0

  # A shard is a group of data records in a stream. When you create a stream,
  # you specify the number of shards for the stream.
  number_of_shards = null

  # The number of data records written to and read from the stream per second
  records_per_second = 0

  # Length of time, in hours, that data records are accessible after they are
  # added to the stream. The maximum value of a stream's retention period is
  # 8760 hours (365 days). Minimum value is 24.
  retention_period = 24

  # The additional shard-level CloudWatch metrics to enable
  shard_level_metrics = []

  # Specifies the capacity mode of the stream. Must be either PROVISIONED or
  # ON_DEMAND. When you are using PROVISIONED mode, you must either set
  # number_of_shards directly or set average_data_size_in_kb,
  # records_per_second, and number_of_consumers
  stream_mode = null

  # A map of key value pairs to apply as tags to the Kinesis stream.
  tags = {}
}


Reference

Required

name (string, required)

The name of the Kinesis stream.

Optional

average_data_size_in_kb (number, optional)

The average size of the data record written to the stream in kilobytes (KB), rounded up to the nearest 1 KB.

Default: 0

encryption_type (string, optional)

The type of encryption to use (can be KMS or NONE). Defaults to using a KMS key for encryption at rest.

Default: "KMS"

enforce_consumer_deletion (bool, optional)

A boolean that indicates all registered consumers should be deregistered from the stream so that the stream can be destroyed without error.

Default: false

kms_key_id (string, optional)

ID of the key to use for KMS.

Default: "alias/aws/kinesis"

number_of_consumers (number, optional)

The number of Amazon Kinesis Streams applications that consume data concurrently and independently from the stream, that is, the consumers.

Default: 0

number_of_shards (number, optional)

A shard is a group of data records in a stream. When you create a stream, you specify the number of shards for the stream.

Default: null

records_per_second (number, optional)

The number of data records written to and read from the stream per second.

Default: 0

retention_period (number, optional)

Length of time, in hours, that data records are accessible after they are added to the stream. The maximum value of a stream's retention period is 8760 hours (365 days). Minimum value is 24.

Default: 24

shard_level_metrics (list(string), optional)

The additional shard-level CloudWatch metrics to enable.

Default: []

Possible values:

shard_level_metrics = [
  "IncomingBytes",
  "IncomingRecords",
  "IteratorAgeMilliseconds",
  "OutgoingBytes",
  "OutgoingRecords",
  "ReadProvisionedThroughputExceeded",
  "WriteProvisionedThroughputExceeded"
]

stream_mode (string, optional)

Specifies the capacity mode of the stream. Must be either PROVISIONED or ON_DEMAND. When you are using PROVISIONED mode, you must either set number_of_shards directly or set average_data_size_in_kb, records_per_second, and number_of_consumers.

Default: null

tags (map(string), optional)

A map of key value pairs to apply as tags to the Kinesis stream.

Default: {}
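
To show how several of the optional inputs above combine, here is a sketch of a provisioned stream with a longer retention period, a subset of shard-level metrics, and tags. The module label, stream name, and all values are hypothetical.

```hcl
module "kinesis_example" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-messaging.git//modules/kinesis?ref=v0.12.5"

  name = "example-events-stream" # hypothetical stream name

  # Provisioned capacity with an explicit shard count.
  stream_mode      = "PROVISIONED"
  number_of_shards = 2

  # Keep records for 48 hours instead of the default 24.
  retention_period = 48

  # Enable only the shard-level metrics of interest, taken from the
  # list of possible values above.
  shard_level_metrics = [
    "IncomingBytes",
    "IteratorAgeMilliseconds"
  ]

  # Hypothetical tags.
  tags = {
    Environment = "dev"
    Team        = "data-platform"
  }
}
```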