Terraform AWS ARC MSK Module Usage Guide
Introduction
Purpose of the Document
This document provides guidelines and instructions for users looking to implement the Terraform AWS ARC MSK (Managed Streaming for Apache Kafka) module.
Module Overview
The Terraform AWS ARC MSK module provides a secure and modular foundation for deploying Amazon MSK (Managed Streaming for Apache Kafka) clusters on AWS. It supports both standard and serverless MSK clusters with comprehensive configuration options for encryption, authentication, monitoring, and logging. In addition, this module supports configuring MSK Connect connectors to integrate data sources like Amazon Aurora PostgreSQL and destinations like Amazon S3, enabling real-time data streaming pipelines using custom Kafka Connect plugins.
Prerequisites
Before using this module, ensure you have the following:
- AWS credentials configured.
- Terraform installed (version > 1.4, < 2.0.0).
- A working knowledge of AWS VPC, Apache Kafka, MSK, and Terraform concepts.
Getting Started
Module Source
To use the module in your Terraform configuration, include the following source block:
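A minimal sketch is shown below; sourcefuse/arc-msk/aws is assumed as the registry address, so confirm the exact source and version on the Terraform Registry before use.

```hcl
module "msk" {
  # Assumed registry address; confirm on the Terraform Registry.
  source  = "sourcefuse/arc-msk/aws"
  version = "~> 1.0" # pin to the latest published release

  # Module inputs go here; see the README Inputs section.
}
```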
MSK Connect Data Sink: Aurora PostgreSQL to Amazon S3
This Terraform example provisions MSK Connect components that enable data ingestion from an Amazon Aurora PostgreSQL database into Amazon S3, using Kafka Connect and Confluent plugins.
Prerequisites:
Before running the Terraform example in example/msk-connect, ensure the following components are pre-configured in your AWS environment:

Aurora PostgreSQL Setup
- An Aurora PostgreSQL cluster is already created.
- A database named myapp is created within the cluster.
- A sample table named users is present under schema public with sample data inserted.
VPC Configuration
- A VPC Endpoint for S3 (Gateway type) is created to allow private communication between MSK Connect and S3.
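A hedged Terraform sketch of this endpoint follows; the VPC ID, route table ID, and us-east-1 region are placeholders.

```hcl
# Placeholders throughout; substitute your VPC, route tables, and region.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = "vpc-0123456789abcdef0"
  service_name      = "com.amazonaws.us-east-1.s3"
  vpc_endpoint_type = "Gateway"

  # Associate the route tables used by the MSK Connect subnets.
  route_table_ids = ["rtb-0123456789abcdef0"]
}
```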
Plugins Downloaded and Uploaded to S3
Download the required Kafka Connect plugins and upload them to the appropriate S3 bucket (a Terraform sketch follows the list below):
- JDBC Source Plugin: confluentinc-kafka-connect-jdbc-10.6.6.zip
- S3 Sink Plugin: confluentinc-kafka-connect-s3-10.6.6.zip
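One way to stage these plugins is with Terraform itself. In this sketch, the bucket name my-msk-connect-plugins is a placeholder, and the zip files are assumed to have been downloaded into a plugins/ directory next to the configuration.

```hcl
# Placeholder bucket and local paths; adjust to your environment.
resource "aws_s3_object" "jdbc_source_plugin" {
  bucket = "my-msk-connect-plugins"
  key    = "plugins/confluentinc-kafka-connect-jdbc-10.6.6.zip"
  source = "${path.module}/plugins/confluentinc-kafka-connect-jdbc-10.6.6.zip"
}

resource "aws_s3_object" "s3_sink_plugin" {
  bucket = "my-msk-connect-plugins"
  key    = "plugins/confluentinc-kafka-connect-s3-10.6.6.zip"
  source = "${path.module}/plugins/confluentinc-kafka-connect-s3-10.6.6.zip"
}
```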
Deploying the Example
Once the above prerequisites are met, you can deploy the Terraform example to configure the data pipeline using the standard Terraform workflow: terraform init, terraform plan, and terraform apply.
Refer to the Terraform Registry for the latest version.
Integration with Existing Terraform Configurations
To integrate the module with your existing Terraform monorepo configuration, follow the steps below:
- Create a new folder in terraform/ named msk.
- Create the required files; see the examples to base off of.
- Configure your backend (a sketch of the backend configuration file follows this list):
  - Create the environment backend configuration file: config.<environment>.hcl
    - region: Where the backend resides
    - key: <environment>/terraform.tfstate
    - bucket: Bucket name where the Terraform state will reside
    - dynamodb_table: Lock table so there are not duplicate tfplans in the mix
    - encrypt: Enable server-side encryption of the state file
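As a minimal sketch, assuming an S3 backend and placeholder names (my-terraform-state, terraform-lock), a config.dev.hcl file might look like this:

```hcl
# config.dev.hcl: placeholder values, adjust to your environment
region         = "us-east-1"
key            = "dev/terraform.tfstate"
bucket         = "my-terraform-state"
dynamodb_table = "terraform-lock"
encrypt        = true
```

Pair the file with an empty backend "s3" {} block in your Terraform configuration and initialize with terraform init -backend-config=config.dev.hcl.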
Required AWS Permissions
Ensure that the AWS credentials used to execute Terraform have the necessary permissions to create, list, and modify the following (an illustrative policy sketch follows the list):
- Amazon MSK clusters and configurations
- IAM roles and policies
- KMS keys (if encryption is enabled)
- CloudWatch logs and metrics
- Security groups and VPC resources
- Secrets Manager resources (for SASL/SCRAM authentication)
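The sketch below is a deliberately broad, illustrative policy document covering the services listed above; in practice, scope the actions and resources down to least privilege.

```hcl
# Broad, illustrative policy; scope down to least privilege in practice.
data "aws_iam_policy_document" "terraform_msk" {
  statement {
    sid = "MskProvisioning"
    actions = [
      "kafka:*",          # MSK clusters and configurations
      "kafkaconnect:*",   # MSK Connect connectors and plugins
      "iam:*",            # IAM roles and policies
      "kms:*",            # KMS keys for encryption
      "logs:*",           # CloudWatch logs
      "cloudwatch:*",     # CloudWatch metrics
      "ec2:*",            # security groups and VPC resources
      "secretsmanager:*"  # SASL/SCRAM secrets
    ]
    resources = ["*"]
  }
}
```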
Module Configuration
Input Variables
For a list of input variables, see the README Inputs section.
Output Values
For a list of outputs, see the README Outputs section.
Module Usage
Basic Usage
For basic usage, see the example folder.
This example will create the following (an illustrative module call follows the list):
- An MSK cluster with customizable broker configuration
- Client authentication with SASL/SCRAM
- CloudWatch logging
- Prometheus monitoring with JMX and Node exporters
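An illustrative sketch of such a module call is shown below; the input names are hypothetical and must be checked against the README Inputs section before use.

```hcl
# Hypothetical inputs; verify every name against the module's README.
module "msk" {
  source  = "sourcefuse/arc-msk/aws" # assumed registry address
  version = "~> 1.0"

  cluster_name           = "example-msk"
  kafka_version          = "3.6.0"
  number_of_broker_nodes = 3
  subnet_ids             = ["subnet-aaa111", "subnet-bbb222", "subnet-ccc333"] # one per AZ
}
```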
Tips and Recommendations
- The module focuses on provisioning secure and scalable MSK clusters. The convention-based approach enables downstream services to easily connect to the Kafka cluster. Adjust the configuration parameters as needed for your specific use case.
- Consider using the storage autoscaling feature for production workloads to handle growing data volumes.
- For high availability, deploy the MSK cluster across multiple availability zones.
- Use appropriate authentication methods (SASL/SCRAM, IAM, TLS) based on your security requirements.
- Enable monitoring and logging for better observability and troubleshooting.
Troubleshooting
Reporting Issues
If you encounter a bug or issue, please report it on the GitHub repository.
Security Considerations
AWS VPC
Understand the security considerations related to MSK on AWS when using this module:
- MSK clusters should be deployed in private subnets with appropriate security groups (a sketch follows this list).
- Use encryption in transit and at rest for sensitive data.
- Implement proper authentication mechanisms (SASL/SCRAM, IAM, TLS).
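For the private-subnet recommendation, a hedged sketch of a client-facing security group follows; the VPC ID and CIDR block are placeholders, and the port range 9092-9098 spans the PLAINTEXT, TLS, SASL/SCRAM, and IAM listener ports used by MSK.

```hcl
# Placeholder VPC ID and CIDR; restrict ingress to your client subnets.
resource "aws_security_group" "msk_clients" {
  name_prefix = "msk-clients-"
  vpc_id      = "vpc-0123456789abcdef0"

  ingress {
    description = "Kafka broker listeners (PLAINTEXT/TLS/SASL/IAM)"
    from_port   = 9092
    to_port     = 9098
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/16"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```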
Best Practices for AWS MSK
Follow best practices to ensure secure MSK configurations:
- AWS MSK Security Best Practices
- Enable encryption in transit and at rest
- Use IAM authentication or SASL/SCRAM for client authentication
- Implement proper network isolation using security groups
- Regularly update Kafka versions to benefit from security patches
Contributing and Community Support
Contributing Guidelines
Contribute to the module by following the guidelines outlined in the CONTRIBUTING.md file.
Reporting Bugs and Issues
If you find a bug or issue, report it on the GitHub repository.
License
License Information
This module is licensed under the Apache 2.0 license. Refer to the LICENSE file for more details.
Open Source Contribution
Contribute to open source by using and enhancing this module. Your contributions are welcome!