
terraform-aws-arc-observability-stack


Introduction

The Observability Terraform Module is a comprehensive solution designed to simplify the deployment of a full-stack observability ecosystem in Kubernetes environments. This module enables organizations to monitor and troubleshoot their infrastructure and applications effectively, offering the flexibility to choose between various open-source tools.

Key Features:

  1. EFK Stack for Log Management:
  • Deploy either Fluentd or Fluent Bit as the log collector, providing lightweight and efficient options for log aggregation.
  • Seamlessly integrate with either Elasticsearch or OpenSearch for scalable and reliable log storage.
  2. Prometheus Stack for Metrics Monitoring:
  • Includes Prometheus for metrics collection and Alertmanager for alerting.
  • Integrated support for Grafana, offering rich dashboards to visualize metrics effectively.
  • Enables monitoring of HTTP endpoints using the Blackbox Exporter.
  3. Flexibility and Customization:
  • Fully customizable configurations for each component, allowing fine-grained control over deployment and resources.
  • Supports multiple log collectors and storage backends, giving users the freedom to choose based on their requirements.
  4. Streamlined Deployment:
  • Automates the deployment of the entire observability stack, reducing complexity and ensuring consistency.
  • Includes preconfigured dashboards and alert rules for quick setup and immediate insights.
  5. SigNoz Community Edition Support:
  • Adds native support for SigNoz CE, an all-in-one observability platform.
  • Enables logs, metrics, and traces to be collected and correlated in one unified interface.
  • Simplifies tracing setup with the OpenTelemetry Collector and works out of the box with distributed applications.

For more information about this repository and its usage, please see Terraform AWS ARC Observability Module Usage Guide.

Prerequisites

Before using this module, ensure you have the following:

  • AWS credentials configured.
  • Terraform installed.
  • A working knowledge of Terraform.
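Because the module installs its components through the Helm provider (see Requirements), Terraform must be able to authenticate to your Kubernetes cluster. Below is one possible provider setup for an existing EKS cluster; the cluster name "arc-poc" and the data-source names are illustrative assumptions, not part of the module:

```hcl
# Hypothetical Helm provider wiring for an existing EKS cluster.
# "arc-poc" is a placeholder cluster name — substitute your own.
data "aws_eks_cluster" "this" {
  name = "arc-poc"
}

data "aws_eks_cluster_auth" "this" {
  name = "arc-poc"
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.this.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}
```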

Usage

See the examples folder for a complete example.

EFK Stack

module "efk" {
  source                      = "sourcefuse/arc-observability-stack/aws"
  version                     = "0.0.1"

  environment = var.environment
  namespace   = var.namespace

  search_engine  = "elasticsearch"
  log_aggregator = "fluentd"

  elasticsearch_config = {
    name = "elasticsearch-master"
    k8s_namespace = {
      name   = "logging"
      create = true
    }

    tls_self_signed_cert_data = {
      organisation          = "ARC"
      validity_period_hours = 26280 # 3 years validity
      early_renewal_hours   = 168   # 1 week early renewal
    }

    cluster_config = {
      port           = "9200"
      transport_port = "9300"
      user           = "elastic"
      log_level      = "INFO"
      cpu_limit      = "2000m"
      memory_limit   = "4Gi"
      cpu_request    = "1000m"
      memory_request = "2Gi"
      storage_class  = "gp2"
      storage        = "40Gi"
    }

    kibana_config = {
      log_level      = "info"
      cpu_limit      = "500m"
      memory_limit   = "1Gi"
      cpu_request    = "250m"
      memory_request = "500Mi"

      ingress_enabled     = true
      aws_certificate_arn = "arn:aws:acm:us-east-1:xx:certificate/xx-46e7-4d99-a523-xxxx"
      ingress_host        = "kibana.xx-xx.xx"

    }
  }

  fluentd_config = {
    k8s_namespace = {
      name   = "logging"
      create = false
    }
    name                = "fluentd"
    search_engine       = "elasticsearch"
    cpu_limit           = "100m"
    memory_limit        = "512Mi"
    cpu_request         = "100m"
    memory_request      = "128Mi"
    logstash_dateformat = "%Y.%m.%d"
    log_level           = "info"
  }
}
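The same module call can be pointed at Fluent Bit and/or OpenSearch by switching the `log_aggregator` and `search_engine` selectors. A hedged sketch follows; the selector strings are inferred from the variable descriptions, and the region and role ARN are placeholders:

```hcl
# Sketch: Fluent Bit shipping logs to OpenSearch.
# Selector values are assumed from the inputs table; other values are placeholders.
module "efk_fluentbit" {
  source  = "sourcefuse/arc-observability-stack/aws"
  version = "0.0.1"

  environment = var.environment
  namespace   = var.namespace

  search_engine  = "opensearch"
  log_aggregator = "fluentbit"

  fluentbit_config = {
    k8s_namespace = {
      name   = "logging"
      create = true
    }
    name         = "fluent-bit"
    aws_region   = "us-east-1"                                           # placeholder
    aws_role_arn = "arn:aws:iam::111111111111:role/fluentbit-opensearch" # placeholder
  }
}
```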

Prometheus

module "prometheus" {
  source                      = "sourcefuse/arc-observability-stack/aws"
  version                     = "0.0.1"

  environment = var.environment
  namespace   = var.namespace

  metrics_monitoring_system = "prometheus"

  prometheus_config = {
    k8s_namespace = {
      name   = "metrics"
      create = true
    }
    log_level = "info"
    resources = {
      cpu_limit      = "100m"
      memory_limit   = "512Mi"
      cpu_request    = "100m"
      memory_request = "128Mi"
    }
    replicas                  = 1
    storage                   = "8Gi"
    enable_kube_state_metrics = true
    enable_node_exporter      = true
    retention_period          = "30d"

    grafana_config = {
      replicas            = 1
      ingress_enabled     = true
      ingress_host        = "grafana.arc-xx.xx"
      aws_certificate_arn = "arn:aws:acm:us-east-1:xx:certificate/xx-46e7-4d99-a523-xxxx"
      lb_visibility       = "internet-facing"
      dashboard_list = [
        {
          name = "node-metrics"
          json = templatefile("${path.module}/grafana-dashboard.json", {})
        }
      ]
    }

    blackbox_exporter_config = {
      name = "blackbox-exporter"
      monitoring_targets = [{
        name                     = "google"
        url                      = "https://google.com"
        scrape_interval          = "60s"
        status_code_pattern_list = "[http_2xx]" // Note: this is a string, not a list
      }]
    }

    alertmanager_config = {
      name            = "alertmanager"
      replica_count   = 1
      alert_rule_yaml = ""
    }

  }
}
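The inputs below also expose a SigNoz deployment (`signoz_config`, selected via `tracing_stack`), though no usage example is shipped. A minimal, hedged sketch based on the input schema — the `"signoz"` selector value, cluster name, and domain are assumptions:

```hcl
# Sketch: SigNoz CE deployment; selector value, cluster name, and domain are assumed.
module "signoz" {
  source  = "sourcefuse/arc-observability-stack/aws"
  version = "0.0.1"

  environment = var.environment
  namespace   = var.namespace

  tracing_stack = "signoz" # assumed selector value

  signoz_config = {
    cluster_name = "arc-poc" # placeholder EKS cluster name
    k8s_namespace = {
      name   = "signoz"
      create = true
    }
    signoz_bin = {
      domain          = "signoz.example.com" # placeholder
      ingress_enabled = false
    }
  }
}
```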

Requirements

Name Version
terraform >= 1.4, < 2.0.0
aws >= 4.0, < 6.0
helm 2.17.0
random ~> 3.6.0
tls ~> 4.0.6

Providers

No providers.

Modules

Name Source Version
elasticsearch ./modules/elasticsearch n/a
fluentbit ./modules/fluent-bit n/a
fluentd ./modules/fluentd n/a
jaeger ./modules/jaeger n/a
prometheus ./modules/prometheus n/a
signoz ./modules/signoz n/a
signoz_metrics_logs ./modules/signoz-infra n/a

Resources

No resources.

Inputs

Name Description Type Default Required
elasticsearch_config Configuration settings for deploying Elasticsearch
object({
name = optional(string, "elasticsearch-master") # Name of the Elasticsearch cluster
k8s_namespace = object({
name = optional(string, "logging")
create = optional(bool, true)
})

tls_self_signed_cert_data = optional(object({ # Self-signed TLS certificate data
organisation = optional(string, null) # Organisation name for certificate
validity_period_hours = optional(number, 26280) # 3 years validity
early_renewal_hours = optional(number, 168) # 1 week early renewal
}))

cluster_config = object({
port = optional(string, "9200") # Elasticsearch HTTP port
transport_port = optional(string, "9300") # Elasticsearch transport port
user = optional(string, "elastic") # Elasticsearch username
log_level = optional(string, "INFO") # Log level (DEBUG, INFO, WARN, ERROR)
cpu_limit = optional(string, "2000m") # CPU limit for the Elasticsearch container
memory_limit = optional(string, "4Gi") # Memory limit for the Elasticsearch container
cpu_request = optional(string, "1000m") # CPU request for the Elasticsearch container
memory_request = optional(string, "2Gi") # Memory request for the Elasticsearch container
storage_class = optional(string, "gp2")
storage = optional(string, "30Gi") # Persistent volume storage for Elasticsearch
replica_count = optional(string, 3)
})

kibana_config = object({
name = optional(string, "kibana")
replica_count = optional(string, 3)
http_port = optional(string, "5601")
user = optional(string, "elastic")
log_level = optional(string, "info") // Options: all, fatal, error, warn, info, debug, trace, off
cpu_limit = optional(string, "500m")
memory_limit = optional(string, "1Gi")
cpu_request = optional(string, "250m")
memory_request = optional(string, "500Mi")
ingress_enabled = optional(bool, false)
ingress_host = optional(string, "")
aws_certificate_arn = optional(string, "")
lb_visibility = optional(string, "internet-facing")
})
})
{
"cluster_config": {
"cpu_limit": "2000m",
"cpu_request": "1000m",
"log_level": "INFO",
"memory_limit": "4Gi",
"memory_request": "2Gi",
"port": "9200",
"replica_count": 3,
"storage": "30Gi",
"transport_port": "9300",
"user": "elastic"
},
"k8s_namespace": {
"create": true,
"name": "logging"
},
"kibana_config": {
"cpu_limit": "500m",
"cpu_request": "250m",
"elasticsearch_url": "https://elasticsearch-master:9200",
"http_port": "5601",
"ingress_enabled": false,
"ingress_host": "",
"log_level": "info",
"memory_limit": "1Gi",
"memory_request": "500Mi",
"name": "kibana",
"user": "elastic"
},
"name": "elasticsearch-master",
"tls_self_signed_cert_data": {
"early_renewal_hours": 168,
"organisation": null,
"validity_period_hours": 26280
}
}
no
environment Environment name string n/a yes
fluentbit_config Configuration for Fluentbit
object({
k8s_namespace = object({
name = optional(string, "logging")
create = optional(bool, false)
})
name = optional(string, "fluent-bit")
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "512Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "128Mi")
logstash_dateformat = optional(string, "%Y.%m.%d") # Default time format
time_format = optional(string, "%Y-%m-%dT%H:%M:%S.%L")
log_level = optional(string, "info") # Default log level
aws_region = optional(string, "")
aws_role_arn = optional(string, "")
})
{
"cpu_limit": "100m",
"cpu_request": "100m",
"k8s_namespace": {
"create": false,
"name": "logging"
},
"logstash_dateformat": "%Y.%m.%d",
"memory_limit": "512Mi",
"memory_request": "128Mi",
"name": "fluent-bit",
"search_engine": "elasticsearch"
}
no
fluentd_config Configuration for Fluentd
object({
k8s_namespace = object({
name = optional(string, "logging")
create = optional(bool, false)
})
name = optional(string, "fluentd")
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "512Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "128Mi")
logstash_dateformat = optional(string, "%Y.%m.%d") # Default time format
log_level = optional(string, "info") # Default log level
opensearch_url = optional(string, "")
aws_region = optional(string, "")
aws_role_arn = optional(string, "")
})
{
"cpu_limit": "100m",
"cpu_request": "100m",
"k8s_namespace": {
"create": false,
"name": "logging"
},
"logstash_dateformat": "%Y.%m.%d",
"memory_limit": "512Mi",
"memory_request": "128Mi",
"name": "fluentd",
"search_engine": "elasticsearch"
}
no
log_aggregator (optional) Log aggregator to choose string null no
metrics_monitoring_system Monitoring system for metrics string null no
namespace Namespace for the resources. string n/a yes
prometheus_config Configuration settings for deploying Prometheus
object({
name = optional(string, "prometheus")
k8s_namespace = object({
name = optional(string, "metrics")
create = optional(bool, true)
})
log_level = optional(string, "info")
replica_count = optional(number, 1)
storage = optional(string, "8Gi")
storage_class = optional(string, "gp2")
enable_kube_state_metrics = optional(bool, true)
enable_node_exporter = optional(bool, true)
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "512Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "128Mi")
retention_period = optional(string, "15d")

grafana_config = object({
name = optional(string, "grafana")
replica_count = optional(number, 1)
ingress_enabled = optional(bool, false)
lb_visibility = optional(string, "internet-facing") # Options: "internal" or "internet-facing"
aws_certificate_arn = optional(string, "")
ingress_host = optional(string, "")
admin_user = optional(string, "admin")
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "128Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "128Mi")
dashboard_list = optional(list(object({
name = string
json = string
})), [])
})

blackbox_exporter_config = object({
name = optional(string, "blackbox-exporter")
replica_count = optional(number, 1)
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "500Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "50Mi")
monitoring_targets = list(object({
name = string # Target name (e.g., google)
url = string # URL to monitor (e.g., https://google.com)
scrape_interval = optional(string, "60s") # Scrape interval (e.g., 60s)
scrape_timeout = optional(string, "60s") # Scrape timeout (e.g., 60s)
status_code_pattern_list = optional(string, "[http_2xx]") # Blackbox module to use (e.g., http_2xx)
}))
})

alertmanager_config = object({
name = optional(string, "alertmanager")
replica_count = optional(number, 1)
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "128Mi")
cpu_request = optional(string, "10m")
memory_request = optional(string, "32Mi")
custom_alerts = optional(string, "")
alert_notification_settings = optional(string, "")
})
})
{
"alertmanager_config": {
"name": "alertmanager"
},
"blackbox_exporter_config": {
"monitoring_targets": [],
"name": "blackbox-exporter"
},
"enable_kube_state_metrics": true,
"enable_node_exporter": true,
"grafana_config": {
"admin_user": "admin",
"ingress_enabled": false,
"lb_visibility": "internet-facing",
"prometheus_endpoint": "prometheus"
},
"k8s_namespace": {
"create": true,
"name": "metrics"
},
"log_level": "info",
"replica_count": 1,
"resources": {
"cpu_limit": "100m",
"cpu_request": "100m",
"memory_limit": "512Mi",
"memory_request": "128Mi"
},
"retention_period": "15d",
"storage": "8Gi"
}
no
search_engine (optional) Search engine for logs string null no
signoz_config Configuration for observability components in the monitoring stack. This variable encapsulates
settings for the following components:

- ClickHouse:
Used as the backend storage engine for observability data (like traces and metrics).
Includes credentials and resource limits/requests for tuning performance.

- SigNoz:
Provides the UI and analytics for monitoring and tracing applications.
Includes ingress setup and compute resource configuration.

- Alertmanager:
Handles alerting rules and notifications for monitoring data.
Includes configuration for storage, scaling, and ingress settings.

- OTEL Collector:
Collects telemetry data (logs, metrics, traces) from the applications and
routes it to appropriate backends.
Includes resource definitions and optional ingress configuration.

This structure enables centralized management of observability stack deployment in Kubernetes
via Terraform.
object({
k8s_namespace = object({
name = optional(string, "signoz")
create = optional(bool, false)
})
name = optional(string, "signoz")
storage_class = optional(string, "gp3")
cluster_name = string
clickhouse = optional(object({
user = optional(string, "admin")
cpu_limit = optional(string, "2000m")
memory_limit = optional(string, "4Gi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "200Mi")
storage = optional(string, "20Gi")
}))

signoz_bin = optional(object({
replica_count = optional(number, 1)
cpu_limit = optional(string, "750m")
memory_limit = optional(string, "1000Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "200Mi")
ingress_enabled = optional(bool, false)
aws_certificate_arn = optional(string, null)
domain = string
root_domain = optional(string, null) // if root domain is provided, it creates DNS record
lb_visibility = optional(string, "internet-facing") # Options: "internal" or "internet-facing"
}))

alertmanager = optional(object({
enable = optional(bool, false)
replica_count = optional(number, 1)
cpu_limit = optional(string, "750m")
memory_limit = optional(string, "1000Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "200Mi")
storage = optional(string, "100Mi")
enable_ingress = optional(bool, false)
aws_certificate_arn = optional(string, null)
domain = optional(string, "signoz.example.com")
}))

otel_collector = optional(object({
cpu_limit = optional(string, "1")
memory_limit = optional(string, "2Gi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "200Mi")
storage = optional(string, "100Mi")
enable_ingress = optional(bool, false)
aws_certificate_arn = optional(string, null)
domain = optional(string, "signoz.example.com")
}))
})
{
"cluster_name": null,
"k8s_namespace": {
"create": true,
"name": "signoz"
},
"name": null
}
no
signoz_infra_monitor_config Configuration object for deploying SigNoz infrastructure monitoring components.

Attributes:
- name: A name identifier for the monitoring deployment (used in naming resources).
- storage_class: (Optional) The Kubernetes storage class to be used for persistent volumes. Defaults to "gp3".
- cluster_name: The name of the Kubernetes cluster where SigNoz is being deployed.
- otel_collector_endpoint: The endpoint URL for the OpenTelemetry Collector to which metrics, logs, and traces will be exported.
- metric_collection_interval: (Optional) The interval at which metrics are collected. Defaults to "30s".
- If either enable_log_collection or enable_metrics_collection is true, the Helm chart is installed.

This variable is used to centralize configuration related to monitoring infrastructure via SigNoz.
object({
k8s_namespace = optional(object({
name = optional(string, "signoz")
create = optional(bool, false)
}))
name = string
storage_class = optional(string, "gp3")
cluster_name = string
enable_log_collection = optional(bool, false)
enable_metrics_collection = optional(bool, false)
otel_collector_endpoint = optional(string, null)
metric_collection_interval = optional(string, "30s")
})
{
"cluster_name": null,
"name": null
}
no
tags (optional) Tags for AWS resources map(string) {} no
tracing_stack (optional) Distributed tracing stack string null no

Outputs

Name Description
grafana_lb_dns Grafana ingress loadbalancer DNS
kibana_lb_dns Kibana ingress loadbalancer DNS
otel_collector_endpoint OTEL collector endpoint
signoz_lb_dns Signoz ingress loadbalancer DNS

Development

Prerequisites

Configurations

  • Configure pre-commit hooks
    pre-commit install
    
  • Configure golang deps for tests
    go get github.com/gruntwork-io/terratest/modules/terraform
    go get github.com/stretchr/testify/assert
    

Git commits

When contributing, specify the kind of change in your commit message: major, minor, or patch.

For example:

git commit -m "your commit message #major"
Specifying this bumps the version accordingly; if you omit it, the commit is treated as a patch by default.

Tests

  • Tests are available in the test directory
  • From the test directory, run the command below
    go test -timeout 1800s
    

Authors

This project is authored by:

  • SourceFuse