
terraform-aws-arc-observability-stack


Introduction

The Observability Terraform Module is a comprehensive solution designed to simplify the deployment of a full-stack observability ecosystem in Kubernetes environments. This module enables organizations to monitor and troubleshoot their infrastructure and applications effectively, offering the flexibility to choose between various open-source tools.

Key Features:

  1. EFK Stack for Log Management:
  • Deploy either Fluentd or Fluent Bit as the log collector, providing lightweight and efficient options for log aggregation.
  • Seamlessly integrate with either Elasticsearch or OpenSearch for scalable and reliable log storage.
  2. Prometheus Stack for Metrics Monitoring:
  • Includes Prometheus for metrics collection and Alertmanager for alerting.
  • Integrated support for Grafana, offering rich dashboards to visualize metrics effectively.
  • Enables monitoring of HTTP endpoints using the Blackbox Exporter.
  3. Flexibility and Customization:
  • Fully customizable configurations for each component, allowing fine-grained control over deployment and resources.
  • Supports multiple log collectors and storage backends, giving users the freedom to choose based on their requirements.
  4. Streamlined Deployment:
  • Automates the deployment of the entire observability stack, reducing complexity and ensuring consistency.
  • Includes preconfigured dashboards and alert rules for quick setup and immediate insights.
  5. SigNoz Community Edition Support:
  • Adds native support for SigNoz CE, an all-in-one observability platform.
  • Enables logs, metrics, and traces to be collected and correlated in one unified interface.
  • Simplifies tracing setup with the OpenTelemetry Collector and works out of the box with distributed applications.

For more information about this repository and its usage, please see Terraform AWS ARC Observability Module Usage Guide.

Prerequisites

Before using this module, ensure you have the following:

  • AWS credentials configured.
  • Terraform installed.
  • A working knowledge of Terraform.
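Because the module installs its components through the Helm provider (see Requirements), Terraform must be able to authenticate to your Kubernetes cluster. Below is one possible provider setup for an existing EKS cluster; the cluster name "arc-poc" and the data-source names are illustrative assumptions, not part of the module:

```hcl
# Hypothetical Helm provider wiring for an existing EKS cluster.
# "arc-poc" is a placeholder cluster name — substitute your own.
data "aws_eks_cluster" "this" {
  name = "arc-poc"
}

data "aws_eks_cluster_auth" "this" {
  name = "arc-poc"
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.this.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}
```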

Usage

See the examples folder for a complete example.

EFK Stack

module "efk" {
  source                      = "sourcefuse/arc-observability-stack/aws"
  version                     = "0.0.1"

  environment = var.environment
  namespace   = var.namespace

  search_engine  = "elasticsearch"
  log_aggregator = "fluentd"

  elasticsearch_config = {
    name = "elasticsearch-master"
    k8s_namespace = {
      name   = "logging"
      create = true
    }

    tls_self_signed_cert_data = {
      organisation          = "ARC"
      validity_period_hours = 26280 # 3 years validity
      early_renewal_hours   = 168   # 1 week early renewal
    }

    cluster_config = {
      port           = "9200"
      transport_port = "9300"
      user           = "elastic"
      log_level      = "INFO"
      cpu_limit      = "2000m"
      memory_limit   = "4Gi"
      cpu_request    = "1000m"
      memory_request = "2Gi"
      storage_class  = "gp2"
      storage        = "40Gi"
    }

    kibana_config = {
      log_level      = "info"
      cpu_limit      = "500m"
      memory_limit   = "1Gi"
      cpu_request    = "250m"
      memory_request = "500Mi"

      ingress_enabled     = true
      aws_certificate_arn = "arn:aws:acm:us-east-1:xx:certificate/xx-46e7-4d99-a523-xxxx"
      ingress_host        = "kibana.xx-xx.xx"

    }
  }

  fluentd_config = {
    k8s_namespace = {
      name   = "logging"
      create = false
    }
    name                = "fluentd"
    search_engine       = "elasticsearch"
    cpu_limit           = "100m"
    memory_limit        = "512Mi"
    cpu_request         = "100m"
    memory_request      = "128Mi"
    logstash_dateformat = "%Y.%m.%d"
    log_level           = "info"
  }
}
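The same module call can be pointed at Fluent Bit and/or OpenSearch by switching the `log_aggregator` and `search_engine` selectors. A hedged sketch follows; the selector strings are inferred from the variable descriptions, and the region and role ARN are placeholders:

```hcl
# Sketch: Fluent Bit shipping logs to OpenSearch.
# Selector values are assumed from the inputs table; other values are placeholders.
module "efk_fluentbit" {
  source  = "sourcefuse/arc-observability-stack/aws"
  version = "0.0.1"

  environment = var.environment
  namespace   = var.namespace

  search_engine  = "opensearch"
  log_aggregator = "fluentbit"

  fluentbit_config = {
    k8s_namespace = {
      name   = "logging"
      create = true
    }
    name         = "fluent-bit"
    aws_region   = "us-east-1"                                           # placeholder
    aws_role_arn = "arn:aws:iam::111111111111:role/fluentbit-opensearch" # placeholder
  }
}
```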

Prometheus

module "prometheus" {
  source                      = "sourcefuse/arc-observability-stack/aws"
  version                     = "0.0.1"

  environment = var.environment
  namespace   = var.namespace

  metrics_monitoring_system = "prometheus"

  prometheus_config = {
    k8s_namespace = {
      name   = "metrics"
      create = true
    }
    log_level = "info"
    resources = {
      cpu_limit      = "100m"
      memory_limit   = "512Mi"
      cpu_request    = "100m"
      memory_request = "128Mi"
    }
    replicas                  = 1
    storage                   = "8Gi"
    enable_kube_state_metrics = true
    enable_node_exporter      = true
    retention_period          = "30d"

    grafana_config = {
      replicas            = 1
      ingress_enabled     = true
      ingress_host        = "grafana.arc-xx.xx"
      aws_certificate_arn = "arn:aws:acm:us-east-1:xx:certificate/xx-46e7-4d99-a523-xxxx"
      lb_visibility       = "internet-facing"
      dashboard_list = [
        {
          name = "node-metrics"
          json = templatefile("${path.module}/grafana-dashboard.json", {})
        }
      ]
    }

    blackbox_exporter_config = {
      name = "blackbox-exporter"
      monitoring_targets = [{
        name                     = "google"
        url                      = "https://google.com"
        scrape_interval          = "60s"
        status_code_pattern_list = "[http_2xx]" // Note: this is a string, not a list
      }]
    }

    alertmanager_config = {
      name            = "alertmanager"
      replica_count   = 1
      alert_rule_yaml = ""
    }

  }
}
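The inputs below also expose a SigNoz deployment (`signoz_config`, selected via `tracing_stack`), though no usage example is shipped. A minimal, hedged sketch based on the input schema — the `"signoz"` selector value, cluster name, and domain are assumptions:

```hcl
# Sketch: SigNoz CE deployment; selector value, cluster name, and domain are assumed.
module "signoz" {
  source  = "sourcefuse/arc-observability-stack/aws"
  version = "0.0.1"

  environment = var.environment
  namespace   = var.namespace

  tracing_stack = "signoz" # assumed selector value

  signoz_config = {
    cluster_name = "arc-poc" # placeholder EKS cluster name
    k8s_namespace = {
      name   = "signoz"
      create = true
    }
    signoz_bin = {
      domain          = "signoz.example.com" # placeholder
      ingress_enabled = false
    }
  }
}
```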

Requirements

Name Version
terraform >= 1.4, < 2.0.0
aws >= 4.0, < 6.0
helm 2.17.0
random ~> 3.6.0
tls ~> 4.0.6

Providers

No providers.

Modules

Name Source Version
elasticsearch ./modules/elasticsearch n/a
fluentbit ./modules/fluent-bit n/a
fluentd ./modules/fluentd n/a
jaeger ./modules/jaeger n/a
prometheus ./modules/prometheus n/a
signoz ./modules/signoz n/a
signoz_metrics_logs ./modules/signoz-infra n/a

Resources

No resources.

Inputs

Name Description Type Default Required
elasticsearch_config Configuration settings for deploying Elasticsearch
object({
name = optional(string, "elasticsearch-master") # Name of the Elasticsearch cluster
k8s_namespace = object({
name = optional(string, "logging")
create = optional(bool, true)
})

tls_self_signed_cert_data = optional(object({ # Self-signed TLS certificate data
organisation = optional(string, null) # Organisation name for certificate
validity_period_hours = optional(number, 26280) # 3 years validity
early_renewal_hours = optional(number, 168) # 1 week early renewal
}))

cluster_config = object({
port = optional(string, "9200") # Elasticsearch HTTP port
transport_port = optional(string, "9300") # Elasticsearch transport port
user = optional(string, "elastic") # Elasticsearch username
log_level = optional(string, "INFO") # Log level (DEBUG, INFO, WARN, ERROR)
cpu_limit = optional(string, "2000m") # CPU limit for the Elasticsearch container
memory_limit = optional(string, "4Gi") # Memory limit for the Elasticsearch container
cpu_request = optional(string, "1000m") # CPU request for the Elasticsearch container
memory_request = optional(string, "2Gi") # Memory request for the Elasticsearch container
storage_class = optional(string, "gp2")
storage = optional(string, "30Gi") # Persistent volume storage for Elasticsearch
replica_count = optional(string, 3)
})

kibana_config = object({
name = optional(string, "kibana")
replica_count = optional(string, 3)
http_port = optional(string, "5601")
user = optional(string, "elastic")
log_level = optional(string, "info") // Options: all, fatal, error, warn, info, debug, trace, off
cpu_limit = optional(string, "500m")
memory_limit = optional(string, "1Gi")
cpu_request = optional(string, "250m")
memory_request = optional(string, "500Mi")
ingress_enabled = optional(bool, false)
ingress_host = optional(string, "")
aws_certificate_arn = optional(string, "")
lb_visibility = optional(string, "internet-facing")
})
})
{
"cluster_config": {
"cpu_limit": "2000m",
"cpu_request": "1000m",
"log_level": "INFO",
"memory_limit": "4Gi",
"memory_request": "2Gi",
"port": "9200",
"replica_count": 3,
"storage": "30Gi",
"transport_port": "9300",
"user": "elastic"
},
"k8s_namespace": {
"create": true,
"name": "logging"
},
"kibana_config": {
"cpu_limit": "500m",
"cpu_request": "250m",
"elasticsearch_url": "https://elasticsearch-master:9200",
"http_port": "5601",
"ingress_enabled": false,
"ingress_host": "",
"log_level": "info",
"memory_limit": "1Gi",
"memory_request": "500Mi",
"name": "kibana",
"user": "elastic"
},
"name": "elasticsearch-master",
"tls_self_signed_cert_data": {
"early_renewal_hours": 168,
"organisation": null,
"validity_period_hours": 26280
}
}
no
environment Environment name string n/a yes
fluentbit_config Configuration for Fluentbit
object({
k8s_namespace = object({
name = optional(string, "logging")
create = optional(bool, false)
})
name = optional(string, "fluent-bit")
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "512Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "128Mi")
logstash_dateformat = optional(string, "%Y.%m.%d") # Default time format
time_format = optional(string, "%Y-%m-%dT%H:%M:%S.%L")
log_level = optional(string, "info") # Default log level
aws_region = optional(string, "")
aws_role_arn = optional(string, "")
})
{
"cpu_limit": "100m",
"cpu_request": "100m",
"k8s_namespace": {
"create": false,
"name": "logging"
},
"logstash_dateformat": "%Y.%m.%d",
"memory_limit": "512Mi",
"memory_request": "128Mi",
"name": "fluent-bit",
"search_engine": "elasticsearch"
}
no
fluentd_config Configuration for Fluentd
object({
k8s_namespace = object({
name = optional(string, "logging")
create = optional(bool, false)
})
name = optional(string, "fluentd")
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "512Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "128Mi")
logstash_dateformat = optional(string, "%Y.%m.%d") # Default time format
log_level = optional(string, "info") # Default log level
opensearch_url = optional(string, "")
aws_region = optional(string, "")
aws_role_arn = optional(string, "")
})
{
"cpu_limit": "100m",
"cpu_request": "100m",
"k8s_namespace": {
"create": false,
"name": "logging"
},
"logstash_dateformat": "%Y.%m.%d",
"memory_limit": "512Mi",
"memory_request": "128Mi",
"name": "fluentd",
"search_engine": "elasticsearch"
}
no
log_aggregator (optional) Log aggregator to choose string null no
metrics_monitoring_system Monitoring system for metrics string null no
namespace Namespace for the resources. string n/a yes
prometheus_config Configuration settings for deploying Prometheus
object({
name = optional(string, "prometheus")
k8s_namespace = object({
name = optional(string, "metrics")
create = optional(bool, true)
})
log_level = optional(string, "info")
replica_count = optional(number, 1)
storage = optional(string, "8Gi")
storage_class = optional(string, "gp2")
enable_kube_state_metrics = optional(bool, true)
enable_node_exporter = optional(bool, true)
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "512Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "128Mi")
retention_period = optional(string, "15d")

grafana_config = object({
name = optional(string, "grafana")
replica_count = optional(number, 1)
ingress_enabled = optional(bool, false)
lb_visibility = optional(string, "internet-facing") # Options: "internal" or "internet-facing"
aws_certificate_arn = optional(string, "")
ingress_host = optional(string, "")
admin_user = optional(string, "admin")
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "128Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "128Mi")
dashboard_list = optional(list(object({
name = string
json = string
})), [])
})

blackbox_exporter_config = object({
name = optional(string, "blackbox-exporter")
replica_count = optional(number, 1)
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "500Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "50Mi")
monitoring_targets = list(object({
name = string # Target name (e.g., google)
url = string # URL to monitor (e.g., https://google.com)
scrape_interval = optional(string, "60s") # Scrape interval (e.g., 60s)
scrape_timeout = optional(string, "60s") # Scrape timeout (e.g., 60s)
status_code_pattern_list = optional(string, "[http_2xx]") # Blackbox module to use (e.g., http_2xx)
}))
})

alertmanager_config = object({
name = optional(string, "alertmanager")
replica_count = optional(number, 1)
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "128Mi")
cpu_request = optional(string, "10m")
memory_request = optional(string, "32Mi")
custom_alerts = optional(string, "")
alert_notification_settings = optional(string, "")
})
})
{
"alertmanager_config": {
"name": "alertmanager"
},
"blackbox_exporter_config": {
"monitoring_targets": [],
"name": "blackbox-exporter"
},
"enable_kube_state_metrics": true,
"enable_node_exporter": true,
"grafana_config": {
"admin_user": "admin",
"ingress_enabled": false,
"lb_visibility": "internet-facing",
"prometheus_endpoint": "prometheus"
},
"k8s_namespace": {
"create": true,
"name": "metrics"
},
"log_level": "info",
"replica_count": 1,
"resources": {
"cpu_limit": "100m",
"cpu_request": "100m",
"memory_limit": "512Mi",
"memory_request": "128Mi"
},
"retention_period": "15d",
"storage": "8Gi"
}
no
search_engine (optional) Search engine for logs string null no
signoz_config Configuration for observability components in the monitoring stack. This variable encapsulates
settings for the following components:

- ClickHouse:
Used as the backend storage engine for observability data (like traces and metrics).
Includes credentials and resource limits/requests for tuning performance.

- SigNoz:
Provides the UI and analytics for monitoring and tracing applications.
Includes ingress setup and compute resource configuration.

- Alertmanager:
Handles alerting rules and notifications for monitoring data.
Includes configuration for storage, scaling, and ingress settings.

- OTEL Collector:
Collects telemetry data (logs, metrics, traces) from the applications and
routes it to appropriate backends.
Includes resource definitions and optional ingress configuration.

This structure enables centralized management of observability stack deployment in Kubernetes
via Terraform.
object({
k8s_namespace = object({
name = optional(string, "signoz")
create = optional(bool, false)
})
name = optional(string, "signoz")
storage_class = optional(string, "gp3")
cluster_name = string
clickhouse = optional(object({
user = optional(string, "admin")
cpu_limit = optional(string, "2000m")
memory_limit = optional(string, "4Gi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "200Mi")
storage = optional(string, "20Gi")
}))

signoz_bin = optional(object({
replica_count = optional(number, 1)
cpu_limit = optional(string, "750m")
memory_limit = optional(string, "1000Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "200Mi")
ingress_enabled = optional(bool, false)
aws_certificate_arn = optional(string, null)
domain = string
root_domain = optional(string, null) // if root domain is provided, it creates DNS record
lb_visibility = optional(string, "internet-facing") # Options: "internal" or "internet-facing"
}))

alertmanager = optional(object({
enable = optional(bool, false)
replica_count = optional(number, 1)
cpu_limit = optional(string, "750m")
memory_limit = optional(string, "1000Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "200Mi")
storage = optional(string, "100Mi")
enable_ingress = optional(bool, false)
aws_certificate_arn = optional(string, null)
domain = optional(string, "signoz.example.com")
}))

otel_collector = optional(object({
cpu_limit = optional(string, "1")
memory_limit = optional(string, "2Gi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "200Mi")
storage = optional(string, "100Mi")
enable_ingress = optional(bool, false)
aws_certificate_arn = optional(string, null)
domain = optional(string, "signoz.example.com")
}))
})
{
"cluster_name": null,
"k8s_namespace": {
"create": true,
"name": "signoz"
},
"name": null
}
no
signoz_infra_monitor_config Configuration object for deploying SigNoz infrastructure monitoring components.

Attributes:
- name: A name identifier for the monitoring deployment (used in naming resources).
- storage_class: (Optional) The Kubernetes storage class to be used for persistent volumes. Defaults to "gp3".
- cluster_name: The name of the Kubernetes cluster where SigNoz is being deployed.
- otel_collector_endpoint: The endpoint URL for the OpenTelemetry Collector to which metrics, logs, and traces will be exported.
- metric_collection_interval: (Optional) The interval at which metrics are collected. Defaults to "30s".
- If either enable_log_collection or enable_metrics_collection is true, the Helm chart is installed.

This variable is used to centralize configuration related to monitoring infrastructure via SigNoz.
object({
k8s_namespace = optional(object({
name = optional(string, "signoz")
create = optional(bool, false)
}))
name = string
storage_class = optional(string, "gp3")
cluster_name = string
enable_log_collection = optional(bool, false)
enable_metrics_collection = optional(bool, false)
otel_collector_endpoint = optional(string, null)
metric_collection_interval = optional(string, "30s")
})
{
"cluster_name": null,
"name": null
}
no
tags (optional) Tags for AWS resources map(string) {} no
tracing_stack (optional) Distributed tracing stack string null no

Outputs

Name Description
grafana_lb_dns Grafana ingress loadbalancer DNS
kibana_lb_dns Kibana ingress loadbalancer DNS
otel_collector_endpoint OTEL collector endpoint
signoz_lb_dns Signoz ingress loadbalancer DNS

Development

Prerequisites

Configurations

  • Configure pre-commit hooks
    pre-commit install
    
  • Configure golang deps for tests
    go get github.com/gruntwork-io/terratest/modules/terraform
    go get github.com/stretchr/testify/assert
    

Git commits

When contributing, specify the kind of change in your commit message: major, minor, or patch.

For example:

git commit -m "your commit message #major"
Specifying this bumps the version accordingly; if you omit it, the commit is treated as a patch by default.

Tests

  • Tests are available in the test directory
  • From the test directory, run the command below
    go test -timeout 1800s
    

Authors

This project is authored by:

  • SourceFuse