
terraform-aws-arc-observability-stack


Introduction

The Observability Terraform Module is a comprehensive solution designed to simplify the deployment of a full-stack observability ecosystem in Kubernetes environments. This module enables organizations to monitor and troubleshoot their infrastructure and applications effectively, offering the flexibility to choose between various open-source tools.

Key Features:

  1. EFK Stack for Log Management:
  • Deploy either Fluentd or Fluent Bit as the log collector, providing lightweight and efficient options for log aggregation.
  • Seamlessly integrate with either Elasticsearch or OpenSearch for scalable and reliable log storage.
  2. Prometheus Stack for Metrics Monitoring:
  • Includes Prometheus for metrics collection and Alertmanager for alerting.
  • Integrated support for Grafana, offering rich dashboards to visualize metrics effectively.
  • Enables monitoring of HTTP endpoints using the Blackbox Exporter.
  3. Flexibility and Customization:
  • Fully customizable configurations for each component, allowing fine-grained control over deployment and resources.
  • Supports multiple log collectors and storage backends, giving users the freedom to choose based on their requirements.
  4. Streamlined Deployment:
  • Automates the deployment of the entire observability stack, reducing complexity and ensuring consistency.
  • Includes preconfigured dashboards and alert rules for quick setup and immediate insights.

For more information about this repository and its usage, please see Terraform AWS ARC Observability Module Usage Guide.

This module deploys the following components into a Kubernetes cluster:

  • Elasticsearch or OpenSearch for log storage
  • Fluentd or Fluent Bit for log collection
  • Kibana for log visualization
  • Prometheus, Alertmanager, and the Blackbox Exporter for metrics collection and alerting
  • Grafana for metrics dashboards

Prerequisites

Before using this module, ensure you have the following:

  • AWS credentials configured.
  • Terraform installed.
  • A working knowledge of Terraform.

Usage

See the examples folder for a complete example.

EFK Stack

module "efk" {
  source                      = "sourcefuse/arc-observability-stack/aws"
  version                     = "0.0.1"

  environment = var.environment
  namespace   = var.namespace

  search_engine  = "elasticsearch"
  log_aggregator = "fluentd"

  elasticsearch_config = {
    name = "elasticsearch-master"
    k8s_namespace = {
      name   = "logging"
      create = true
    }

    tls_self_signed_cert_data = {
      organisation          = "ARC"
      validity_period_hours = 26280 # 3 years validity
      early_renewal_hours   = 168   # 1 week early renewal
    }

    cluster_config = {
      port           = "9200"
      transport_port = "9300"
      user           = "elastic"
      log_level      = "INFO"
      cpu_limit      = "2000m"
      memory_limit   = "4Gi"
      cpu_request    = "1000m"
      memory_request = "2Gi"
      storage_class  = "gp2"
      storage        = "40Gi"
    }

    kibana_config = {
      log_level      = "info"
      cpu_limit      = "500m"
      memory_limit   = "1Gi"
      cpu_request    = "250m"
      memory_request = "500Mi"

      ingress_enabled     = true
      aws_certificate_arn = "arn:aws:acm:us-east-1:xx:certificate/xx-46e7-4d99-a523-xxxx"
      ingress_host        = "kibana.xx-xx.xx"

    }
  }

  fluentd_config = {
    k8s_namespace = {
      name   = "logging"
      create = false
    }
    name                = "fluentd"
    search_engine       = "elasticsearch"
    cpu_limit           = "100m"
    memory_limit        = "512Mi"
    cpu_request         = "100m"
    memory_request      = "128Mi"
    logstash_dateformat = "%Y.%m.%d"
    log_level           = "info"
  }
}
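
When the Kibana ingress is enabled, the module publishes the ingress load balancer ARN through its kibana_lb_arn output. A minimal sketch of surfacing it from the calling configuration, for example to feed DNS or monitoring elsewhere:

```hcl
# Re-export the Kibana ingress load balancer ARN from the root module.
output "kibana_lb_arn" {
  description = "ARN of the load balancer created for the Kibana ingress"
  value       = module.efk.kibana_lb_arn
}
```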

Prometheus

module "prometheus" {
  source                      = "sourcefuse/arc-observability-stack/aws"
  version                     = "0.0.1"

  environment = var.environment
  namespace   = var.namespace

  metrics_monitoring_system = "prometheus"

  prometheus_config = {
    k8s_namespace = {
      name   = "metrics"
      create = true
    }
    log_level = "info"
    resources = {
      cpu_limit      = "100m"
      memory_limit   = "512Mi"
      cpu_request    = "100m"
      memory_request = "128Mi"
    }
    replicas                  = 1
    storage                   = "8Gi"
    enable_kube_state_metrics = true
    enable_node_exporter      = true
    retention_period          = "30d"

    grafana_config = {
      replicas            = 1
      ingress_enabled     = true
      ingress_host        = "grafana.arc-xx.xx"
      aws_certificate_arn = "arn:aws:acm:us-east-1:xx:certificate/xx-46e7-4d99-a523-xxxx"
      lb_visibility       = "internet-facing"
      dashboard_list = [
        {
          name = "node-metrics"
          json = templatefile("${path.module}/grafana-dashboard.json", {})
        }
      ]
    }

    blackbox_exporter_config = {
      name = "blackbox-exporter"
      monitoring_targets = [{
        name                     = "google"
        url                      = "https://google.com"
        scrape_interval          = "60s"
        status_code_pattern_list = "[http_2xx]" # Note: this is a string, not a list
      }]
    }

    alertmanager_config = {
      name            = "alertmanager"
      replica_count   = 1
      alert_rule_yaml = ""
    }

  }
}
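
The alert_rule_yaml field is left empty in the example above. Custom alert rules can be supplied as a YAML string in standard Prometheus rule-file format; the sketch below is illustrative only (the group name, alert name, expression, and thresholds are assumptions to adapt, not shipped defaults):

```hcl
# Illustrative only: a Prometheus alerting rule group passed as a YAML string.
locals {
  alert_rules = <<-EOT
    groups:
      - name: node-alerts
        rules:
          - alert: HighNodeCPU
            expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "Node CPU usage above 90% for 10 minutes"
  EOT
}
```

The local can then be wired into the module call, e.g. alert_rule_yaml = local.alert_rules.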

Requirements

Name Version
terraform >= 1.4, < 2.0.0
aws >= 4.0, < 6.0
helm 2.17.0
random ~> 3.6.0
tls ~> 4.0.6

Providers

No providers.

Modules

Name Source Version
elasticsearch ./modules/elasticsearch n/a
fluentbit ./modules/fluent-bit n/a
fluentd ./modules/fluentd n/a
prometheus ./modules/prometheus n/a

Resources

No resources.

Inputs

Name Description Type Default Required
elasticsearch_config Configuration settings for deploying Elasticsearch
object({
name = optional(string, "elasticsearch-master") # Name of the Elasticsearch cluster
k8s_namespace = object({
name = optional(string, "logging")
create = optional(bool, true)
})

tls_self_signed_cert_data = optional(object({ # Self-signed TLS certificate data
organisation = optional(string, null) # Organisation name for certificate
validity_period_hours = optional(number, 26280) # 3 years validity
early_renewal_hours = optional(number, 168) # 1 week early renewal
}))

cluster_config = object({
port = optional(string, "9200") # Elasticsearch HTTP port
transport_port = optional(string, "9300") # Elasticsearch transport port
user = optional(string, "elastic") # Elasticsearch username
log_level = optional(string, "INFO") # Log level (DEBUG, INFO, WARN, ERROR)
cpu_limit = optional(string, "2000m") # CPU limit for the Elasticsearch container
memory_limit = optional(string, "4Gi") # Memory limit for the Elasticsearch container
cpu_request = optional(string, "1000m") # CPU request for the Elasticsearch container
memory_request = optional(string, "2Gi") # Memory request for the Elasticsearch container
storage_class = optional(string, "gp2")
storage = optional(string, "30Gi") # Persistent volume storage for Elasticsearch
replica_count = optional(string, 3)
})

kibana_config = object({
name = optional(string, "kibana")
replica_count = optional(string, 3)
http_port = optional(string, "5601")
user = optional(string, "elastic")
log_level = optional(string, "info") # Options: all, fatal, error, warn, info, debug, trace, off
cpu_limit = optional(string, "500m")
memory_limit = optional(string, "1Gi")
cpu_request = optional(string, "250m")
memory_request = optional(string, "500Mi")
ingress_enabled = optional(bool, false)
ingress_host = optional(string, "")
aws_certificate_arn = optional(string, "")
lb_visibility = optional(string, "internet-facing")
})
})
{
"cluster_config": {
"cpu_limit": "2000m",
"cpu_request": "1000m",
"log_level": "INFO",
"memory_limit": "4Gi",
"memory_request": "2Gi",
"port": "9200",
"replica_count": 3,
"storage": "30Gi",
"transport_port": "9300",
"user": "elastic"
},
"k8s_namespace": {
"create": true,
"name": "logging"
},
"kibana_config": {
"cpu_limit": "500m",
"cpu_request": "250m",
"elasticsearch_url": "https://elasticsearch-master:9200",
"http_port": "5601",
"ingress_enabled": false,
"ingress_host": "",
"log_level": "info",
"memory_limit": "1Gi",
"memory_request": "500Mi",
"name": "kibana",
"user": "elastic"
},
"name": "elasticsearch-master",
"tls_self_signed_cert_data": {
"early_renewal_hours": 168,
"organisation": null,
"validity_period_hours": 26280
}
}
no
environment Environment name string n/a yes
fluentbit_config Configuration for Fluentbit
object({
k8s_namespace = object({
name = optional(string, "logging")
create = optional(bool, false)
})
name = optional(string, "fluent-bit")
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "512Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "128Mi")
logstash_dateformat = optional(string, "%Y.%m.%d") # Default time format
time_format = optional(string, "%Y-%m-%dT%H:%M:%S.%L")
log_level = optional(string, "info") # Default log level
aws_region = optional(string, "")
aws_role_arn = optional(string, "")
})
{
"cpu_limit": "100m",
"cpu_request": "100m",
"k8s_namespace": {
"create": false,
"name": "logging"
},
"logstash_dateformat": "%Y.%m.%d",
"memory_limit": "512Mi",
"memory_request": "128Mi",
"name": "fluent-bit",
"search_engine": "elasticsearch"
}
no
fluentd_config Configuration for Fluentd
object({
k8s_namespace = object({
name = optional(string, "logging")
create = optional(bool, false)
})
name = optional(string, "fluentd")
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "512Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "128Mi")
logstash_dateformat = optional(string, "%Y.%m.%d") # Default time format
log_level = optional(string, "info") # Default log level
opensearch_url = optional(string, "")
aws_region = optional(string, "")
aws_role_arn = optional(string, "")
})
{
"cpu_limit": "100m",
"cpu_request": "100m",
"k8s_namespace": {
"create": false,
"name": "logging"
},
"logstash_dateformat": "%Y.%m.%d",
"memory_limit": "512Mi",
"memory_request": "128Mi",
"name": "fluentd",
"search_engine": "elasticsearch"
}
no
log_aggregator (optional) Log aggregator to use (fluentd or fluentbit) string null no
metrics_monitoring_system Monitoring system for metrics string null no
namespace Namespace for the resources. string n/a yes
prometheus_config Configuration settings for deploying Prometheus
object({
name = optional(string, "prometheus")
k8s_namespace = object({
name = optional(string, "metrics")
create = optional(bool, true)
})
log_level = optional(string, "info")
replica_count = optional(number, 1)
storage = optional(string, "8Gi")
storage_class = optional(string, "gp2")
enable_kube_state_metrics = optional(bool, true)
enable_node_exporter = optional(bool, true)
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "512Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "128Mi")
retention_period = optional(string, "15d")

grafana_config = object({
name = optional(string, "grafana")
replica_count = optional(number, 1)
ingress_enabled = optional(bool, false)
lb_visibility = optional(string, "internet-facing") # Options: "internal" or "internet-facing"
aws_certificate_arn = optional(string, "")
ingress_host = optional(string, "")
admin_user = optional(string, "admin")
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "128Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "128Mi")
dashboard_list = optional(list(object({
name = string
json = string
})), [])
})

blackbox_exporter_config = object({
name = optional(string, "blackbox-exporter")
replica_count = optional(number, 1)
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "500Mi")
cpu_request = optional(string, "100m")
memory_request = optional(string, "50Mi")
monitoring_targets = list(object({
name = string # Target name (e.g., google)
url = string # URL to monitor (e.g., https://google.com)
scrape_interval = optional(string, "60s") # Scrape interval (e.g., 60s)
scrape_timeout = optional(string, "60s") # Scrape timeout (e.g., 60s)
status_code_pattern_list = optional(string, "[http_2xx]") # Blackbox module to use (e.g., http_2xx)
}))
})

alertmanager_config = object({
name = optional(string, "alertmanager")
replica_count = optional(number, 1)
cpu_limit = optional(string, "100m")
memory_limit = optional(string, "128Mi")
cpu_request = optional(string, "10m")
memory_request = optional(string, "32Mi")
custom_alerts = optional(string, "")
alert_notification_settings = optional(string, "")
})
})
{
"alertmanager_config": {
"name": "alertmanager"
},
"blackbox_exporter_config": {
"monitoring_targets": [],
"name": "blackbox-exporter"
},
"enable_kube_state_metrics": true,
"enable_node_exporter": true,
"grafana_config": {
"admin_user": "admin",
"ingress_enabled": false,
"lb_visibility": "internet-facing",
"prometheus_endpoint": "prometheus"
},
"k8s_namespace": {
"create": true,
"name": "metrics"
},
"log_level": "info",
"replica_count": 1,
"resources": {
"cpu_limit": "100m",
"cpu_request": "100m",
"memory_limit": "512Mi",
"memory_request": "128Mi"
},
"retention_period": "15d",
"storage": "8Gi"
}
no
search_engine (optional) Search engine for logs (elasticsearch or opensearch) string null no
tags (optional) Tags for AWS resources map(string) {} no

Outputs

Name Description
grafana_lb_arn Grafana ingress load balancer ARN
kibana_lb_arn Kibana ingress load balancer ARN
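
The load balancer ARN outputs can be resolved back to a DNS name for record creation. A sketch using the aws_lb data source, assuming a Route 53 hosted zone; the zone and record names below are placeholders, not values from this module:

```hcl
# Look up the Grafana ALB by the ARN the module outputs.
data "aws_lb" "grafana" {
  arn = module.prometheus.grafana_lb_arn
}

# Assumption: you own a hosted zone; replace example.com with yours.
data "aws_route53_zone" "this" {
  name = "example.com"
}

# Point an alias record at the Grafana load balancer.
resource "aws_route53_record" "grafana" {
  zone_id = data.aws_route53_zone.this.zone_id
  name    = "grafana.example.com"
  type    = "A"

  alias {
    name                   = data.aws_lb.grafana.dns_name
    zone_id                = data.aws_lb.grafana.zone_id
    evaluate_target_health = true
  }
}
```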

Development

Prerequisites

Configurations

  • Configure pre-commit hooks
    pre-commit install
    
  • Configure golang deps for tests
    go get github.com/gruntwork-io/terratest/modules/terraform
    go get github.com/stretchr/testify/assert
    

Git commits

While contributing, specify the type of change in your commit message: major, minor, or patch.

For example:

git commit -m "your commit message #major"

Specifying this bumps the version accordingly. If you omit it, the change defaults to patch and the patch version is bumped.

Tests

  • Tests are available in the test directory
  • In the test directory, run:
    go test -timeout 1800s
    

Authors

This project is authored by SourceFuse.