Best practices for consistent configuration management at scale with Tanka
At Grafana Labs, we use Tanka to deploy workloads to our Kubernetes clusters. As our organization grew, we asked ourselves: How should we manage workload configuration at scale, and in a consistent way?
A bit of context
In the beginning, engineers were manually invoking Tanka, applying the configurations from their local machines. Because Tanka was built to tightly couple a Tanka environment to a single Kubernetes cluster and set a default namespace, they would often find that the local Kube context did not match the cluster they wanted to apply to.
Since then, our engineering team has grown rapidly, the number of Kubernetes clusters has increased, and the quantity of Tanka environments has exploded. This put a heavy burden on the engineers, who often had to apply to as many Tanka environments as there were clusters, chasing down drift as people made mistakes and forgot one environment or another. To combat this, we naturally implemented Continuous Deployment. This resolved many of our issues with existing clusters and environments.
As Grafana Labs keeps growing, the platform for our engineers also needs to grow. This means more Kubernetes clusters and even more Tanka environments. Do you already see the problem here? It’s a never-ending story: Managing all of these Tanka environments becomes a real hassle. Application environments for different clusters start to drift from each other as some clusters need a slightly different configuration than others. With new clusters being added or migrated often, engineers need to manually bootstrap each new environment, couple it to the new API server, create the namespace(s), and reconsider the cluster-specific exceptions.
Bringing Tanka environments in line
The above problems all have a potential impact on the business. For example, drift between clusters makes debugging harder during an incident, missing namespaces can block the CD process, and bootstrapping increases engineering cost.
Let’s boil this down to three problems:
- Configuration drift for the same application between different clusters.
- Bootstrapping new Tanka environments in new clusters.
- Bootstrapping new clusters (creating namespaces, for example).
We solved these problems in Jsonnet with Tanka inline environments at the base. Inline environments allow us to dynamically create environments with all the functions and constructs that Jsonnet has to offer. An inline environment is nothing more than the spec.json with a data: field that contains the Kubernetes resources. With the tanka-util library, we can quickly configure a new Tanka environment:
// environments/grafana/main.jsonnet
local grafana = import 'github.com/grafana/jsonnet-libs/grafana/grafana.libsonnet';
local tanka = import 'github.com/grafana/jsonnet-libs/tanka-util/main.libsonnet';

{
  local this = self,

  data:: {
    grafana:
      grafana
      + grafana.withAnonymous(),
  },

  env:
    tanka.environment.new(
      name='grafana/dev-01',
      namespace='grafana',
      apiserver='https://127.0.1.1:6443',
    )
    + tanka.environment.withLabels({ cluster: 'dev-01' })
    + tanka.environment.withData(this.data),
}
To add this to a new cluster, we can make a copy of env: with the same data block, giving us a bit more consistency already. However, that is still manual labor.
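For illustration, such a copy could look like this (a hypothetical env_dev02 field sitting next to env: in the file above; only the connection details differ):
  // Hypothetical copy for a second cluster: the data block is reused,
  // but the name, label, and apiserver are duplicated by hand.
  env_dev02:
    tanka.environment.new(
      name='grafana/dev-02',
      namespace='grafana',
      apiserver='https://127.0.1.2:6443',
    )
    + tanka.environment.withLabels({ cluster: 'dev-02' })
    + tanka.environment.withData(this.data),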
Let’s take it a step further and also describe our clusters in Jsonnet:
// lib/meta/meta.libsonnet
{
  clusters: {
    'dev-01': {
      name: 'dev-01',
      status: 'dev',
      apiserver: 'https://127.0.1.1:6443',
    },
    'prod-01': {
      name: 'prod-01',
      status: 'prod',
      apiserver: 'https://127.0.2.1:6443',
    },
  },
}
This allows us to create a consistent deployment of Grafana in both the development and production clusters with just a few extra lines of code:
// environments/grafana/main.jsonnet
local grafana = import 'github.com/grafana/jsonnet-libs/grafana/grafana.libsonnet';
local tanka = import 'github.com/grafana/jsonnet-libs/tanka-util/main.libsonnet';
+local meta = import 'meta/meta.libsonnet';
{
  local this = self,

  data:: {
    grafana:
      grafana
      + grafana.withAnonymous(),
  },

-  env:
+  env(cluster)::
    tanka.environment.new(
-      name='grafana/dev-01',
+      name='grafana/' + cluster.name,
      namespace='grafana',
-      apiserver='https://127.0.1.1:6443',
+      apiserver=cluster.apiserver,
    )
-    + tanka.environment.withLabels({ cluster: 'dev-01' })
+    + tanka.environment.withLabels({ cluster: cluster.name })
    + tanka.environment.withData(this.data),

+  envs: {
+    [name]: this.env(meta.clusters[name])
+    for name in std.objectFields(meta.clusters)
+  },
}
This solves problem 1: configuration drift for the same application between different clusters.
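To make that concrete, here is a rough sketch (not verbatim Tanka output) of what envs now evaluates to, given the two clusters in lib/meta: one Environment object per cluster, with identical data but cluster-specific connection details:
// Sketch of the evaluated envs object (trimmed):
{
  'dev-01': {
    apiVersion: 'tanka.dev/v1alpha1',
    kind: 'Environment',
    metadata: { name: 'grafana/dev-01', labels: { cluster: 'dev-01' } },
    spec: { apiServer: 'https://127.0.1.1:6443', namespace: 'grafana' },
    data: { grafana: { /* ... */ } },
  },
  'prod-01': {
    apiVersion: 'tanka.dev/v1alpha1',
    kind: 'Environment',
    metadata: { name: 'grafana/prod-01', labels: { cluster: 'prod-01' } },
    spec: { apiServer: 'https://127.0.2.1:6443', namespace: 'grafana' },
    data: { grafana: { /* ... */ } },
  },
}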
Let’s add Grafana to another cluster:
// lib/meta/meta.libsonnet
{
  clusters: {
    'dev-01': {
      name: 'dev-01',
      status: 'dev',
      apiserver: 'https://127.0.1.1:6443',
    },
+    'dev-02': {
+      name: 'dev-02',
+      status: 'dev',
+      apiserver: 'https://127.0.1.2:6443',
+    },
    'prod-01': {
      name: 'prod-01',
      status: 'prod',
      apiserver: 'https://127.0.2.1:6443',
    },
  },
}
That’s it! No more changes are required to the Grafana application configuration or Tanka environment. This mitigates problem 2, bootstrapping new Tanka environments.
Note: At Grafana Labs, we create our Kubernetes clusters with Terraform. The list of clusters in lib/meta is therefore generated from Terraform, reducing the manual burden even more.
I hear you: “But but but… I have one cluster that is different from the rest; that ‘drift’ was intentional!” Cluster-specific overrides can simply be done by extending the envs object:
// environments/grafana/main.jsonnet
local grafana = import 'github.com/grafana/jsonnet-libs/grafana/grafana.libsonnet';
local tanka = import 'github.com/grafana/jsonnet-libs/tanka-util/main.libsonnet';
local meta = import 'meta/meta.libsonnet';

{
  local this = self,

  data:: {
    grafana:
      grafana
      + grafana.withAnonymous(),
  },

  env(cluster)::
    tanka.environment.new(
      name='grafana/' + cluster.name,
      namespace='grafana',
      apiserver=cluster.apiserver,
    )
    + tanka.environment.withLabels({ cluster: cluster.name })
    + tanka.environment.withData(this.data),

  envs: {
    [name]: this.env(meta.clusters[name])
    for name in std.objectFields(meta.clusters)
+  } + {
+    'prod-01'+: {
+      data+: { grafana+: grafana.withTheme('dark') },
+    },
  },
}
A space for namespaces
Traditionally, namespaces are either created manually or within the Tanka environment itself. If a namespace is created within a Tanka environment, there is a risk that this environment is not the only one with resources in that namespace: removing the namespace along with the environment might destroy the resources of every other environment deployed into it.
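As a sketch of that anti-pattern (assuming the kausal library used in the cluster-resources example further below), imagine the namespace living inside one application’s environment:
// Anti-pattern sketch: the grafana namespace is owned by a single
// application environment. Deleting this environment deletes the
// namespace, and with it the resources of every other environment
// that deploys into it.
data:: {
  namespace: k.core.v1.namespace.new('grafana'),
  grafana: grafana,
},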
Luckily, we have all the Tanka environments available through tk env list, and we can simply generate the namespace manifests based on the Tanka environment specifications. Let’s generate a JSON data file so we can use it in lib/meta:
tk env list --json environments/ | jq . > lib/meta/raw/environments.json
Have a look at lib/meta/raw/environments.json (trimmed), and note the spec.namespace value:
[
  {
    "apiVersion": "tanka.dev/v1alpha1",
    "kind": "Environment",
    "metadata": {
      "name": "grafana",
      "namespace": "environments/grafana/main.jsonnet",
      "labels": { "cluster": "dev-01" }
    },
    "spec": {
      "apiServer": "https://127.0.1.1:6443",
      "namespace": "grafana"
    }
  },
  {
    "apiVersion": "tanka.dev/v1alpha1",
    "kind": "Environment",
    "metadata": {
      "name": "grafana",
      "namespace": "environments/grafana/main.jsonnet",
      "labels": { "cluster": "dev-02" }
    },
    "spec": {
      "apiServer": "https://127.0.1.2:6443",
      "namespace": "grafana"
    }
  },
  {
    "apiVersion": "tanka.dev/v1alpha1",
    "kind": "Environment",
    "metadata": {
      "name": "grafana",
      "namespace": "environments/grafana/main.jsonnet",
      "labels": { "cluster": "prod-01" }
    },
    "spec": {
      "apiServer": "https://127.0.2.1:6443",
      "namespace": "grafana"
    }
  }
]
From that, we can generate a list of namespace/cluster pairs:
// lib/meta/meta.libsonnet
local envs = import './raw/environments.json';

{
  clusters: { /* ... */ },

  namespaces:
    std.foldr(
      function(env, k) k + {
        [env.spec.namespace]+: {
          clusters+: [env.metadata.labels.cluster],
        },
      },
      envs,
      {}
    ),
}
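For the trimmed environments.json above, this evaluates to roughly the following (the order of the clusters array depends on the fold order):
// Result sketch:
{
  namespaces: {
    grafana: {
      clusters: ['prod-01', 'dev-02', 'dev-01'],
    },
  },
}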
And eventually we can generate the namespace manifests in a Tanka inline environment, similar to the Grafana environments:
// environments/cluster-resources/main.jsonnet
local k = import 'github.com/grafana/jsonnet-libs/ksonnet-util/kausal.libsonnet';
local tanka = import 'github.com/grafana/jsonnet-libs/tanka-util/main.libsonnet';
local meta = import 'meta/meta.libsonnet';

{
  local this = self,

  data(cluster):: {
    namespaces: {
      [ns]: k.core.v1.namespace.new(ns)
      for ns in std.objectFields(meta.namespaces)
      if std.length(std.find(cluster.name, meta.namespaces[ns].clusters)) > 0
    },
  },

  env(cluster)::
    tanka.environment.new(
      name='cluster-resources/' + cluster.name,
      namespace='namespace',
      apiserver=cluster.apiserver,
    )
    + tanka.environment.withLabels({ cluster: cluster.name })
    + tanka.environment.withData(this.data(cluster)),

  envs: {
    [name]: this.env(meta.clusters[name])
    for name in std.objectFields(meta.clusters)
  },
}
Now, if we create a Tanka environment with a new namespace, we update lib/meta/raw/environments.json, and the new namespace will get created. As long as there are Tanka environments with the grafana namespace in that cluster, the namespace manifest will remain in place. Finally, a CI check verifies that the output of tk env list --json matches the raw file.
Personal note
The biggest mental shift for me was to consider our Jsonnet code base as a “database” instead of just infrastructure-as-code. This “database” contains very valuable information that allows us to use this data over and over again in different situations (“views”), reducing the need to duplicate the data.