Best practices for consistent configuration management at scale with Tanka
At Grafana Labs, we use Tanka to deploy workloads to our Kubernetes clusters. As our organization grew, we asked ourselves: How should we manage workload configuration at scale, and in a consistent way?
A bit of context
In the beginning, engineers were manually invoking Tanka, applying the configurations from their local machines. Because Tanka was built to tightly couple a Tanka environment to a single Kubernetes cluster and set a default namespace, they would often find that the local Kube context did not match the cluster they wanted to apply to.
Since then, our engineering team has grown rapidly, the number of Kubernetes clusters has increased, and the quantity of Tanka environments has exploded. This put a heavy burden on the engineers, who often had to apply to as many Tanka environments as there were clusters, chasing down drift as people made mistakes and forgot one environment or another. To combat this, we naturally implemented Continuous Deployment. This resolved many of our issues with existing clusters and environments.
As Grafana Labs keeps growing, the platform for our engineers also needs to grow. This means more Kubernetes clusters and even more Tanka environments. Do you already see the problem here? It’s a never-ending story: Managing all of these Tanka environments becomes a real hassle. Application environments for different clusters start to drift from each other as some clusters need a slightly different configuration than others. With new clusters being added or migrated often, engineers need to manually bootstrap each new environment, couple it to the new API server, create the namespace(s), and reconsider the cluster-specific exceptions.
Bringing Tanka environments in line
The above problems all have a potential impact on the business. For example, drift between clusters makes debugging harder during an incident, missing namespaces can block the CD process, and bootstrapping increases engineering cost.
Let’s boil this down to three problems:
- Configuration drift for the same application between different clusters.
- Bootstrapping new Tanka environments in new clusters.
- Bootstrapping new clusters (creating namespaces, for example).
We solved these problems in Jsonnet with Tanka inline environments at the base. Inline environments allow us to dynamically create environments with all the functions and constructs that Jsonnet has to offer. An inline environment is nothing more than the spec.json with a data: field that contains the Kubernetes resources. With the tanka-util library, we can quickly configure a new Tanka environment:
// environments/grafana/main.jsonnet
local grafana = import 'github.com/grafana/jsonnet-libs/grafana/grafana.libsonnet';
local tanka = import 'github.com/grafana/jsonnet-libs/tanka-util/main.libsonnet';

{
  local this = self,

  data:: {
    grafana:
      grafana
      + grafana.withAnonymous(),
  },

  env:
    tanka.environment.new(
      name='grafana/dev-01',
      namespace='grafana',
      apiserver='https://127.0.1.1:6443',
    )
    + tanka.environment.withLabels({ cluster: 'dev-01' })
    + tanka.environment.withData(this.data),
}
To add this to a new cluster, we can make a copy of env: with the same data block, giving us a bit more consistency already. However, that is still manual labor.
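For illustration, such a copy could look like this (a hypothetical env_dev02 field sitting next to env: in the file above; only the connection details differ):
  // Hypothetical copy for a second cluster: the data block is reused,
  // but the name, label, and apiserver are duplicated by hand.
  env_dev02:
    tanka.environment.new(
      name='grafana/dev-02',
      namespace='grafana',
      apiserver='https://127.0.1.2:6443',
    )
    + tanka.environment.withLabels({ cluster: 'dev-02' })
    + tanka.environment.withData(this.data),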
Let’s take it a step further and also describe our clusters in Jsonnet:
// lib/meta/meta.libsonnet
{
  clusters: {
    'dev-01': {
      name: 'dev-01',
      status: 'dev',
      apiserver: 'https://127.0.1.1:6443',
    },
    'prod-01': {
      name: 'prod-01',
      status: 'prod',
      apiserver: 'https://127.0.2.1:6443',
    },
  },
}
This allows us to create a consistent deployment of Grafana in both the development and production clusters with just a few extra lines of code:
// environments/grafana/main.jsonnet
local grafana = import 'github.com/grafana/jsonnet-libs/grafana/grafana.libsonnet';
local tanka = import 'github.com/grafana/jsonnet-libs/tanka-util/main.libsonnet';
+local meta = import 'meta/meta.libsonnet';
{
  local this = self,

  data:: {
    grafana:
      grafana
      + grafana.withAnonymous(),
  },

-  env:
+  env(cluster)::
    tanka.environment.new(
-      name='grafana/dev-01',
+      name='grafana/' + cluster.name,
      namespace='grafana',
-      apiserver='https://127.0.1.1:6443',
+      apiserver=cluster.apiserver,
    )
-    + tanka.environment.withLabels({ cluster: 'dev-01' })
+    + tanka.environment.withLabels({ cluster: cluster.name })
    + tanka.environment.withData(this.data),

+  envs: {
+    [name]: this.env(meta.clusters[name])
+    for name in std.objectFields(meta.clusters)
+  },
}
This solves problem 1: configuration drift for the same application between different clusters.
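To make that concrete, here is a rough sketch (not verbatim Tanka output) of what envs now evaluates to, given the two clusters in lib/meta: one Environment object per cluster, with identical data but cluster-specific connection details:
// Sketch of the evaluated envs object (trimmed):
{
  'dev-01': {
    apiVersion: 'tanka.dev/v1alpha1',
    kind: 'Environment',
    metadata: { name: 'grafana/dev-01', labels: { cluster: 'dev-01' } },
    spec: { apiServer: 'https://127.0.1.1:6443', namespace: 'grafana' },
    data: { grafana: { /* ... */ } },
  },
  'prod-01': {
    apiVersion: 'tanka.dev/v1alpha1',
    kind: 'Environment',
    metadata: { name: 'grafana/prod-01', labels: { cluster: 'prod-01' } },
    spec: { apiServer: 'https://127.0.2.1:6443', namespace: 'grafana' },
    data: { grafana: { /* ... */ } },
  },
}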
Let’s add Grafana to another cluster:
// lib/meta/meta.libsonnet
{
  clusters: {
    'dev-01': {
      name: 'dev-01',
      status: 'dev',
      apiserver: 'https://127.0.1.1:6443',
    },
+    'dev-02': {
+      name: 'dev-02',
+      status: 'dev',
+      apiserver: 'https://127.0.1.2:6443',
+    },
    'prod-01': {
      name: 'prod-01',
      status: 'prod',
      apiserver: 'https://127.0.2.1:6443',
    },
  },
}
That’s it! No more changes are required to the Grafana application configuration or Tanka environment. This mitigates problem 2, bootstrapping new Tanka environments.
Note: At Grafana Labs, we create our Kubernetes clusters with Terraform. The list of clusters in lib/meta is therefore generated from Terraform, reducing the manual burden even more.
I hear you: “But but but… I have one cluster that is different from the rest; that ‘drift’ was intentional!” Cluster-specific overrides can simply be done by extending the envs object:
// environments/grafana/main.jsonnet
local grafana = import 'github.com/grafana/jsonnet-libs/grafana/grafana.libsonnet';
local tanka = import 'github.com/grafana/jsonnet-libs/tanka-util/main.libsonnet';
local meta = import 'meta/meta.libsonnet';

{
  local this = self,

  data:: {
    grafana:
      grafana
      + grafana.withAnonymous(),
  },

  env(cluster)::
    tanka.environment.new(
      name='grafana/' + cluster.name,
      namespace='grafana',
      apiserver=cluster.apiserver,
    )
    + tanka.environment.withLabels({ cluster: cluster.name })
    + tanka.environment.withData(this.data),

  envs: {
    [name]: this.env(meta.clusters[name])
    for name in std.objectFields(meta.clusters)
+  } + {
+    'prod-01'+: {
+      data+: { grafana+: grafana.withTheme('dark') },
+    },
  },
}
A space for namespaces
Traditionally, namespaces are either created manually or within the Tanka environment itself. If a namespace is created within a Tanka environment, there is a risk that this environment is not the only one with resources in that namespace: removing the namespace along with the environment might destroy the resources of every other environment deployed into it.
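As a sketch of that anti-pattern (assuming the kausal library used in the cluster-resources example further below), imagine the namespace living inside one application’s environment:
// Anti-pattern sketch: the grafana namespace is owned by a single
// application environment. Deleting this environment deletes the
// namespace, and with it the resources of every other environment
// that deploys into it.
data:: {
  namespace: k.core.v1.namespace.new('grafana'),
  grafana: grafana,
},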
Luckily, we have all the Tanka environments available through tk env list, and we can simply generate the namespace manifests based on the Tanka environment specifications. Let’s generate a JSON data file so we can use it in lib/meta:
tk env list --json environments/ | jq . > lib/meta/raw/environments.json
Have a look at lib/meta/raw/environments.json (trimmed), and note the spec.namespace value:
[
  {
    "apiVersion": "tanka.dev/v1alpha1",
    "kind": "Environment",
    "metadata": {
      "name": "grafana",
      "namespace": "environments/grafana/main.jsonnet",
      "labels": { "cluster": "dev-01" }
    },
    "spec": {
      "apiServer": "https://127.0.1.1:6443",
      "namespace": "grafana"
    }
  },
  {
    "apiVersion": "tanka.dev/v1alpha1",
    "kind": "Environment",
    "metadata": {
      "name": "grafana",
      "namespace": "environments/grafana/main.jsonnet",
      "labels": { "cluster": "dev-02" }
    },
    "spec": {
      "apiServer": "https://127.0.1.2:6443",
      "namespace": "grafana"
    }
  },
  {
    "apiVersion": "tanka.dev/v1alpha1",
    "kind": "Environment",
    "metadata": {
      "name": "grafana",
      "namespace": "environments/grafana/main.jsonnet",
      "labels": { "cluster": "prod-01" }
    },
    "spec": {
      "apiServer": "https://127.0.2.1:6443",
      "namespace": "grafana"
    }
  }
]
From that, we can generate a list of namespace/cluster pairs:
// lib/meta/meta.libsonnet
local envs = import './raw/environments.json';

{
  clusters: { /* ... */ },

  namespaces:
    std.foldr(
      function(env, k) k + {
        [env.spec.namespace]+: {
          clusters+: [env.metadata.labels.cluster],
        },
      },
      envs,
      {}
    ),
}
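For the trimmed environments.json above, this evaluates to roughly the following (the order of the clusters array depends on the fold order):
// Result sketch:
{
  namespaces: {
    grafana: {
      clusters: ['prod-01', 'dev-02', 'dev-01'],
    },
  },
}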
And eventually we can generate the namespace manifests in a Tanka inline environment, similar to the Grafana environments:
// environments/cluster-resources/main.jsonnet
local k = import 'github.com/grafana/jsonnet-libs/ksonnet-util/kausal.libsonnet';
local tanka = import 'github.com/grafana/jsonnet-libs/tanka-util/main.libsonnet';
local meta = import 'meta/meta.libsonnet';

{
  local this = self,

  data(cluster):: {
    namespaces: {
      [ns]: k.core.v1.namespace.new(ns)
      for ns in std.objectFields(meta.namespaces)
      if std.length(std.find(cluster.name, meta.namespaces[ns].clusters)) > 0
    },
  },

  env(cluster)::
    tanka.environment.new(
      name='cluster-resources/' + cluster.name,
      namespace='namespace',
      apiserver=cluster.apiserver,
    )
    + tanka.environment.withLabels({ cluster: cluster.name })
    + tanka.environment.withData(this.data(cluster)),

  envs: {
    [name]: this.env(meta.clusters[name])
    for name in std.objectFields(meta.clusters)
  },
}
Now, if we create a Tanka environment with a new namespace, we update lib/meta/raw/environments.json, and the new namespace will get created. As long as there are Tanka environments with the grafana namespace in that cluster, the namespace manifest will remain in place. Finally, a CI check verifies that the output of tk env list --json matches the raw file.
Personal note
The biggest mental shift for me was to consider our Jsonnet code base as a “database” instead of just infrastructure-as-code. This “database” contains very valuable information that allows us to use this data over and over again in different situations (“views”), reducing the need to duplicate the data.