Bootstrapping a production-ready Kubernetes cluster on Hetzner Cloud

6 tips for creating a production-ready Kubernetes cluster using KubeOne with Canal CNI on Hetzner Cloud

Introduction

Hi and welcome to my first blog post on my new website.

I know what you are thinking right now: “Oh no! Not another blog post about setting up a Kubernetes cluster!” And yeah, I get it! There are already a lot of blog posts, tutorials, and articles about this topic. For example, shibumi wrote an amazing blog post about Kubernetes on Hetzner in 2021, and there is even a Hetzner example terraform configuration in the KubeOne GitHub repository.

But here is the deal: while those posts and examples give you an easy quick-start, they don’t cover the aspects of bootstrapping a KubeOne cluster that is supposed to run in production some day.

To narrow the scope of this blog post a bit, I have to make a few assumptions:

  • You already have KubeOne installed on your local machine
  • You have a project on Hetzner Cloud
  • You have already created a Hetzner API Token for your Project with read+write permissions, or you are able to do so
  • You want to use Canal as your CNI of choice and not Cilium, which was recently added in KubeOne 1.4
  • You are familiar with terraform

Preface and Acknowledgements

This is not a “definitive guide”, nor should you take everything you read here too seriously. I might be wrong or too opinionated about stuff.

Actually running Kubernetes in production is way harder than just reading this article. I intentionally leave out many details of how to actually run Kubernetes in production.

Specifically, in this post I will not talk about:

  • Security
  • GitOps
  • Disaster recovery
  • Monitoring

I might dedicate future blog posts to those topics (and hopefully remember to link them back here).

Therefore, the scope of this blog post is narrowed down to bootstrapping a Kubernetes Cluster using KubeOne - with somewhat sane defaults and measures taken - that will get you started on your journey to a production ready Kubernetes Cluster.

Reliability Tip 1: Use an odd number of API servers

To get started, we need some virtual servers on Hetzner Cloud on which to install the Kubernetes API server, the etcd database, and the rest of the cluster’s control plane. Stick to an odd number of API servers, because etcd runs on the same nodes and needs a majority of its members to agree on every update to the cluster state.

This majority (quorum) required for etcd is (n/2)+1, with n/2 rounded down, where n is the number of etcd members [1].

The etcd FAQ page describes it:

For any odd-sized cluster, adding one node will always increase the number of nodes necessary for quorum. Although adding a node to an odd-sized cluster appears better since there are more machines, the fault tolerance is worse since exactly the same number of nodes may fail without losing quorum but there are more nodes that can fail. If the cluster is in a state where it can’t tolerate any more failures, adding a node before removing nodes is dangerous because if the new node fails to register with the cluster (e.g., the address is misconfigured), quorum will be permanently lost.
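To make the quote a bit more tangible, here is a small sketch that evaluates the quorum formula from above for a few cluster sizes. The function names are made up for this illustration and are not part of etcd or KubeOne.

```shell
#!/bin/sh
# Quorum size for an etcd cluster with n members: floor(n/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that may fail while the cluster still has quorum.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5 6 7; do
  printf '%s members: quorum %s, tolerates %s failure(s)\n' \
    "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

Three and four members both tolerate exactly one failure, so the fourth server only adds one more machine that can break.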

Reliability Tip 2: Don’t use the count meta-argument of terraform

Fortunately for us, KubeOne comes with a great set of examples for using terraform to set up your infrastructure. We will use the hetzner example as a base and customize it a bit. It comes with mostly sane defaults and best practices out of the box, including a firewall and a placement group for our control-plane nodes to ensure higher reliability.

But there is a problem with this example: The usage of the count meta-argument for the hcloud_server definition.

At first, this seems like a perfectly valid and easy way to avoid duplicate code. But the devil is in the details, as you’re about to find out.

Let’s say we want to change the location of our servers, update their base image, or add or remove an SSH key. All of those changes have something in common: they destroy and re-create the server.

But because we are using the count meta-argument, we cannot update (re-create) only one server at a time. We can only replace all servers simultaneously.

KubeOne deploys the etcd database on the API servers, which means that if we lose all three API server nodes at the same time, we lose all three etcd replicas as well. And if we lose etcd, we lose our entire cluster.

Therefore, we need to replace every resource that uses the count meta-argument with explicit resources. Yes, this causes code duplication. But it allows us to upgrade one server at a time. And after every server update, we can re-run kubeone to repair (or reconcile) our cluster. By doing so, we perform a rolling update of our control plane without losing any data.
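Once every server is its own resource, such a rolling update could be sketched roughly like this. Treat it as an illustration, not a tested procedure: the resource addresses (hcloud_server.control_plane1 to 3), the file names kubeone.yaml and output.json, and the DRY_RUN switch are assumptions of mine. With DRY_RUN=1 (the default here) the script only prints what it would do.

```shell
#!/bin/sh
# Rolling replacement of control-plane servers, one at a time.
# ASSUMPTIONS: the resource addresses passed in and the file names
# kubeone.yaml / output.json are placeholders; adapt them to your setup.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

roll_control_plane() {
  for server in "$@"; do
    # Apply pending changes to exactly one server (terraform asks for confirmation).
    run terraform apply -target="$server" || return 1

    # Refresh the terraform output that KubeOne reads.
    if [ "$DRY_RUN" = "1" ]; then
      echo "+ terraform output -json > output.json"
    else
      terraform output -json > output.json || return 1
    fi

    # Let KubeOne repair (reconcile) the cluster before touching the next server.
    run kubeone apply --manifest kubeone.yaml --tfjson output.json || return 1
  done
}

roll_control_plane \
  hcloud_server.control_plane1 \
  hcloud_server.control_plane2 \
  hcloud_server.control_plane3
```

The loop stops at the first failing command, so a broken server never leads to the next one being replaced as well.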

Now, let’s get to it. We have to change the code blocks in lines 95-99, lines 110-126, and lines 144-154.

main.tf before

We have to remove the count meta-argument, get rid of the element(..., count.index) syntax and replace everything with actual references to explicit objects, so it looks like this:

main.tf after

As pointed out by EarthlingDavey in #20 this also has some implications for the output.tf, requiring us to remove the count meta-argument there as well.

First, we must fix the ssh_command block in lines 26-28 and the kubeone_hosts block in lines 38-47:

output.tf before
output.tf after

Reliability Tip 3: Use terraform remote backends

I’m almost certain that every single one of you has run terraform apply on your local machine at least once. I mean, after all, that is how terraform is supposed to be used, right?

And the sad truth is, I’ve seen a lot of production environments that were built exactly like that: someone ran terraform apply on their local machine. And hey, now we can advertise our infrastructure as “infrastructure as code”. Technically this might be correct, but it is certainly not what you would expect.

To us “DevOps” or “SRE” folks it is obvious to run terraform from a CI/CD pipeline.

GitLab released GitLab-managed Terraform state a while back [2], adding a feature that allows you to securely store your tfstate within GitLab.

But there is still a problem with this approach: Many don’t use GitLab. I use GitHub for the overwhelming majority of my work. And if the pipeline executes terraform for me, I can’t easily run terraform plan on my local machine to validate changes before pushing them. Or at least not without manually downloading the tfstate first.

But there is a solution that allows for safe storage of the tfstate and works from any CI/CD platform as well as from the CLI, regardless of platform-level integrations.

And that’s where the terraform remote backend comes into play.

Backends in terraform define where the state snapshots are stored.

This particular backend stores the state in Terraform Cloud and can also use it to actually run terraform for you.

Remote backends give you the greatest level of flexibility and ease, as you can use terraform from any (or even multiple) CI/CD platform(s) and even from your local machine without worrying about keeping tfstates and variables in sync. All variables (and, for that matter, secrets as well) are stored in Terraform Cloud.

You can find a tutorial on how to set up the remote backend here.
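As a rough idea of what this involves, the following sketch writes a minimal remote backend configuration next to your other terraform files. The organization "example-org", the workspace "kubeone-hetzner", and the TF_DIR variable are placeholders I made up; use the values from your own Terraform Cloud account.

```shell
#!/bin/sh
# Write a minimal remote backend configuration next to main.tf.
# ASSUMPTIONS: "example-org" and "kubeone-hetzner" are placeholders,
# TF_DIR is a hypothetical variable pointing at your terraform directory.
TF_DIR="${TF_DIR:-.}"

cat > "$TF_DIR/backend.tf" <<'EOF'
terraform {
  backend "remote" {
    hostname     = "app.terraform.io"
    organization = "example-org"

    workspaces {
      name = "kubeone-hetzner"
    }
  }
}
EOF

# Afterwards, authenticate once and re-initialize the working directory:
#   terraform login
#   terraform init
#   terraform plan
```

After that, terraform plan on your local machine and in the pipeline work against the same state.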

Reliability Tip 4: Configure your kubeone.yaml correctly

If you’re just starting with KubeOne, you might find a kubeone.yaml that looks like this:

a basic kubeone.yaml

Unfortunately, the KubeOne documentation website doesn’t offer a great deal of information on how to configure your KubeOne cluster in more depth.

Thankfully, the kubeone CLI comes with a nifty command:

kubeone config print --full

This will show you all available config options with the defaults used. KubeOne does a good job with those defaults and not much configuration is needed.

I want to deploy the “cluster-autoscaler” addon which comes right out of the box with KubeOne and is particularly useful for production-ready clusters.

I also ensure I set the MTU for canal correctly, as things tend to get a bit icky if the MTU is wrong.

final kubeone.yaml

Reliability Tip 5: Configure Canal to not listen on the public network interface

Well, technically not Canal itself but flannel, which is a part of Canal. Remember: Canal is just a combination of Calico and flannel [3].

By default, the flannel backend that ships as part of your Canal CNI installation binds to your internet-facing eth0 interface [4]. This is absolutely not what you want!

I was made aware of the problem by this GitHub issue: hetznercloud/csi-driver#204

Is it possible that your Kubernetes nodes use their public IPs (and interfaces) instead of a private network for communication between the nodes? – Max Rosin (@ekeih)

and a solution for the problem was pointed out in this comment:

In my setup I use -iface-regex=10\.0\.*\.* in flannel daemonset – Evgeniy Gurinovich (@jekakm)

The fix can be applied relatively easily via kubectl patch:

$ kubectl patch daemonset --namespace kube-system canal --type=json -p='[{"op": "add", "path": "/spec/template/spec/containers/1/command/-", "value": "-iface-regex=10\\.0\\.*\\.*"}]'
daemonset.apps/canal patched

(It is totally possible to add a step to your CI/CD pipeline that applies this mitigation for you after you have created (or reconciled) your KubeOne cluster.)
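One thing to keep in mind for such a pipeline step: the JSON patch appends to the command list, so running it twice would add the argument twice. A possible guard could look like the sketch below. The function names and the APPLY_PATCH switch are hypothetical, and like the command above it assumes that flannel is the second container (index 1) of the canal daemonset.

```shell
#!/bin/sh
# ASSUMPTION: flannel is the second container (index 1) of the canal
# daemonset, exactly as in the kubectl patch command above.

# Succeeds if the flannel container already carries an -iface-regex argument.
canal_has_iface_arg() {
  kubectl get daemonset canal --namespace kube-system \
    -o jsonpath='{.spec.template.spec.containers[1].command}' \
    | grep -q -F -e '-iface-regex='
}

# Applies the patch only when the argument is still missing.
patch_canal_iface() {
  if canal_has_iface_arg; then
    echo "canal already has an -iface-regex argument, skipping"
    return 0
  fi
  kubectl patch daemonset canal --namespace kube-system --type=json \
    -p='[{"op": "add", "path": "/spec/template/spec/containers/1/command/-", "value": "-iface-regex=10\\.0\\.*\\.*"}]'
}

# Only touch the cluster when explicitly asked to (hypothetical switch).
if [ "${APPLY_PATCH:-0}" = "1" ]; then
  patch_canal_iface
fi
```

In a pipeline you would run this with APPLY_PATCH=1 right after the kubeone step.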

Reliability Tip 6: Deploy your worker nodes programmatically

KubeOne comes with a machine-controller that can programmatically deploy worker-nodes to Hetzner Cloud using the Cluster API [5].

KubeOne automagically creates a default machine-deployment for you.

It’s a good start, but we can do better. Honestly, I wouldn’t bother modifying the existing machine deployment; I would just get rid of it:

$ kubectl --namespace kube-system delete machinedeployment prod-ready-pool1
machinedeployments.cluster.k8s.io/prod-ready-pool1 deleted

Instead, create a new machinedeployment.yaml and add it to your Git repository to keep your config in sync.

When we create our worker-nodes, we want to ensure that:

  • The annotations for cluster-autoscaler are set correctly, so our worker-nodes scale dynamically
  • Our SSH keys are deployed to the worker-nodes, allowing easier troubleshooting
  • Our worker-nodes are placed in our virtual network
  • Labels are added to our worker-nodes, so the Hetzner firewall can filter traffic to the worker-nodes as well

In order for the machinedeployment to work correctly, we therefore need to know a few variables:

  • The min and max count of worker-nodes
  • The cluster-name which is added as a label
  • The network-id to place the worker-nodes in the correct virtual network
  • The cluster-version as defined in our kubeone.yaml
  • The datacenter location (ideally the same as the API servers)

Luckily, terraform already provides most of this information, and we can obtain the terraform output in JSON format:

$ terraform output -json > output.json

And now we can determine most of the variables by using a little jq magic:

export AUTOSCALER_MIN=1
export AUTOSCALER_MAX=3
export NETWORK_ID=`jq -r '.kubeone_hosts.value.control_plane.network_id' output.json`
export CLUSTER_NAME=`jq -r '.kubeone_hosts.value.control_plane.cluster_name' output.json`
export CLUSTER_VERSION=`yq e -j < kubeone.yaml | jq -r '.versions.kubernetes'`
export DATACENTER_LOCATION=`jq -r '.control_plane_info.value.location' output.json`
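Because jq -r prints the literal string null when a key does not exist, a typo in one of the paths would go unnoticed until the machinedeployment fails. A small sanity check before rendering the template might look like this. The helper names are my own invention, not part of any of the tools used here.

```shell
#!/bin/sh
# Fails if a value is empty or the literal string "null"
# (which is what jq -r prints for a missing key).
require_value() {
  # $1 = variable name, $2 = its value
  if [ -z "$2" ] || [ "$2" = "null" ]; then
    echo "error: $1 is empty or null" >&2
    return 1
  fi
}

# Checks all variables the machinedeployment template relies on.
check_machinedeployment_vars() {
  require_value AUTOSCALER_MIN "${AUTOSCALER_MIN:-}" &&
  require_value AUTOSCALER_MAX "${AUTOSCALER_MAX:-}" &&
  require_value NETWORK_ID "${NETWORK_ID:-}" &&
  require_value CLUSTER_NAME "${CLUSTER_NAME:-}" &&
  require_value CLUSTER_VERSION "${CLUSTER_VERSION:-}" &&
  require_value DATACENTER_LOCATION "${DATACENTER_LOCATION:-}"
}
```

Calling check_machinedeployment_vars || exit 1 right after the exports stops the script before a half-empty manifest is generated.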

Finally, we can use a template of our machinedeployment and use envsubst to render it [6]:

machinedeployment.yaml.tpl

envsubst < ./machinedeployment.yaml.tpl > ./machinedeployment.yaml
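By default, envsubst replaces every $VARIABLE it finds, including ones you never meant to be placeholders. If your template contains other dollar signs, you can pass envsubst an explicit list of variables to substitute. The sketch below assumes GNU envsubst (from gettext); the RENDER_VARS variable and the render_template function are names I made up.

```shell
#!/bin/sh
# Only the variables listed here are substituted; any other "$..." in the
# template is left untouched. Single quotes keep the shell from expanding them.
RENDER_VARS='${AUTOSCALER_MIN} ${AUTOSCALER_MAX} ${NETWORK_ID} ${CLUSTER_NAME} ${CLUSTER_VERSION} ${DATACENTER_LOCATION}'

render_template() {
  # $1 = template path, $2 = output path
  envsubst "$RENDER_VARS" < "$1" > "$2"
}
```

Usage would then be render_template ./machinedeployment.yaml.tpl ./machinedeployment.yaml.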


Cedric Kienzler
Senior Software Engineer - Azure Resiliency Engineering

My work primarily focuses on building, designing, and maintaining highly distributed systems at large scale
