In this blog we will discuss the Pod disruption budget feature of Kubernetes. It is a recent addition to Kubernetes that is very useful, but it can make Kubernetes overprotective of its Pods, which can cause problems, as we experienced with one of our customers. Pods are an essential part of Kubernetes. Kubernetes itself has a large, rapidly growing ecosystem, and Kubernetes services, support, and tools are widely available. Google open-sourced the Kubernetes project in 2014.
A fun fact: the name Kubernetes originates from Greek, meaning helmsman or pilot. The abbreviation K8s comes from counting the eight letters between the “K” and the “s”.
Let me first explain Pods and the disruption budget feature. Kubernetes is a distributed system, which means that the workloads and their configuration are spread across the cluster as well. Pods are the units that carry these workloads. A Pod contains one or more containers, such as Docker containers, and represents a single instance of a running process in your cluster. Pods are the smallest, most basic deployable objects in Kubernetes. When a Pod runs multiple containers, the containers are managed as a single entity and share the Pod’s resources. Running multiple containers in a single Pod is generally an advanced use case. In short, a Pod is a collection of containers, configuration, network settings and more.
Pods are ephemeral, which means they are not designed to run forever, and when a Pod is terminated it cannot be brought back. In general, Pods do not disappear until they are deleted by a user or by a controller. Pods also do not heal or repair themselves. It is good to know that a Pod itself does not store anything permanently, which means you have to find another way to keep your data. One way to do this is to attach storage to the Pod through a volume, for example one backed by a persistent volume claim.
A feature that builds on Pods is the Pod disruption budget. It lets you state how many Pods of an application must stay available during voluntary disruptions, such as draining a node for maintenance. You can list the Pod disruption budgets in your cluster with the command "kubectl get poddisruptionbudgets", and change one with "kubectl edit poddisruptionbudget" followed by its name. Within a cluster you can run the same program in several Pods in order to make sure you have back-ups. For instance, if you have 3 Pods and one goes down, it isn’t a problem because the remaining two are still running. To make the running programs even more robust you can add a Pod disruption budget. With this budget you can, for example, configure that half of the Pods must stay running while the remaining Pods are switched off. But you have to be aware that this can have unintended implications for the infrastructure, for example if you configured the Pod disruption budget to always keep half of the Pods running but you only have 1 Pod. Kubernetes can’t divide that 1 Pod into 2. It rounds the required half up to 1 Pod, so no Pod may be disrupted and Kubernetes refuses to switch it off.
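The arithmetic behind the single-Pod case can be sketched in a few lines of Python. This is only an illustration, not Kubernetes code: the function names are made up for this post, and it assumes a budget expressed as a minimum number of available Pods (a count or a percentage), with percentages rounded up to a whole Pod.

```python
def min_available_pods(min_available, expected_pods):
    """Turn a minimum-available setting (an int or a string like '50%') into a Pod count.

    Hypothetical helper for illustration; percentages are rounded up.
    """
    if isinstance(min_available, str) and min_available.endswith("%"):
        percent = int(min_available[:-1])
        # Integer ceiling of expected_pods * percent / 100.
        return (expected_pods * percent + 99) // 100
    return int(min_available)


def allowed_disruptions(min_available, expected_pods, healthy_pods=None):
    """Number of Pods that may be switched off without breaking the budget."""
    if healthy_pods is None:
        healthy_pods = expected_pods  # assume every Pod is currently healthy
    required = min_available_pods(min_available, expected_pods)
    return max(0, healthy_pods - required)


# 3 Pods, keep half running: 2 must stay, so 1 may be switched off.
print(allowed_disruptions("50%", 3))
# 1 Pod, keep half running: the half rounds up to 1, so nothing may be switched off.
print(allowed_disruptions("50%", 1))
```

With 3 Pods the sketch allows one disruption, with 1 Pod it allows none, which is the overprotective behaviour described above.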
This is exactly what we experienced with one of our clients. Kubernetes wouldn’t turn the Pod off, in order to protect the application from a disruption. That meant that we couldn’t resolve the issue we were working on for that client. After a short investigation we found that the Pod disruption budget caused this problem. We solved it by making sure that all the disruption budgets were set properly, configured in such a manner that they would allow us to turn the Pods off. Once we were sure everything was set properly, we put the cluster in maintenance and erased the containers. As all the data was already stored on the network and the configuration in the cluster, this was easily done. After restarting the cluster and checking on it, we saw the original issue was resolved. The cluster was fully out of maintenance and the customer was able to work in the environment again.
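A check like the one we did by hand can also be scripted. The Python sketch below is an assumption about how you might do that, not the exact procedure we followed: it expects the parsed JSON from "kubectl get poddisruptionbudgets --all-namespaces -o json" and lists every budget whose status reports zero allowed disruptions. The function name and the sample data are hypothetical.

```python
def blocking_budgets(pdb_list):
    """Return 'namespace/name' for every budget that currently allows no disruptions."""
    blocked = []
    for item in pdb_list.get("items", []):
        status = item.get("status", {})
        # A missing value is treated as 0, i.e. as blocking, to stay on the safe side.
        if status.get("disruptionsAllowed", 0) == 0:
            meta = item.get("metadata", {})
            blocked.append(f'{meta.get("namespace", "default")}/{meta.get("name", "?")}')
    return blocked


# Hand-written sample data in the shape of the kubectl JSON output (names are made up).
sample = {
    "items": [
        {"metadata": {"namespace": "shop", "name": "web-pdb"},
         "status": {"disruptionsAllowed": 1}},
        {"metadata": {"namespace": "shop", "name": "single-pod-pdb"},
         "status": {"disruptionsAllowed": 0}},
    ]
}

print(blocking_budgets(sample))
```

In practice you would replace the sample with the real output, for example by loading it with json.load from a file you saved the kubectl output to.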
Have you experienced anything like this, or do you need help with such a case? Let us know in the comments.