Assigning EKS Namespaces to Node Groups



In AWS EKS clusters, there are several use cases in which all pods of a namespace should automatically be scheduled onto specific nodes, including:

  • Clear allocation of data plane infrastructure (and costs) to teams in large organizations,
  • Running critical workloads on on-demand nodes rather than on spot nodes, or
  • Reserving specific hardware, such as GPUs, for workloads that actually require it.

In this post, we will explore how to achieve this in EKS.

The basic (and probably best) solution

There are several approaches to easily solve the problem at the deployment or pod level:

  • NodeSelector
  • Taints and Tolerations
  • Affinity and Anti-Affinity

However, all of the aforementioned approaches require that each application’s Kubernetes manifests include the appropriate configuration, which can become laborious at scale. It may therefore be desirable to define the mapping once at the namespace level and have Kubernetes adjust the applications’ pod specifications automatically. We will not go into the details of the above techniques in this post (the sketch above shows the gist), but instead look into some more advanced approaches to solving the problem.

More elaborate solutions

In principle, the problem can be solved with the PodNodeSelector and PodTolerationRestriction admission controllers in plain Kubernetes. Based on annotations on a namespace, they automatically add a node selector and tolerations, respectively, to all pods created in that namespace.
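
If these admission controllers were enabled, the mapping would be driven entirely by namespace annotations, roughly like this (a sketch; the nodegroup label and taint key are assumptions):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
  annotations:
    # Evaluated by the PodNodeSelector admission controller:
    scheduler.alpha.kubernetes.io/node-selector: "nodegroup=team-a"
    # Evaluated by the PodTolerationRestriction admission controller:
    scheduler.alpha.kubernetes.io/defaultTolerations: '[{"operator": "Equal", "key": "nodegroup", "value": "team-a"}]'
```

Every pod created in this namespace would then automatically receive the node selector and the toleration.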

Unfortunately, neither admission controller is supported in EKS. At the time of publishing this post, there are open GitHub issues (here and here) on the subject, so we need to find another solution.

The MutatingAdmissionWebhook admission controller, however, is supported in EKS, so a custom implementation of the desired logic is possible in principle. This is especially helpful if the desired logic becomes more complex than what we assume in this blog post. Implementing it does, however, require a fair amount of work for something that plain Kubernetes supports out of the box.
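
To sketch the idea: such a custom webhook is registered via a MutatingWebhookConfiguration that intercepts pod creation; the webhook service then looks up annotations on the pod’s namespace and returns a JSON patch adding the corresponding node selector and tolerations. The service name, namespace, and path below are assumptions:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: namespace-node-mapper
webhooks:
  - name: namespace-node-mapper.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    # Ignore failures so that pod creation is not blocked if the webhook is down
    failurePolicy: Ignore
    clientConfig:
      service:
        name: namespace-node-mapper   # hypothetical in-cluster service implementing the logic
        namespace: kube-system
        path: /mutate
      caBundle: <base64-encoded CA bundle>
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
        operations: ["CREATE"]
```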

Alternatively, there is an open source project which implements a webhook for exactly this purpose. Judging from the latest commit dating from December 2021, however, it no longer seems to be actively maintained.

Tool-based solutions

Another option is to implement the manifest changes as part of a CI/CD pipeline, e.g. with GitHub Actions, GitLab runners, or AWS CodePipeline. Some deployment tools such as ArgoCD might also support such manifest changes (ArgoCD overrides).
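
As an example of the pipeline-based approach, a Kustomize patch applied during deployment can inject the node selector into every Deployment without touching the applications’ own manifests (a sketch; the label key and value are assumptions):

```yaml
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
patches:
  - target:
      kind: Deployment   # apply to all Deployments in this kustomization
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: not-important   # ignored because of the target selector above
      spec:
        template:
          spec:
            nodeSelector:
              nodegroup: team-a
```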

Conclusion

As we have seen, there is no simple, out-of-the-box solution to the problem in EKS, which is particularly frustrating given that plain Kubernetes does offer one. Which approach is best largely depends on the specific use case and the surrounding tooling. Whatever option is chosen, one should not forget the ongoing maintenance effort for everything that is custom-built; hence, the “manual” way of changing individual applications’ manifests is probably the best choice in many cases.


Title Photo by Stéphan Valentin on Unsplash
