Taints and tolerations are a Kubernetes mechanism for controlling how Pods schedule to the Nodes in your cluster. Taints are applied to Nodes and act as a repelling barrier against new Pods. Tainted Nodes will only accept Pods that have been marked with a corresponding toleration.
Taints are one of the more advanced Kubernetes scheduling mechanisms. They facilitate many different use cases where you want to prevent Pods ending up on undesirable Nodes. In this article, you’ll learn what taints and tolerations are and how you can utilize them in your own cluster.
How Scheduling Works
Kubernetes is a distributed system where you can deploy containerized applications (Pods) across multiple physical hosts (Nodes). When you create a new Pod, Kubernetes needs to determine the set of Nodes it can be placed on. This is what scheduling refers to.
The scheduler considers many different factors to establish a suitable placement for each Pod. It’ll default to selecting a Node that can provide sufficient resources to satisfy the Pod’s CPU and memory requests.
The selected Node won’t necessarily be appropriate for your deployment though. It could lack required hardware or be reserved for development use. Node taints are a mechanism for enforcing these constraints by preventing arbitrary assignation of Pods to Nodes.
Taint Use Cases
Tainting a Node means it will start to repel Pods, forcing the scheduler to consider the next candidate Node instead. You can overcome the taint by setting a matching toleration on the Pod. This provides a mechanism for allowing specific Pods onto the Node.
Taints are often used to keep Pods away from Nodes that are reserved for specific purposes. Some Kubernetes clusters might host several environments, such as staging and production. In this situation you’ll want to prevent staging deployments from ending up on the dedicated production hardware.
You can achieve the desired behavior by tainting the production Node and setting a matching toleration on production Pods. Staging Pods will be confined to the other Nodes in your cluster, preventing them from consuming production resources.
Taints can also help distinguish between Nodes with particular hardware. Operators might deploy a subset of Nodes with dedicated GPUs for use with AI workloads. Tainting these Nodes ensures Pods that don’t need the GPU can’t schedule onto them.
Each Node taint can have one of three different effects on Kubernetes scheduling decisions:
- NoSchedule – Pods that lack a toleration for the taint won’t be scheduled onto the Node. Pods already scheduled to the Node aren’t affected, even if they don’t tolerate the taint.
- PreferNoSchedule – Kubernetes will avoid scheduling Pods without the taint’s toleration, but a Pod could still be scheduled to the Node as a last resort. This does not affect existing Pods.
- NoExecute – This functions similarly to NoSchedule, except that existing Pods are affected too. Pods without the toleration will be immediately evicted from the Node, causing them to be rescheduled onto other Nodes in your cluster.
The NoExecute effect is useful when you’re changing the role of a Node that’s already running some workloads. NoSchedule is more appropriate when you want to stop the Node from receiving new Pods without disrupting existing deployments.
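As a quick illustration of the difference (the node name and taint key here are hypothetical), the same key and value can be applied with either effect depending on whether existing Pods should be disturbed:

```shell
# NoExecute: evicts existing Pods that lack a matching toleration
# and blocks new ones - useful when repurposing a busy node.
kubectl taint nodes demo-node maintenance=true:NoExecute

# NoSchedule: only blocks new Pods; anything already running stays put.
kubectl taint nodes demo-node maintenance=true:NoSchedule
```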
Tainting a Node
Taints are applied to Nodes using the kubectl taint command. It takes the name of the target Node, a key and value for the taint, and an effect.

Here’s an example of tainting a Node to allocate it to a specific environment:

$ kubectl taint nodes demo-node env=production:NoSchedule
node/demo-node tainted
You can apply multiple taints to a Node by repeating the command. The value is optional – you can create binary taints by omitting it and keeping just the key:
$ kubectl taint nodes demo-node has-gpu:NoSchedule
To remove a previously applied taint, repeat the command but append a hyphen (-) to the effect name:

$ kubectl taint nodes demo-node has-gpu:NoSchedule-
node/demo-node untainted
This will delete the matching taint if it exists.
You can retrieve a list of all the taints applied to a Node using the describe command. The taints will be shown near the top of the output, after the Node’s labels and annotations:

$ kubectl describe node demo-node
Name:    demo-node
...
Taints:  env=production:NoSchedule
...
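If you only want the taints, without the rest of the describe output, a JSONPath query against the Node object works too (a sketch, reusing the demo-node name from the examples above):

```shell
# Print just the taints array from the Node spec as JSON.
kubectl get node demo-node -o jsonpath='{.spec.taints}'
```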
Adding Tolerations to Pods
The example above tainted demo-node with the intention of reserving it for production workloads. The next step is to add an equivalent toleration to your production Pods so that they’re permitted to schedule onto the Node.

Pod tolerations are declared in the spec.tolerations manifest field:

apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: example.com/api:latest
  tolerations:
    - key: env
      operator: Equal
      value: production
      effect: NoSchedule
This toleration allows the api Pod to schedule to Nodes that have an env taint with a value of production and NoSchedule as the effect. The example Pod can now be scheduled to demo-node.
To tolerate taints without a value, use the Exists operator instead:

apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: example.com/api:latest
  tolerations:
    - key: has-gpu
      operator: Exists
      effect: NoSchedule
The Pod now tolerates the has-gpu taint, whether or not a value has been set.
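Taken to its extreme, a toleration with an empty key and the Exists operator matches every taint, and omitting the effect matches all effects. This is how some system-level DaemonSets run on every Node; it’s worth knowing about, but using it on ordinary workloads defeats the purpose of tainting. A minimal sketch of the relevant fragment:

```yaml
# Tolerates all taints on all Nodes - use with caution.
tolerations:
  - operator: Exists
```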
Tolerations do not require that the Pod is scheduled to a tainted Node. This is a common misconception around taints and tolerations. The mechanism only says that a Node can’t host a Pod; it does not express the alternative view that a Pod must be placed on a particular Node. Taints are commonly combined with affinities to achieve this bi-directional behavior.
Taint and Toleration Matching Rules
Tainted Nodes only receive Pods that tolerate all of their taints. Kubernetes first discovers the taints on the Node, then filters out taints that are tolerated by the Pod. The effects requested by the remaining set of taints will be applied to the Pod.
There’s a special case for the NoExecute effect. Pods that tolerate this kind of taint will usually get to stay on the Node after the taint is applied. You can modify this behavior so that Pods are evicted after a given time, despite tolerating the taint:

apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: example.com/api:latest
  tolerations:
    - key: env
      operator: Equal
      value: production
      effect: NoExecute
      tolerationSeconds: 900
A Node that’s hosting this Pod but is subsequently tainted with env=production:NoExecute will allow the Pod to remain present for up to 15 minutes after the taint’s applied. The Pod will then be evicted despite having the toleration.
Nodes are automatically tainted by the Kubernetes control plane to evict Pods and prevent scheduling when resource contention occurs. Taints such as node.kubernetes.io/disk-pressure mean Kubernetes is blocking the Node from taking new Pods because it lacks sufficient resources.
Other commonly applied taints include node.kubernetes.io/not-ready, set when a new Node isn’t yet accepting Pods, and node.kubernetes.io/unschedulable. The latter is applied to cordoned Nodes to halt all Pod scheduling activity.
These taints implement the Kubernetes eviction and Node management systems. You don’t normally need to think about them and you shouldn’t manage these taints manually. If you see them on a Node, it’s because Kubernetes has applied them in response to changing conditions or another command you’ve issued. It is possible to create Pod tolerations for these taints but doing so could lead to resource exhaustion and unexpected behavior.
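You may still see these taint names inside your own Pods’ manifests: Kubernetes admission control automatically injects NoExecute tolerations for the not-ready and unreachable conditions, with a default tolerationSeconds of 300, so Pods survive brief Node outages before being rescheduled. Inspecting a running Pod with kubectl get pod -o yaml typically reveals a fragment like this:

```yaml
tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300
```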
Taints and tolerations are a mechanism for repelling Pods away from individual Kubernetes Nodes. They help you avoid undesirable scheduling outcomes by preventing Pods from being automatically assigned to arbitrary Nodes.
Tainting isn’t the only mechanism that provides control over scheduling behavior. Pod affinities and anti-affinities are a related technique for constraining the Nodes that can receive a Pod. Affinity can also be defined at an inter-Pod level, allowing you to make scheduling decisions based on the Pods already running on a Node. You can combine affinity with taints and tolerations to set up advanced scheduling rules.
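As a sketch of how the pieces fit together, a production Pod can combine the toleration from the earlier examples with node affinity, so that it both may and must land on the production hardware. This assumes the Node also carries an env=production label (labels and taints are set separately, so this is an extra assumption):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: example.com/api:latest
  # Toleration lets the Pod past the env=production:NoSchedule taint.
  tolerations:
    - key: env
      operator: Equal
      value: production
      effect: NoSchedule
  # Node affinity requires the Pod to land on a Node labeled env=production.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: env
                operator: In
                values:
                  - production
```

The taint keeps other Pods off the Node; the affinity keeps this Pod off other Nodes.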