Guaranteed Scheduling For Critical Add-On Pods

In addition to Kubernetes core components such as the api-server, scheduler, and controller-manager, which run on a master machine, there are a number of add-ons that, for various reasons, must run on a regular cluster node rather than on the Kubernetes master. Some of these add-ons, such as metrics-server, DNS, and UI, are critical to a fully functional cluster. A cluster may stop working properly if a critical add-on is evicted (either manually or as a side effect of another operation such as an upgrade) and becomes pending. This can happen, for example, when the cluster is highly utilized and either other pending pods schedule into the space vacated by the evicted critical add-on pod, or the amount of resources available on the node changes for some other reason.

Rescheduler: guaranteed scheduling of critical add-ons

Rescheduler is deprecated as of Kubernetes 1.10 and will be removed in version 1.12 in accordance with the deprecation policy for beta features.

To avoid eviction of critical pods, you must enable priorities in the scheduler before upgrading to Kubernetes 1.10 or higher.

Rescheduler ensures that critical pods created by the DaemonSet controller are always scheduled, assuming the cluster has enough resources to run the critical add-on pods in the absence of regular pods. If the scheduler determines that no node has enough free resources to run the critical add-on pod given the pods already running in the cluster (indicated by the add-on pod’s PodScheduled condition being set to false with the reason Unschedulable), the rescheduler tries to free up space for the critical DaemonSet pod by evicting some pods. The scheduler then schedules the add-on pod.
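The condition described above can be checked mechanically. The following is a minimal sketch, not the rescheduler's actual code: it assumes a pod is represented as a plain dictionary shaped like the JSON form of a Pod object (with `status.conditions` entries carrying `type`, `status`, and `reason`), and the function name `is_unschedulable` and the sample pod are hypothetical.

```python
from typing import Any, Dict


def is_unschedulable(pod: Dict[str, Any]) -> bool:
    """Return True if the pod's PodScheduled condition is False with reason Unschedulable.

    `pod` is assumed to be a dict shaped like the JSON form of a Pod object.
    """
    conditions = (pod.get("status") or {}).get("conditions") or []
    for cond in conditions:
        if cond.get("type") == "PodScheduled":
            # Condition status values are the strings "True", "False", or "Unknown".
            return cond.get("status") == "False" and cond.get("reason") == "Unschedulable"
    # No PodScheduled condition recorded yet: nothing to act on.
    return False


# Hypothetical pending add-on pod, for illustration only.
pending_addon = {
    "metadata": {"name": "example-addon", "namespace": "kube-system"},
    "status": {
        "conditions": [
            {"type": "PodScheduled", "status": "False", "reason": "Unschedulable"}
        ]
    },
}

# Hypothetical pod that has already been scheduled.
scheduled_pod = {
    "metadata": {"name": "example-app", "namespace": "default"},
    "status": {"conditions": [{"type": "PodScheduled", "status": "True"}]},
}
```

A pod for which this check returns true is the kind of pod the rescheduler would try to make room for, provided it also qualifies as critical.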

To avoid a situation in which another pod is scheduled into the space prepared for the critical add-on, the chosen node gets a temporary taint “CriticalAddonsOnly” before the eviction(s). Each critical add-on has to tolerate this taint, while other pods shouldn’t tolerate it. The taint is removed once the add-on is successfully scheduled.
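To illustrate why only tolerating pods can land on the tainted node, here is a simplified sketch of toleration matching. It follows the general Kubernetes rules (an `Exists` operator ignores the value, an empty toleration effect matches any effect, and an empty key with `Exists` tolerates everything), but it is not the scheduler's implementation. The helper names, the sample pods, and the `NoSchedule` effect on the sample taint are assumptions made for this example.

```python
from typing import Any, Dict

CRITICAL_ADDONS_TAINT_KEY = "CriticalAddonsOnly"


def toleration_matches(toleration: Dict[str, Any], taint: Dict[str, Any]) -> bool:
    """Simplified check of whether a single toleration tolerates a taint."""
    # An empty or missing effect on the toleration matches every taint effect.
    effect = toleration.get("effect")
    if effect and effect != taint.get("effect"):
        return False

    operator = toleration.get("operator") or "Equal"  # "Equal" is the default
    key = toleration.get("key")
    if not key:
        # An empty key is only meaningful with Exists, and then matches all taints.
        return operator == "Exists"
    if key != taint.get("key"):
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def pod_tolerates(pod: Dict[str, Any], taint: Dict[str, Any]) -> bool:
    """Return True if any toleration in the pod spec tolerates the taint."""
    tolerations = (pod.get("spec") or {}).get("tolerations") or []
    return any(toleration_matches(t, taint) for t in tolerations)


# Sample taint; the effect value is an assumption for this sketch.
temporary_taint = {"key": CRITICAL_ADDONS_TAINT_KEY, "effect": "NoSchedule"}

# Hypothetical critical add-on that tolerates the taint.
critical_addon = {
    "spec": {"tolerations": [{"key": CRITICAL_ADDONS_TAINT_KEY, "operator": "Exists"}]}
}

# Hypothetical regular pod with no tolerations.
regular_pod = {"spec": {"containers": [{"name": "app"}]}}
```

With these inputs, `pod_tolerates(critical_addon, temporary_taint)` is true and `pod_tolerates(regular_pod, temporary_taint)` is false, which is what keeps regular pods out of the space being freed for the add-on.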

Warning: currently there is no guarantee about which node is chosen or which pods are killed in order to schedule critical pods, so if rescheduler is enabled your pods might occasionally be killed for this purpose. Please ensure that rescheduler is not enabled along with priorities and preemption in the default-scheduler, because rescheduler is oblivious to priorities and may evict high-priority pods instead of low-priority ones.

Config

Rescheduler doesn’t have any user-facing configuration (component config) or API.

Marking a pod as critical when using Rescheduler

To be considered critical, the pod has to run in the kube-system namespace (configurable via flag) and

The first one marks a pod as critical. The second one is required by the Rescheduler algorithm.

A pod can also be considered critical if its priority is greater than or equal to system-critical-priority.

Marking a pod as critical when priorities are enabled

To be considered critical, the pod has to run in the kube-system namespace (configurable via flag) and