Pod priority and preemption
Pod priority and preemption was introduced in Kubernetes v1.8, entered beta in v1.11, and reached GA in v1.14, so it is already a mature feature.
As the name suggests, the Pod priority and preemption feature divides applications into different priorities and allocates resources to high-priority applications first, which improves resource utilization while guaranteeing quality of service for high-priority applications.
Let's briefly try out the Pod priority and preemption feature.
My cluster is v1.14, so the PodPriority feature gate is enabled by default. Using preemption takes two steps:
- Define PriorityClasses; each PriorityClass has its own value, and the larger the value, the higher the priority.
- Create a Pod and set its priorityClassName field to the desired PriorityClass.
Create PriorityClass
I first create the following two PriorityClasses, high-priority and low-priority, with values 1000000 and 10 respectively.
Note that I set globalDefault of low-priority to true, so low-priority is the cluster's default PriorityClass: any Pod that does not set the priorityClassName field gets low-priority's value of 10. A cluster can have only one default PriorityClass; if no default PriorityClass is set, Pods without a priorityClassName field get a priority of 0.
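A minimal sketch of the two PriorityClass manifests (the description texts are assumptions):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "High priority class for important workloads."  # description text is assumed
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 10
globalDefault: true       # makes this the cluster-wide default PriorityClass
description: "Default low priority class for the cluster."   # description text is assumed
```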
After creating them, check the PriorityClasses currently in the system.
As you can see, in addition to the two PriorityClasses created above, the system also ships with the built-in system-cluster-critical and system-node-critical for high-priority system workloads.
Set the Pod's priorityClassName
For verification purposes, I use an extended resource here: I set the capacity of the extended resource example.com/foo on node x1 to 1.
Looking at x1's allocatable and capacity, you can see that there is 1 example.com/foo resource on x1.
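Assuming the extended resource has been advertised (for example by patching the node status), the relevant part of node x1's status would look roughly like this:

```yaml
# Excerpt of node x1's status after advertising the extended resource
status:
  capacity:
    example.com/foo: "1"
  allocatable:
    example.com/foo: "1"
```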
We first create a Deployment named nginx, which requests one example.com/foo resource but does not set priorityClassName, so its Pod gets the default low-priority value of 10.
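A sketch of what the nginx Deployment could look like (replica count, labels, and image are assumptions); the key points are the example.com/foo request and the absence of priorityClassName:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            example.com/foo: "1"
          limits:
            example.com/foo: "1"
      # no priorityClassName, so the Pod falls back to the default low-priority (10)
```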
Then we create a Deployment named debian, which does not request any example.com/foo resource.
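A sketch of the debian Deployment at this stage (image and command are assumptions); note that it requests no example.com/foo and sets no priorityClassName:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: debian
spec:
  replicas: 1
  selector:
    matchLabels:
      app: debian
  template:
    metadata:
      labels:
        app: debian
    spec:
      containers:
      - name: debian
        image: debian
        command: ["sleep", "infinity"]  # keep the container running; command is assumed
```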
At this point both Pods can be started normally.
Start preemption
We now change the debian Deployment's example.com/foo request to 1 and set its priorityClassName to high-priority.
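After this change, the Pod template of the debian Deployment might look roughly like the sketch below (only the Pod spec portion is shown; image and command are still assumptions):

```yaml
    spec:
      priorityClassName: high-priority
      containers:
      - name: debian
        image: debian
        command: ["sleep", "infinity"]
        resources:
          requests:
            example.com/foo: "1"
          limits:
            example.com/foo: "1"
```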
At this point, since there is only 1 example.com/foo resource on x1 in the whole cluster and debian has the higher priority, the scheduler starts preemption: the lower-priority nginx Pod is evicted from x1, and the debian Pod is then scheduled onto the freed resource.
Gentleman: Non-preempting PriorityClasses
Kubernetes v1.15 added a preemptionPolicy field to PriorityClass. When it is set to Never, Pods using that PriorityClass will not preempt lower-priority Pods; they are only placed ahead of them in the scheduling queue (according to the PriorityClass value).
So I call this kind of PriorityClass a "gentleman": it just quietly queues up according to its ability (priority) and never steals other Pods' resources. The official documentation gives data science workloads as a suitable example.
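A sketch of what such a non-preempting PriorityClass could look like (the name, value, and description are assumptions):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting
value: 1000000
preemptionPolicy: Never   # never preempt lower-priority Pods, only queue ahead of them
globalDefault: false
description: "High priority without preemption."  # description text is assumed
```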
Comparison with Cluster Autoscaler
When a Kubernetes cluster running on the cloud is short of resources, the Cluster Autoscaler can scale the cluster automatically, i.e., request more nodes from the cloud provider and add them to the cluster, thus providing more resources.
However, this approach has some shortcomings:
- It is hard to implement for on-premises (off-cloud) clusters
- Adding nodes costs more money
- It is not immediate; provisioning new nodes takes time
If users can clearly divide their applications into priorities, then seizing resources from lower-priority Pods when resources are insufficient can better improve resource utilization and quality of service.