1. Overview
A namespace is an important concept in Kubernetes: an abstraction over a set of resources and objects, often used to isolate different users. Many resources live under a namespace, such as the familiar Deployments, Pods, Services, Ingresses, ConfigMaps, and so on.
Of course, the focus of this article is on what happens when a namespace is deleted. A typical scenario is executing kubectl delete ns test in the terminal. After the command runs, the test namespace immediately enters the Terminating state and is only actually deleted a few seconds later, even if there are no resources in the test namespace.
Therefore, we will explore the following points.
- How api-server handles namespace deletion requests
- How the resources in a namespace are handled when it is deleted
2. How api-server handles namespace deletion requests
Unlike other resources, a namespace has to be emptied before it can be deleted. A namespace in the Terminating state is one whose resources have not yet been confirmed as deleted. Therefore, when the api-server receives a request to delete a namespace, it does not immediately remove it from etcd. It first checks whether metadata.DeletionTimestamp is empty. If it is empty, metadata.DeletionTimestamp is set to the current time and status.Phase is set to Terminating. If metadata.DeletionTimestamp is not empty, the api-server checks whether spec.finalizers is empty. If it is, the namespace is actually deleted. In other words, the namespace is not deleted as long as spec.finalizers is not empty. So when is the finalizer added, and how does it work?
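The decision described above can be sketched in a few lines of Go. This is only an illustrative model: the Namespace struct and the handleDelete function are hypothetical stand-ins written for this article, not the real api-server types or code.

```go
package main

import (
	"fmt"
	"time"
)

// Namespace is a simplified stand-in for the real Kubernetes object; only
// the fields that matter for deletion are modeled.
type Namespace struct {
	Name              string
	DeletionTimestamp *time.Time // models metadata.DeletionTimestamp
	Finalizers        []string   // models spec.finalizers
	Phase             string     // models status.Phase
}

// handleDelete (hypothetical) mimics how a delete request is treated and
// reports whether the object may be removed from storage right now.
func handleDelete(ns *Namespace) bool {
	if ns.DeletionTimestamp == nil {
		// First delete request: only mark the namespace as terminating.
		now := time.Now()
		ns.DeletionTimestamp = &now
		ns.Phase = "Terminating"
		return false
	}
	// Already marked: remove it only once no finalizer is left.
	return len(ns.Finalizers) == 0
}

func main() {
	ns := &Namespace{Name: "test", Finalizers: []string{"kubernetes"}, Phase: "Active"}

	removed := handleDelete(ns)
	fmt.Println(removed, ns.Phase) // false Terminating

	fmt.Println(handleDelete(ns)) // false: the finalizer is still present

	ns.Finalizers = nil           // cleanup finished, finalizer removed
	fmt.Println(handleDelete(ns)) // true
}
```

The point of the two-step design is that the object stays visible in storage while something else does the cleanup and then clears the finalizer.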
3. Finalizer mechanism
The finalizer of a namespace is actually added at creation time: when the api-server prepares a new namespace for creation, it appends the kubernetes finalizer to spec.finalizers if it is not already there.
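A minimal sketch of that creation-time step, assuming a simplified NamespaceSpec type. The function name prepareForCreate and the finalizer example.com/custom are illustrative choices here, not a copy of the Kubernetes source.

```go
package main

import "fmt"

// finalizerKubernetes is the name of the built-in namespace finalizer.
const finalizerKubernetes = "kubernetes"

// NamespaceSpec is a simplified stand-in that models only spec.finalizers.
type NamespaceSpec struct {
	Finalizers []string
}

// prepareForCreate (hypothetical) appends the "kubernetes" finalizer unless
// the spec already contains it.
func prepareForCreate(spec *NamespaceSpec) {
	for _, f := range spec.Finalizers {
		if f == finalizerKubernetes {
			return // already present, nothing to do
		}
	}
	spec.Finalizers = append(spec.Finalizers, finalizerKubernetes)
}

func main() {
	// "example.com/custom" is a made-up finalizer for illustration.
	spec := &NamespaceSpec{Finalizers: []string{"example.com/custom"}}
	prepareForCreate(spec)
	prepareForCreate(spec) // calling it twice does not add a duplicate
	fmt.Println(spec.Finalizers) // [example.com/custom kubernetes]
}
```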
When the namespace is deleted, it changes to the Terminating state and the namespace controller comes into play. The namespace controller is part of the controller manager and listens for namespace add and update events.
A workqueue stores the change events for each namespace, and each event eventually triggers nm.namespacedResourcesDeleter.Delete(namespace.Name). If the namespace does not exist, or if namespace.DeletionTimestamp is empty, Delete exits right away. Otherwise, it first sets the namespace's phase to Terminating, whatever the phase currently is.
This means that once a namespace is terminating, you cannot change its state just by modifying the phase. I once manually changed the phase back to Active while a namespace was terminating, and it immediately became Terminating again, which is probably why.
After that, an attempt is made to clear all the contents of the namespace.
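The controller-side steps in this section can be modeled roughly as follows. This is a sketch under stated assumptions: an in-memory map replaces the API server, and syncNamespace, markDeleted, and the cleanup callback are hypothetical names rather than the real controller code.

```go
package main

import (
	"fmt"
	"time"
)

// Namespace is a simplified stand-in for the real Kubernetes object.
type Namespace struct {
	Name              string
	DeletionTimestamp *time.Time
	Phase             string
}

// markDeleted stands in for the api-server handling the delete request.
func markDeleted(ns *Namespace) {
	now := time.Now()
	ns.DeletionTimestamp = &now
	ns.Phase = "Terminating"
}

// syncNamespace (hypothetical) follows the steps described above: exit if
// there is nothing to do, force the phase to Terminating, then try to
// delete the namespace content.
func syncNamespace(store map[string]*Namespace, name string, deleteAllContent func(*Namespace) error) error {
	ns, ok := store[name]
	if !ok || ns.DeletionTimestamp == nil {
		return nil // namespace is gone or is not being deleted
	}
	ns.Phase = "Terminating" // set regardless of the current value
	return deleteAllContent(ns)
}

func main() {
	ns := &Namespace{Name: "test", Phase: "Active"}
	store := map[string]*Namespace{"test": ns}
	cleanups := 0
	cleanup := func(*Namespace) error { cleanups++; return nil }

	_ = syncNamespace(store, "test", cleanup)
	fmt.Println(ns.Phase, cleanups) // Active 0

	markDeleted(ns)
	ns.Phase = "Active" // manual edit while the namespace is terminating
	_ = syncNamespace(store, "test", cleanup)
	fmt.Println(ns.Phase, cleanups) // Terminating 1
}
```

The second half of main reproduces the situation from the anecdote: a manual switch back to Active is overwritten on the next sync.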
4. The working mechanism of DiscoveryInterface
Now we face a problem: how do we clean up all the resources under the namespace? Normally, if we want to delete a pod, we can call the PodInterface provided by client-go, which is really a wrapper around a RESTful HTTP DELETE request. But here we don't know which resources exist under the namespace, so there is no delete interface we can call directly.
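To make the "wrapper around HTTP DELETE" point concrete, here is a small sketch that only builds such a request with the standard library. The helper names and the server address are made up for illustration. A real call also needs TLS and authentication, which client-go takes care of.

```go
package main

import (
	"fmt"
	"net/http"
)

// podURL builds the REST URL of a single pod in the core v1 API.
func podURL(apiServer, namespace, name string) string {
	return fmt.Sprintf("%s/api/v1/namespaces/%s/pods/%s", apiServer, namespace, name)
}

// newDeletePodRequest (hypothetical helper) creates the DELETE request but
// does not send it.
func newDeletePodRequest(apiServer, namespace, name string) (*http.Request, error) {
	return http.NewRequest(http.MethodDelete, podURL(apiServer, namespace, name), nil)
}

func main() {
	// The server address is a placeholder, not a real endpoint.
	req, err := newDeletePodRequest("https://kube-apiserver.example:6443", "test", "nginx")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// DELETE https://kube-apiserver.example:6443/api/v1/namespaces/test/pods/nginx
}
```

Note that the resource name ("pods") and its API version are baked into the URL, which is exactly what we do not know in advance for an arbitrary namespace.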
So client-go also provides a DiscoveryInterface. As the name implies, DiscoveryInterface can be used to discover the API groups, versions, and resources in the cluster. Once we have the list of all API resources in the cluster, we can query and delete them.
DiscoveryInterface is composed of several smaller interfaces. Among them, ServerGroupsInterface returns all the API groups in the cluster through its ServerGroups method, and ServerVersionInterface returns the version information of the server through its ServerVersion method.
The one we need to focus on is ServerResourcesInterface.
Here we can use ServerPreferredNamespacedResources to get a list of all namespaced resources, then keep only the resources that support the delete verb. Finally, we convert the result into the GroupVersionResources (GVR for short) of these resources.
Finally, iterate over these GVRs and delete the corresponding resources in the namespace.
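The filtering step can be illustrated without a cluster. In the sketch below, APIResource and GVR are simplified stand-ins for the client-go types, deletableNamespacedGVRs is a hypothetical helper, and the resource list is hand-written sample data rather than real discovery output.

```go
package main

import "fmt"

// APIResource is a simplified stand-in for one entry returned by discovery.
type APIResource struct {
	Group, Version, Name string
	Namespaced           bool
	Verbs                []string
}

// GVR is a simplified stand-in for a GroupVersionResource.
type GVR struct {
	Group, Version, Resource string
}

// supports reports whether verb is in the list of supported verbs.
func supports(verbs []string, verb string) bool {
	for _, v := range verbs {
		if v == verb {
			return true
		}
	}
	return false
}

// deletableNamespacedGVRs keeps only namespaced resources that support the
// "delete" verb and returns them as GVRs.
func deletableNamespacedGVRs(resources []APIResource) []GVR {
	var out []GVR
	for _, r := range resources {
		if r.Namespaced && supports(r.Verbs, "delete") {
			out = append(out, GVR{Group: r.Group, Version: r.Version, Resource: r.Name})
		}
	}
	return out
}

func main() {
	// Hand-written sample; a real list comes from the discovery API.
	discovered := []APIResource{
		{Group: "", Version: "v1", Name: "pods", Namespaced: true, Verbs: []string{"get", "list", "delete"}},
		{Group: "apps", Version: "v1", Name: "deployments", Namespaced: true, Verbs: []string{"get", "list", "delete"}},
		{Group: "", Version: "v1", Name: "nodes", Namespaced: false, Verbs: []string{"get", "list", "delete"}},
		{Group: "authorization.k8s.io", Version: "v1", Name: "localsubjectaccessreviews", Namespaced: true, Verbs: []string{"create"}},
	}
	for _, gvr := range deletableNamespacedGVRs(discovered) {
		fmt.Printf("would delete every %s (group %q, version %s) in the namespace\n",
			gvr.Resource, gvr.Group, gvr.Version)
	}
}
```

Only pods and deployments survive the filter: nodes are cluster-scoped, and localsubjectaccessreviews cannot be deleted at all.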
5. Why a namespace stays in the Terminating state for a long time
To find out why a namespace stays in the Terminating state for a long time, let's look at how the controller decides whether the deletion is finished.
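A minimal sketch of that check, written for this article rather than copied from the controller: checkContentDeleted, resourcesRemainingError, and errDeleteFailed are hypothetical names, and the estimate is assumed to be a number of seconds.

```go
package main

import (
	"errors"
	"fmt"
)

// resourcesRemainingError (hypothetical) signals that content is still
// being removed and carries the estimated wait in seconds.
type resourcesRemainingError struct {
	estimateSeconds int64
}

func (e *resourcesRemainingError) Error() string {
	return fmt.Sprintf("content remaining, estimated %d seconds until it is gone", e.estimateSeconds)
}

// errDeleteFailed is a sample error used only in this example.
var errDeleteFailed = errors.New("a resource could not be deleted")

// checkContentDeleted returns nil only when the content is confirmed gone.
// Any non-nil result means the namespace stays in Terminating and the
// deletion is retried later.
func checkContentDeleted(estimateSeconds int64, err error) error {
	if err != nil {
		return err // deleting some resource failed
	}
	if estimateSeconds > 0 {
		return &resourcesRemainingError{estimateSeconds: estimateSeconds}
	}
	return nil
}

func main() {
	fmt.Println(checkContentDeleted(0, nil))             // <nil>: the namespace can be finalized
	fmt.Println(checkContentDeleted(30, nil))            // still waiting, e.g. for a grace period
	fmt.Println(checkContentDeleted(0, errDeleteFailed)) // a deletion failed
}
```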
If deleting the resources in the namespace returns an error, or if the estimated time until all resources are gone is greater than 0, the namespace remains in the Terminating state. For example, a pod has a terminationGracePeriodSeconds, so deleting it may mean waiting for that period to pass. That kind of delay is harmless. The real headache is a namespace that can never be deleted. Simply put, there must be resources under the namespace that cannot be deleted, and there are several possibilities.
- Admission blocks the deletion. Every deletion request has to pass through the admission webhooks first, so a webhook may prevent some resources from being deleted.
- An apiservice is having problems. This can be confirmed with kubectl get apiservice. If any entry shows False in the AVAILABLE column, we need to find out why that apiservice is unavailable. While it is unavailable, the resources it serves cannot be queried or modified through HTTP requests, so we cannot confirm whether any of them are left, and we cannot finish deleting them.
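For clusters with many API services, the AVAILABLE column can be checked programmatically. The sketch below parses text shaped like the output of kubectl get apiservice. The function name is hypothetical and the sample rows are made up for illustration, not taken from a real cluster.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// unavailableAPIServices returns the names of all rows whose AVAILABLE
// column (the third column) is anything other than "True".
func unavailableAPIServices(output string) []string {
	var names []string
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip blank or short lines and the header row.
		if len(fields) < 3 || fields[0] == "NAME" {
			continue
		}
		if fields[2] != "True" {
			names = append(names, fields[0])
		}
	}
	return names
}

func main() {
	// Illustrative sample input only.
	sample := `NAME                     SERVICE                      AVAILABLE                  AGE
v1.apps                  Local                        True                       10d
v1beta1.metrics.k8s.io   kube-system/metrics-server   False (MissingEndpoints)   10d`
	fmt.Println(unavailableAPIServices(sample)) // [v1beta1.metrics.k8s.io]
}
```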
Finally, regarding what to do when a namespace cannot be deleted: the fix most often suggested on the Internet is to empty the spec.finalizers of the namespace, but that treats the symptom, not the root cause. If a namespace cannot be deleted, there must be a defect or problem in your cluster, and the real solution is to find out what it is. You can also try to track down the problem with this tool: https://github.com/thyarles/knsk