1. Overview
The apiserver is the communication hub for all k8s components, so its importance is self-evident. It serves HTTP-based APIs to the outside world, so what steps does a request go through between being received and being handled? The following walks through the whole process briefly, based on the code, to give a general impression of it.
Since the code structure of apiserver is not simple, we will post as little code as possible. The following analysis is based on k8s 1.18.
2. The processing chain of requests
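The handler chain is built in DefaultBuildHandlerChain by wrapping one handler around another, so the handler that is wrapped last runs first. In other words, the chain executes from back to front relative to the order in the code, and a request passes through the handlers in the following order.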
- PanicRecovery
- ProbabilisticGoaway
- RequestInfo
- WaitGroup
- TimeoutForNonLongRunningRequests
- CORS
- Authentication
- failedHandler: FailedAuthenticationAudit
- failedHandler: Unauthorized
- Audit
- Impersonation
- PriorityAndFairness / MaxInFlightLimit
- Authorization
The request is then passed to the director, which dispatches it to either gorestfulContainer or nonGoRestfulMux. gorestfulContainer is the main part of the apiserver.
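To make the "back to front" ordering concrete, here is a minimal sketch using only net/http. The trace, buildChain and runChain helpers are illustrative names, not apiserver code, and only four of the filters are shown; the point is simply that the filter applied last ends up outermost and therefore runs first.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// trace returns a middleware that records its name before calling next.
// It is a hypothetical stand-in; the real filters do actual work.
func trace(name string, order *[]string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*order = append(*order, name)
			next.ServeHTTP(w, r)
		})
	}
}

// buildChain wraps the innermost handler first, so the filter applied
// last becomes the outermost one and sees the request first.
func buildChain(inner http.Handler, order *[]string) http.Handler {
	h := inner
	h = trace("Authorization", order)(h)
	h = trace("Authentication", order)(h)
	h = trace("RequestInfo", order)(h)
	h = trace("PanicRecovery", order)(h)
	return h
}

// runChain sends one request through the chain and returns the visit order.
func runChain() string {
	var order []string
	director := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		order = append(order, "director")
	})
	req := httptest.NewRequest(http.MethodGet, "/api/v1/pods", nil)
	buildChain(director, &order).ServeHTTP(httptest.NewRecorder(), req)
	return strings.Join(order, " -> ")
}

func main() {
	// prints "PanicRecovery -> RequestInfo -> Authentication -> Authorization -> director"
	fmt.Println(runChain())
}
```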
PanicRecovery
runtime.HandleCrash recovers from a panic raised while handling a request, so that it does not take down the whole process, and logs the details of the request that panicked.
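The general recover-and-log pattern can be sketched as follows. This is not the apiserver's implementation: withPanicRecovery and statusAfterPanic are hypothetical names, and the real filter has additional behaviour that is not reproduced here.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// withPanicRecovery is a hypothetical stand-in for the PanicRecovery filter:
// it recovers from a panic in later handlers, logs the request, and answers 500.
func withPanicRecovery(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if err := recover(); err != nil {
				log.Printf("panic serving %s %s: %v", r.Method, r.URL.Path, err)
				http.Error(w, "internal server error", http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

// statusAfterPanic runs a handler that always panics behind the filter
// and returns the HTTP status the client would see.
func statusAfterPanic() int {
	h := withPanicRecovery(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		panic("boom")
	}))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code
}

func main() {
	fmt.Println(statusAfterPanic()) // prints 500
}
```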
ProbabilisticGoaway
The client and the apiserver communicate over long-lived HTTP/2 connections, so even if the apiserver sits behind a load balancer, all requests from a given client keep hitting the same apiserver. This filter gives the apiserver a small, configurable chance of responding with GOAWAY after receiving a request, so that the client creates a new TCP connection, which the load balancer can route to a different apiserver. The chance can be set from 0 to 0.02.
Related PR: https://github.com/kubernetes/kubernetes/pull/88567
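A minimal sketch of the idea follows, assuming (as I understand Go's HTTP/2 server) that a "Connection: close" response header on an HTTP/2 request makes the server send GOAWAY and shut the connection down gracefully. The names withProbabilisticGoaway and connectionHeader are hypothetical, and the random source is injected only so the behaviour is deterministic in the example.

```go
package main

import (
	"fmt"
	"math/rand"
	"net/http"
	"net/http/httptest"
)

// withProbabilisticGoaway asks HTTP/2 clients to reconnect with probability chance.
// random is injected for testability; rand.Float64 is the natural choice otherwise.
func withProbabilisticGoaway(next http.Handler, chance float64, random func() float64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.ProtoMajor == 2 && random() < chance {
			// Assumption: Go's HTTP/2 server turns this header into a GOAWAY frame.
			w.Header().Set("Connection", "close")
		}
		next.ServeHTTP(w, r)
	})
}

// connectionHeader reports the Connection header set for a request with the
// given protocol major version when the random source returns roll.
func connectionHeader(protoMajor int, roll float64) string {
	noop := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {})
	h := withProbabilisticGoaway(noop, 0.02, func() float64 { return roll })
	req := httptest.NewRequest(http.MethodGet, "/api", nil)
	req.ProtoMajor = protoMajor
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Header().Get("Connection")
}

func main() {
	// Wiring with a real random source would look like this.
	_ = withProbabilisticGoaway(http.NotFoundHandler(), 0.02, rand.Float64)

	// prints "close" "" ""
	fmt.Printf("%q %q %q\n", connectionHeader(2, 0.01), connectionHeader(2, 0.5), connectionHeader(1, 0.01))
}
```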
RequestInfo
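RequestInfo parses the HTTP request and attaches the result to the request context. The extracted information includes the verb, API prefix, API group, API version, namespace, resource, subresource and resource name, together with whether the request targets a resource at all.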
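As a rough illustration of the kind of parsing RequestInfo performs, the sketch below splits a request path into those fields. The requestInfo struct and parseRequestInfo function are simplified, hypothetical versions: the real resolver handles more cases (watch requests, deletecollection, special verbs in the path) that are left out here.

```go
package main

import (
	"fmt"
	"strings"
)

// requestInfo is a trimmed-down, hypothetical version of the RequestInfo data.
type requestInfo struct {
	IsResourceRequest                      bool
	Verb, APIPrefix, APIGroup, APIVersion  string
	Namespace, Resource, Subresource, Name string
}

// parseRequestInfo handles paths such as
//   /api/v1/namespaces/{ns}/{resource}/{name}/{subresource}
//   /apis/{group}/{version}/{resource}
// Anything else is treated as a non-resource request.
func parseRequestInfo(method, path string) requestInfo {
	info := requestInfo{Verb: strings.ToLower(method)}
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) < 3 || (parts[0] != "api" && parts[0] != "apis") {
		return info // e.g. /healthz
	}
	info.APIPrefix, parts = parts[0], parts[1:]
	if info.APIPrefix == "apis" {
		if len(parts) < 3 {
			return info // group discovery such as /apis/apps/v1
		}
		info.APIGroup, parts = parts[0], parts[1:]
	}
	info.IsResourceRequest = true
	info.APIVersion, parts = parts[0], parts[1:]
	if parts[0] == "namespaces" && len(parts) > 2 {
		info.Namespace, parts = parts[1], parts[2:]
	}
	info.Resource = parts[0]
	if len(parts) > 1 {
		info.Name = parts[1]
	}
	if len(parts) > 2 {
		info.Subresource = parts[2]
	}
	switch method {
	case "GET":
		info.Verb = "get"
		if info.Name == "" {
			info.Verb = "list"
		}
	case "POST":
		info.Verb = "create"
	case "PUT":
		info.Verb = "update"
	case "PATCH":
		info.Verb = "patch"
	case "DELETE":
		info.Verb = "delete"
	}
	return info
}

func main() {
	fmt.Printf("%+v\n", parseRequestInfo("GET", "/api/v1/namespaces/default/pods/web/log"))
	fmt.Printf("%+v\n", parseRequestInfo("GET", "/apis/apps/v1/deployments"))
	fmt.Printf("%+v\n", parseRequestInfo("GET", "/healthz"))
}
```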
WaitGroup
The WaitGroup handler lets short-lived (non-long-running) requests finish before the apiserver exits.
How can we tell whether a request is long-running? It is determined by the request verb or subresource taken from RequestInfo; watch and proxy, for example, count as long-running.
Short-lived requests are added to the wait group, and the handler marks a request as done only after all subsequent handlers have returned, so the apiserver can exit gracefully.
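A sketch of both halves, the long-running check and the wait group, is shown below. The verb and subresource sets are my recollection of what kube-apiserver configures and should be treated as an assumption; withWaitGroup and its classify hook are hypothetical, and the real filter uses a wait group variant that also refuses new requests once shutdown has begun.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// Assumed sets of long-running verbs and subresources.
var longRunningVerbs = map[string]bool{"watch": true, "proxy": true}
var longRunningSubresources = map[string]bool{
	"attach": true, "exec": true, "proxy": true, "log": true, "portforward": true,
}

// isLongRunning decides from the verb or subresource whether a request
// should be excluded from timeout and graceful-shutdown tracking.
func isLongRunning(verb, subresource string) bool {
	return longRunningVerbs[verb] || longRunningSubresources[subresource]
}

// withWaitGroup tracks short-lived requests so that shutdown can wait for them.
// classify is a hypothetical hook standing in for the RequestInfo lookup.
func withWaitGroup(next http.Handler, wg *sync.WaitGroup, classify func(*http.Request) (string, string)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if verb, sub := classify(r); !isLongRunning(verb, sub) {
			wg.Add(1)
			defer wg.Done() // runs only after all later handlers have returned
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	var wg sync.WaitGroup
	classify := func(r *http.Request) (string, string) { return "get", "" }
	h := withWaitGroup(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {}), &wg, classify)
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/api/v1/pods", nil))
	wg.Wait() // returns once every tracked request has finished

	// prints "true true false"
	fmt.Println(isLongRunning("watch", ""), isLongRunning("get", "log"), isLongRunning("get", ""))
}
```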
TimeoutForNonLongRunningRequests
For requests that are not long-running, the request context is cancelled once the timeout expires.
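The idea can be sketched with context.WithTimeout. This is a simplification under my own naming (withTimeout, outcome), not the apiserver's timeout filter, which also takes care of writing the timeout response to the client.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// withTimeout cancels the request context after timeout unless the
// (hypothetical) longRunning check says the request is exempt.
func withTimeout(next http.Handler, timeout time.Duration, longRunning func(*http.Request) bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if longRunning(r) {
			next.ServeHTTP(w, r)
			return
		}
		ctx, cancel := context.WithTimeout(r.Context(), timeout)
		defer cancel()
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// outcome runs a slow handler behind the filter and reports whether it
// completed or saw its context cancelled.
func outcome(long bool) string {
	result := ""
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-r.Context().Done():
			result = "cancelled"
		case <-time.After(200 * time.Millisecond):
			result = "completed"
		}
	})
	h := withTimeout(slow, 10*time.Millisecond, func(*http.Request) bool { return long })
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/api/v1/pods", nil))
	return result
}

func main() {
	fmt.Println(outcome(false), outcome(true)) // prints "cancelled completed"
}
```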
CORS
Set some CORS response headers.
Authentication
This handler authenticates the user. On success, the Authorization header is removed from the request and the request is passed to the next handler; on failure, the request is passed to the failed handler.
Several authentication methods are supported, including:
- Requestheader, which takes out X-Remote-User, X-Remote-Group, X-Remote-Extra from the request.
- X509 certificate validation.
- BearerToken
- WebSocket
- Anonymous: in case anonymity is allowed
Other authenticators are provided as plugins:
- bootstrap token
- Basic auth
- password
- OIDC
- Webhook
Authentication is considered successful as soon as one of the authenticators succeeds. If the user is system:anonymous, or the user's groups already contain system:unauthenticated or system:authenticated, the result is returned directly; otherwise the system:authenticated group is added to the user information before it is returned.
Notice that the user is now a member of system:authenticated, that is, the user is authenticated.
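The "first success wins, then add system:authenticated" behaviour described above can be sketched like this. All types and functions here (userInfo, authenticator, union, withAuthenticatedGroup, the two sample authenticators and the demo token) are hypothetical simplifications. In particular, a real request-header authenticator only trusts X-Remote-User after verifying the proxy's client certificate, which is omitted.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type userInfo struct {
	Name   string
	Groups []string
}

// authenticator returns ok=false when it does not recognise the credentials.
type authenticator func(r *http.Request) (*userInfo, bool, error)

// union tries each authenticator in turn; the first success wins.
func union(auths ...authenticator) authenticator {
	return func(r *http.Request) (*userInfo, bool, error) {
		var lastErr error
		for _, a := range auths {
			user, ok, err := a(r)
			if err != nil {
				lastErr = err
				continue
			}
			if ok {
				return user, true, nil
			}
		}
		return nil, false, lastErr
	}
}

// withAuthenticatedGroup adds system:authenticated unless the user is
// anonymous or already carries one of the two system groups.
func withAuthenticatedGroup(a authenticator) authenticator {
	return func(r *http.Request) (*userInfo, bool, error) {
		user, ok, err := a(r)
		if err != nil || !ok {
			return nil, ok, err
		}
		if user.Name == "system:anonymous" {
			return user, true, nil
		}
		for _, g := range user.Groups {
			if g == "system:authenticated" || g == "system:unauthenticated" {
				return user, true, nil
			}
		}
		copied := *user
		copied.Groups = append(append([]string{}, user.Groups...), "system:authenticated")
		return &copied, true, nil
	}
}

func fromRequestHeader(r *http.Request) (*userInfo, bool, error) {
	name := r.Header.Get("X-Remote-User")
	if name == "" {
		return nil, false, nil
	}
	return &userInfo{Name: name, Groups: r.Header.Values("X-Remote-Group")}, true, nil
}

func fromBearerToken(r *http.Request) (*userInfo, bool, error) {
	const prefix = "Bearer "
	h := r.Header.Get("Authorization")
	if !strings.HasPrefix(h, prefix) {
		return nil, false, nil
	}
	if strings.TrimPrefix(h, prefix) != "demo-token" { // hypothetical static token
		return nil, false, errors.New("invalid bearer token")
	}
	return &userInfo{Name: "alice"}, true, nil
}

// whoami authenticates a request carrying a single header and describes the result.
func whoami(header, value string) string {
	req := httptest.NewRequest(http.MethodGet, "/api", nil)
	if header != "" {
		req.Header.Set(header, value)
	}
	user, ok, _ := withAuthenticatedGroup(union(fromRequestHeader, fromBearerToken))(req)
	if !ok {
		return "unauthenticated"
	}
	return fmt.Sprintf("%s %v", user.Name, user.Groups)
}

func main() {
	fmt.Println(whoami("Authorization", "Bearer demo-token")) // alice [system:authenticated]
	fmt.Println(whoami("X-Remote-User", "bob"))               // bob [system:authenticated]
	fmt.Println(whoami("", ""))                               // unauthenticated
}
```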
FailedAuthenticationAudit
This will only be executed after an authentication failure. It mainly provides auditing capabilities.
Unauthorized
Handles the unauthenticated request by returning a 401 Unauthorized response; it is called after FailedAuthenticationAudit.
Audit
Provides auditing for requests.
Impersonation
Impersonation lets the current user act as another user, which helps administrators test whether the permissions of different users are configured correctly. The identity to impersonate is read from the following request headers.
- Impersonate-User: user
- Impersonate-Group: group
- Impersonate-Extra-: additional information
The impersonated identity is either a service account or an ordinary user. A value formatted as namespace/name is treated as a service account; anything else is treated as a user.
The final format of a service account is: system:serviceaccount:namespace:name
PriorityAndFairness / MaxInFlightLimit
If flow control is enabled, PriorityAndFairness is used; otherwise MaxInFlightLimit is used.
PriorityAndFairness: classifies requests by priority. Requests at the same priority level are subject to fairness controls.
MaxInFlightLimit: the maximum number of non-mutating requests in flight at a given time. When this value is exceeded, the server rejects further requests. A value of 0 means no limit. (Default value 400)
Reference: https://kubernetes.io/zh/docs/concepts/cluster-administration/flow-control/
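The MaxInFlightLimit idea can be sketched with a buffered channel used as a counting semaphore. This is an illustration under my own naming (withMaxInFlight, secondRequestStatus); the real filter keeps separate limits for mutating and non-mutating requests and exempts long-running ones, which is not shown.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// withMaxInFlight rejects requests with 429 once limit requests are in flight.
// A limit of 0 disables the check, matching "0 means no limit" above.
func withMaxInFlight(next http.Handler, limit int) http.Handler {
	if limit == 0 {
		return next
	}
	slots := make(chan struct{}, limit)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case slots <- struct{}{}: // a slot is free
			defer func() { <-slots }()
			next.ServeHTTP(w, r)
		default: // all slots taken
			w.Header().Set("Retry-After", "1")
			http.Error(w, "too many requests", http.StatusTooManyRequests)
		}
	})
}

// secondRequestStatus holds one request in flight with a limit of 1 and
// returns the status code a second, concurrent request receives.
func secondRequestStatus() int {
	entered, release := make(chan struct{}), make(chan struct{})
	blocking := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		entered <- struct{}{}
		<-release
	})
	h := withMaxInFlight(blocking, 1)

	done := make(chan struct{})
	go func() {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/api", nil))
		close(done)
	}()
	<-entered // the first request now holds the only slot

	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/api", nil))

	close(release)
	<-done
	return rec.Code
}

func main() {
	fmt.Println(secondRequestStatus()) // prints 429
}
```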
Authorization
Authorization takes the attributes it needs (such as the user, verb and resource) from the request context and then authorizes the request. The following authorization modes are supported.
- Always allow
- Always deny
- Path: allows certain paths to always be accessible
Other common authorization modes are provided mainly through plugins.
- Webhook
- RBAC
- Node
Node is designed specifically for the kubelet: the Node authorizer allows the kubelet to perform the following API operations.
Read operations:
- services
- endpoints
- nodes
- pods
- secrets, configmaps, pvcs, and pod-related persistent volumes bound to kubelet nodes
Write operations:
- Nodes and node status (enable the NodeRestriction admission plugin to restrict the kubelet to modifying only its own node)
- Pods and pod status (enable the NodeRestriction admission plugin to restrict the kubelet to modifying only pods bound to itself)
- Events
Auth-related operations:
- Read/write access to the certificatesigningrequests API used during TLS bootstrapping
- Ability to create tokenreviews and subjectaccessreviews for delegated authentication/authorization checks
In future releases, the Node authorizer may add or remove permissions to ensure that the kubelet has the minimal set of permissions needed to operate correctly.
To be authorized by the Node authorizer, the kubelet must use a credential that identifies it as a member of the system:nodes group, with the username system:node:<nodeName>. This group name and username format match the identity created for each kubelet during kubelet TLS bootstrapping.
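The identity requirement can be expressed as a small check. The nodeIdentity function below is a hypothetical helper written for illustration, based only on the group and username format stated above.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeIdentity reports the node name if the credentials identify a kubelet:
// username system:node:<nodeName> and membership in the system:nodes group.
func nodeIdentity(username string, groups []string) (string, bool) {
	const prefix = "system:node:"
	if !strings.HasPrefix(username, prefix) {
		return "", false
	}
	for _, g := range groups {
		if g == "system:nodes" {
			return strings.TrimPrefix(username, prefix), true
		}
	}
	return "", false
}

func main() {
	fmt.Println(nodeIdentity("system:node:worker-1", []string{"system:nodes", "system:authenticated"}))
	fmt.Println(nodeIdentity("system:node:worker-1", []string{"system:authenticated"}))
	fmt.Println(nodeIdentity("alice", []string{"system:nodes"}))
}
```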
director
The director's ServeHTTP method dispatches a request to gorestfulContainer when the request path matches the root path of one of the registered webservices. Otherwise, it calls nonGoRestfulMux to handle the request.
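The dispatch rule can be sketched with plain net/http types. This is not the director's actual code: the route and director types below are hypothetical, and the registered web services are reduced to a list of root paths. A path matches a root only if it equals the root or continues with a "/".

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// route pairs a web service root path with its handler (hypothetical type).
type route struct {
	root    string
	handler http.Handler
}

// director sends matching paths to a registered route and everything
// else to the fallback, which stands in for nonGoRestfulMux.
type director struct {
	routes   []route
	fallback http.Handler
}

func (d director) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	path := r.URL.Path
	for _, rt := range d.routes {
		if strings.HasPrefix(path, rt.root) &&
			(len(path) == len(rt.root) || path[len(rt.root)] == '/') {
			rt.handler.ServeHTTP(w, r)
			return
		}
	}
	d.fallback.ServeHTTP(w, r)
}

// named returns a handler that writes its own name, so the demo can show
// which side handled the request.
func named(name string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, name)
	})
}

// dispatch reports which handler served the given path.
func dispatch(path string) string {
	d := director{
		routes: []route{
			{root: "/api", handler: named("gorestfulContainer")},
			{root: "/apis", handler: named("gorestfulContainer")},
		},
		fallback: named("nonGoRestfulMux"),
	}
	rec := httptest.NewRecorder()
	d.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Body.String()
}

func main() {
	fmt.Println(dispatch("/api/v1/pods")) // gorestfulContainer
	fmt.Println(dispatch("/healthz"))     // nonGoRestfulMux
	fmt.Println(dispatch("/apiserver"))   // nonGoRestfulMux
}
```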
admission webhook
The last step before the request is actually processed is admission control, which includes admission webhooks. Admission is invoked from the REST handling code of each resource: for create, update and delete, the mutating step runs first, followed by the validating step. k8s has a number of admission controllers built in, provided as plugins, as follows.
- AlwaysAdmit
- AlwaysPullImages
- LimitPodHardAntiAffinityTopology
- CertificateApproval/CertificateSigning/CertificateSubjectRestriction
- DefaultIngressClass
- DefaultTolerationSeconds
- ExtendedResourceToleration
- OwnerReferencesPermissionEnforcement
- ImagePolicyWebhook
- LimitRanger
- NamespaceAutoProvision
- NamespaceExists
- NodeRestriction
- TaintNodesByCondition
- PodNodeSelector
- PodPreset
- PodTolerationRestriction
- Priority
- ResourceQuota
- RuntimeClass
- PodSecurityPolicy
- SecurityContextDeny
- ServiceAccount
- PersistentVolumeLabel
- PersistentVolumeClaimResize
- DefaultStorageClass
- StorageObjectInUseProtection
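The mutate-then-validate ordering can be sketched as follows. The pod, mutator and validator types and the admit function are hypothetical simplifications. alwaysPullImages loosely mirrors what the AlwaysPullImages plugin is known for (forcing the pull policy to Always), while requireImage is an invented validator used only for the demonstration.

```go
package main

import (
	"errors"
	"fmt"
)

// pod is a tiny, hypothetical stand-in for the real Pod object.
type pod struct{ Name, Image, ImagePullPolicy string }

type mutator func(p *pod) error  // may change the object
type validator func(p pod) error // may only accept or reject it

// admit runs every mutator first and only then the validators, so that
// validation sees the object in its final, mutated form.
func admit(p *pod, mutators []mutator, validators []validator) error {
	for _, m := range mutators {
		if err := m(p); err != nil {
			return err
		}
	}
	for _, v := range validators {
		if err := v(*p); err != nil {
			return err
		}
	}
	return nil
}

func alwaysPullImages(p *pod) error {
	p.ImagePullPolicy = "Always"
	return nil
}

// requireImage is an invented validator for illustration.
func requireImage(p pod) error {
	if p.Image == "" {
		return errors.New("pod " + p.Name + " has no image")
	}
	return nil
}

// admitDemo admits a pod with the given image and returns the resulting
// pull policy together with the admission error, if any.
func admitDemo(image string) (string, error) {
	p := &pod{Name: "web", Image: image, ImagePullPolicy: "IfNotPresent"}
	err := admit(p, []mutator{alwaysPullImages}, []validator{requireImage})
	return p.ImagePullPolicy, err
}

func main() {
	fmt.Println(admitDemo("nginx")) // Always <nil>
	fmt.Println(admitDemo(""))      // Always pod web has no image
}
```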
3. How to read the code related to apiserver
The repository I am looking at is https://github.com/kubernetes/kubernetes. The apiserver code is mainly scattered in the following locations.
- cmd/kube-apiserver: apiserver main function entry. It mainly encapsulates a lot of startup parameters.
- pkg/kubeapiserver: Provides code shared by kube-apiserver and federation-apiserver, but is not part of the generic API server.
- plugin/pkg: Contains the plugins related to authentication, authorization and admission control.
- staging/src/k8s.io/apiserver: This is the core code of apiserver. The pkg/server package below it is the entry point for starting the server.