Native Kubernetes Cluster Installation Tool
For Kubernetes clusters in the cloud, basically every cloud vendor offers one-click deployment, so the focus here is on local, i.e. bare-metal, deployment.
The approach presented in this article is suitable for development and testing; in terms of security, stability, and long-term availability it may still fall short of a production-grade solution.
Kubernetes is a component-based system with a lot of flexibility in the installation process, and many components have multiple implementations, each with its own characteristics, which can leave beginners dizzy. Installing and configuring these components one by one and getting them to work together is not easy.
The following are some of the tools that support bare-metal deployment:
- kubeadm: the community’s cluster installation tool, which is now very mature.
  - Difficulty of use: easy
- k3s: lightweight Kubernetes with small resource requirements, very simple to deploy, suitable for development, testing, or edge environments.
  - Supports air-gapped (offline) deployment
  - Difficulty of use: super easy
- alibaba/sealer: supports packaging an entire Kubernetes cluster into an image for delivery, and is very easy to deploy.
  - Difficulty of use: super easy
  - The project is still under development, but many toB companies already seem to be using it for delivering k8s applications.
- kubespray: suitable for self-built production-grade clusters; a large and comprehensive Kubernetes installation solution that automatically installs the container runtime, k8s, network plugins, and other components, with many options for each component. It feels a bit complicated, though.
  - Difficulty of use: medium
  - Supports air-gapped (offline) deployment, but it was quite buggy when I tried it; I don't know how it is now.
  - Uses kubeadm under the hood to deploy the cluster
In order to learn Kubernetes, I deploy with the official kubeadm below (don't ask why not a binary deployment; the answer is laziness), with containerd as the container runtime and the currently trendiest eBPF-based Cilium as the network plugin.
Kubernetes officially describes two topologies for highly available clusters: the “stacked etcd topology” and the “external etcd topology”. For simplicity, this article uses the first one, the stacked etcd topology, to create a highly available cluster with three masters.
References:
- Kubernetes Docs - Installing kubeadm
- Kubernetes Docs - Creating Highly Available Clusters with kubeadm
1. Environment preparation for the node
First prepare a few Linux virtual machines (this article ends up with three masters and two workers); the distribution is up to you. Then adjust the settings of these machines as follows:
- Node configuration:
  - master: no less than 2c/3g, 20G of disk. Master performance is also affected by the number of Pods in the cluster; the above configuration should support up to about 100 Pods per worker node.
  - worker: depends on demand; recommended no less than 2c/4g, with no less than 20G of disk, or 40G if resources allow.
- All nodes must be in the same network and able to reach each other (usually the same LAN).
- The hostname, MAC address, and `/sys/class/dmi/id/product_uuid` of every host must be unique. This is where problems most often arise, usually hostname conflicts!
- Swap must be turned off for kubelet to work properly!
For convenience, I used ryan4yin/pulumi-libvirt to automatically create five virtual machines and set their IPs/hostnames.
This article uses the KVM cloud image of openSUSE Leap 15.3 to test the installation.
1.1 iptables setup
The container network for Kubernetes currently uses bridge mode by default. In this mode, you need to let iptables take over the traffic on the bridge.
Configure it as follows:
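A minimal sketch of the usual setup, following the kubeadm installation docs (run on every node):

```bash
# Load br_netfilter so that bridged traffic is visible to iptables.
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
sudo modprobe br_netfilter

# Let iptables see bridged traffic and enable IP forwarding.
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system
```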
1.2 Opening node ports
For a LAN environment, it is easiest to simply turn the firewall off, so that all ports are available.
Our clusters in the cloud usually also have the firewall turned off, with client IPs restricted instead by the "security groups" provided by the cloud vendor.
The control-plane nodes, i.e. the masters, need the following ports open:
| Protocol | Direction | Port Range | Purpose | Used By |
|---|---|---|---|---|
| TCP | Inbound | 6443* | Kubernetes API server | All |
| TCP | Inbound | 2379-2380 | etcd server client API | kube-apiserver, etcd |
| TCP | Inbound | 10250 | kubelet API | Self, Control plane |
| TCP | Inbound | 10251 | kube-scheduler | Self |
| TCP | Inbound | 10252 | kube-controller-manager | Self |
The worker nodes need the following ports open:
| Protocol | Direction | Port Range | Purpose | Used By |
|---|---|---|---|---|
| TCP | Inbound | 10250 | kubelet API | Self, Control plane |
| TCP | Inbound | 30000-32767 | NodePort Services† | All |
In addition, when testing locally we may want to use NodePort on ports such as 80, 443, and 8080. This requires modifying the `--service-node-port-range` parameter of kube-apiserver to customize the NodePort port range (an example is included in the kubeadm configuration sketch in section 5), and the corresponding worker nodes also need to open those ports.
2. Installing containerd
First, the environment configuration.
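A sketch of the typical kernel prerequisites, based on the containerd getting-started docs (the sysctl settings from section 1.1 are assumed to already be in place):

```bash
# Make sure the kernel modules containerd relies on are loaded now and at boot.
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
```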
Install containerd+nerdctl:
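One straightforward way, assuming the `nerdctl-full` release bundle (which ships containerd, runc, the CNI plugins, and nerdctl together); the version below is only an example:

```bash
NERDCTL_VERSION=0.23.0   # example version; pick a current release from the nerdctl releases page
curl -LO "https://github.com/containerd/nerdctl/releases/download/v${NERDCTL_VERSION}/nerdctl-full-${NERDCTL_VERSION}-linux-amd64.tar.gz"
sudo tar Cxzvf /usr/local "nerdctl-full-${NERDCTL_VERSION}-linux-amd64.tar.gz"

# Generate a default containerd config and switch to the systemd cgroup driver,
# which is what kubelet expects on systemd-based distributions.
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml >/dev/null
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml

sudo systemctl enable --now containerd
```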
`nerdctl` is a command-line tool for containerd, but by default its containers and images live in a separate containerd namespace from the ones Kubernetes uses, so they do not see each other. You can view and pull Kubernetes containers and images through `crictl`, whose installation is covered in the next section.
3. Installing kubelet/kubeadm/kubectl
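A sketch for RPM-based distributions such as openSUSE Leap; the repository URL and Kubernetes minor version below are examples, so use the repository listed in the official "Installing kubeadm" page for your target version:

```bash
K8S_MINOR=v1.28   # example minor version
sudo zypper addrepo --refresh "https://pkgs.k8s.io/core:/stable:/${K8S_MINOR}/rpm/" kubernetes
sudo zypper --gpg-auto-import-keys refresh
sudo zypper install -y kubelet kubeadm kubectl cri-tools   # cri-tools provides crictl
sudo systemctl enable --now kubelet
```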
Try crictl:
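crictl needs to be pointed at the containerd socket first; a sketch (the socket path is containerd's default):

```bash
cat <<EOF | sudo tee /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF

sudo crictl ps -a      # containers managed through the Kubernetes CRI
sudo crictl images     # images pulled through the Kubernetes CRI
```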
4. Creating load balancing for kube-apiserver to achieve high availability
According to the official kubeadm documentation Kubeadm Docs - High Availability Considerations, the most well-known load balancing approach to achieve high availability for kube-apiserver is keepalived+haproxy, while simpler tools like kube-vip can also be considered.
For simplicity, let’s use kube-vip directly, refer to the official documentation of kube-vip: Kube-vip as a Static Pod with Kubelet.
P.S. I’ve also seen some installation tools ditch keepalived and run a nginx on each node to do load balancing, with all the master addresses written in the configuration…
First, use the following command to generate the kube-vip configuration file, using ARP as an example (we recommend switching to BGP for production environments).
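A sketch following the kube-vip static-pod documentation; the VIP and interface below match the values used later in this article (192.168.122.200 on eth0), and the kube-vip version is only an example:

```bash
export VIP=192.168.122.200      # virtual IP for the control plane
export INTERFACE=eth0           # interface the VIP will be announced on
export KVVERSION=v0.4.0         # example version; check the kube-vip releases page

sudo mkdir -p /etc/kubernetes/manifests
sudo nerdctl run --network host --rm ghcr.io/kube-vip/kube-vip:$KVVERSION \
    manifest pod \
    --interface $INTERFACE \
    --address $VIP \
    --controlplane \
    --arp \
    --leaderElection | sudo tee /etc/kubernetes/manifests/kube-vip.yaml
```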
All three master nodes need to run the above command (the worker does not) to create the static-pod configuration file for kube-vip. After kubeadm initialization, kubelet will automatically pull them up as static pods.
5. Creating a cluster with kubeadm
All you need to do is run this command.
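A sketch of what that can look like; the kubernetesVersion is an example, the controlPlaneEndpoint is the kube-vip VIP configured above, and serverTLSBootstrap is the setting discussed later in this section:

```bash
cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.2                     # example version
controlPlaneEndpoint: "192.168.122.200:6443"   # the kube-vip VIP from section 4
apiServer:
  extraArgs:
    service-node-port-range: "80-32767"        # optional: widen the NodePort range (see section 1.2)
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
serverTLSBootstrap: true
EOF

sudo kubeadm init --config kubeadm-config.yaml --upload-certs
```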
kubeadm will most likely report an error saying that some dependencies are missing, so install the dependencies first:
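For example, on openSUSE (package names may differ on other distributions):

```bash
sudo zypper install -y conntrack-tools socat ebtables ethtool
```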
Re-run the previous kubeadm command and it should now complete successfully. It performs the following operations:
- Pull the container image of the control plane
- Generate a ca root certificate
- Generate tls certificates for etcd/apiserver etc. using the root certificate
- Generate kubeconfig configuration for each component of the control plane
- Generate static pod manifests; based on these, kubelet automatically brings up kube-proxy and all the other control-plane components
After it finishes, kubeadm prints three things:
- Where to place the kubeconfig: copy it to `$HOME/.kube/config`, which `kubectl` needs in order to connect to kube-apiserver.
- The command for joining additional control-plane nodes to the cluster.
  - Because we added the kube-vip static-pod manifest in advance, the preflight check will report an error here; add the parameter `--ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests` to ignore it.
- The command for joining worker nodes to the cluster.
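For reference, a sketch of the first two (the kubeconfig commands are the ones kubeadm prints; the token, hash, and certificate key are placeholders, so use the exact join command printed by your own `kubeadm init`):

```bash
# 1. kubeconfig for kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# 2. joining another control-plane node, with the extra preflight flag for kube-vip
sudo kubeadm join 192.168.122.200:6443 \
    --token <token> --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <key> \
    --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests
```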
After running the kubeconfig commands from the first step, you can use kubectl to view the status of the cluster:
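For example:

```bash
kubectl get node
kubectl get pod -A   # the control-plane pods should all show up here
```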
Now run the join cluster command printed out earlier on the other nodes and you have a highly available cluster set up.
After all the nodes have joined the cluster, you should have three control plane masters and two workers when viewed through kubectl.
Right now they are all in the NotReady state and will not be ready until we get the network plug-in installed.
Now look at the certificate issuance status of the cluster again.
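For example:

```bash
kubectl get csr
```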
You can see that several `kubernetes.io/kubelet-serving` certificates are still pending. This is because we set `serverTLSBootstrap: true` in the kubeadm configuration file, which makes the kubelet request CA-signed serving certificates from the cluster instead of signing them itself.
The main purpose of this setting is to let components such as metrics-server talk to the kubelet over HTTPS, and to avoid having to pass the `--kubelet-insecure-tls` parameter to metrics-server.
Currently kubeadm does not support automatic approval of the certificates requested by kubelets, so we need to approve them manually:
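For example (the xargs one-liner is just a convenience, not something kubeadm provides):

```bash
kubectl get csr                               # list the pending CSRs
kubectl certificate approve <csr-name>        # approve them one by one, or:
kubectl get csr -o name | xargs kubectl certificate approve
```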
Until these certificates are approved, any feature that needs to call the kubelet API will not work, for example:
- View pod logs
- Get node metrics
- etc.
5.1 Frequently Asked Questions
5.1.1 Using domestic mirror sources
If you do not have a VPN, the images in kubeadm's default registry cannot be pulled from within China. If you have high reliability requirements, it is better to run your own private image registry and push the images there.
You can list all the image addresses you will need with the following command:
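For example:

```bash
kubeadm config images list
# or, with a config file:
kubeadm config images list --config kubeadm-config.yaml
```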
Use a tool such as skopeo to copy the above images into your private registry, or, for convenience in a test environment, look for a publicly synchronized mirror of these images. Then set the image address in `kubeadm-config.yaml` and deploy.
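As a sketch, the mirror is set through the `imageRepository` field of the ClusterConfiguration; the Aliyun registry below is a commonly used synchronized mirror, given here only as an example:

```bash
# In the ClusterConfiguration section of kubeadm-config.yaml:
#   imageRepository: registry.aliyuncs.com/google_containers
# Then the images can be pre-pulled with:
sudo kubeadm config images pull --config kubeadm-config.yaml
```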
5.1.2 Resetting the cluster configuration
If anything goes wrong during cluster creation, you can run `kubeadm reset` on all nodes to restore them to a clean state, and then go through the kubeadm cluster-creation process again.
However, a few things should be noted:
- `kubeadm reset` clears all static-pod manifests, including the kube-vip one, so the master nodes need to re-run the kube-vip command given earlier to regenerate its manifest.
- `kubeadm reset` does not reset the network interfaces, so the master nodes need to manually remove the VIP added by kube-vip: `ip addr del 192.168.122.200/32 dev eth0`.
- If you want to reinstall the cluster after a network plugin has already been installed, the sequence is as follows (see the sketch after this list):
  - Delete all application configurations other than the network plugin via `kubectl delete -f xxx.yaml` / `helm uninstall`.
  - Delete the network plugin.
  - Reboot all nodes, or manually reset the network configuration of every node.
    - I recommend rebooting, because I don't know how to reset it cleanly by hand... I tried `systemctl restart network` and it does not clean up all the virtual network interfaces.
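A rough sketch of the reset sequence on the nodes themselves, based on the notes above:

```bash
# On every node:
sudo kubeadm reset -f

# On the master nodes, additionally remove the VIP left behind by kube-vip:
sudo ip addr del 192.168.122.200/32 dev eth0

# Then regenerate the kube-vip static-pod manifest on each master (see section 4)
# before running kubeadm init/join again.
```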
After doing so, re-run the cluster installation and you should be fine.
6. Verifying the high availability of the cluster
Although the network plug-in is not yet installed and all nodes of the cluster are not yet ready, we can already verify the high availability of the cluster simply by using the kubectl command.
First, copy the kubeconfig that we placed at `$HOME/.kube/config` on k8s-master-0 to another machine, such as my host machine, and install kubectl there.
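For example (the master address is a placeholder):

```bash
mkdir -p ~/.kube
scp root@<k8s-master-0-ip>:/etc/kubernetes/admin.conf ~/.kube/config
```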
Then run the `kubectl get node` command on the host machine to verify the high availability of the cluster:
- While all three master nodes are running normally, the kubectl command works fine.
- Pause or stop one of the masters; the kubectl command still works fine.
- Pause a second master; the kubectl command should now hang and time out.
- Resume one of the two stopped masters; the kubectl command works again.
This completes kubeadm’s work, and the next step is to install the network plugin and the cluster will be available.
7. Installing Network Plugins
There are many kinds of network plugins available in the community, the more well-known ones with good performance are Calico and Cilium, where Cilium focuses on high performance and observability based on eBPF.
The installation methods for these two plugins are given below. (Note that only one network plugin should be installed, not both.)
You need to install helm in advance. I operate from my host machine here, so it only needs to be installed on the host:
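One way to install helm, via the official installer script:

```bash
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
```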
7.1 Installing Cilium
Official documentation: https://docs.cilium.io/en/v1.10/gettingstarted/k8s-install-kubeadm/
Cilium provides a high-performance and highly observable cluster network for k8s via eBPF. It also offers a more efficient implementation than kube-proxy and can completely replace it.
Here we start in the ordinary kube-proxy mode to get familiar with Cilium:
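A sketch using the Cilium helm chart; the chart version below is only an example matching the v1.10 docs linked above:

```bash
helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium --version 1.10.5 --namespace kube-system
```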
You can check the progress of the Cilium installation via `kubectl get pod -A`; when all the pods are ready, the cluster is ready~
cilium also provides a dedicated client.
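Installing it is just a matter of downloading a release binary; a sketch based on the cilium-cli README (adjust OS/architecture as needed):

```bash
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --fail --remote-name-all \
    "https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz"
sudo tar xzvf cilium-linux-amd64.tar.gz -C /usr/local/bin
```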
Then use the cilium client to check the status of the network plugin:
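For example:

```bash
cilium status --wait
```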
Cilium also provides a command that automatically creates pods to run a connectivity test of the cluster network:
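For example:

```bash
cilium connectivity test
```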
As you can observe with `kubectl get po -A`, this test command automatically creates a `cilium-test` namespace and starts several pods in it to run the detailed tests.
The whole test takes around 5 minutes. The pods are not deleted automatically after the test completes; delete them manually with the following command:
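For example:

```bash
kubectl delete namespace cilium-test
```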
7.2 Installing Calico
Official documentation: https://docs.projectcalico.org/getting-started/kubernetes/self-managed-onprem/onpremises
It's only two or three lines of commands and the installation is really simple, so I won't go through it here; just read the official documentation.
But there are actually quite a lot of details about calico, so we recommend reading through its official documentation to understand the architecture of calico.
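For reference, a sketch based on the Calico "self-managed on-premises" docs; the version in the URLs is only an example, so use the manifests referenced by the documentation you are actually following:

```bash
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/tigera-operator.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/custom-resources.yaml
```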
8. Checking cluster status
The official dashboard doesn’t work very well, so we recommend installing a local k9s directly, which is particularly cool.
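k9s is a single binary; a couple of common ways to install it:

```bash
brew install k9s       # via Homebrew / Linuxbrew
# or download a release binary from https://github.com/derailed/k9s/releases and put it on your PATH
```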
Then it’s time to have fun.
9. Install metrics-server
Possible issues with this step: Enabling signed kubelet serving certificates
If you need to use HPA and simple cluster monitoring, then metrics-server must be installed, so let’s install it now.
First, running kubectl’s monitoring command should report an error.
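For example:

```bash
kubectl top node
```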
You shouldn’t see any monitoring metrics in k9s either.
Now install it via helm.
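A sketch using the official metrics-server chart:

```bash
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm repo update
helm upgrade --install metrics-server metrics-server/metrics-server --namespace kube-system
```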
metrics-server will only deploy one instance by default, if you want high availability, please refer to the official configuration: metrics-server - high-availability manifests
Once metrics-server is up and running, you can use the `kubectl top` command:
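For example:

```bash
kubectl top node
kubectl top pod -A
```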
10. Add regular backup capability to etcd
See backup and restore for etcd
11. Installing Volume Provisioner
As we learn to use stateful applications such as Prometheus/MinIO/Tekton, they will by default declare the required data volumes via PVC.
To support this capability, we need to deploy a Volume Provisioner in the cluster.
For on-cloud environments, it is OK to directly access the Volume Provisioner provided by the cloud provider, which is convenient, hassle-free, and reliable enough.
For bare-metal environments, the best-known option is probably rook-ceph, but it is complicated to deploy and difficult to maintain, which makes it a poor fit for testing and learning, nor is it well suited to small self-managed production environments.
For development and test environments, or personal clusters, the following are recommended (a deployment sketch for the NFS provisioner follows this list):
- Local data volumes: suitable for scenarios where data loss is acceptable and no distributed access is needed, such as development and test environments.
- NFS data volumes: suitable for scenarios where data loss is acceptable, performance requirements are modest, and distributed access is needed, e.g. development and test environments, or online applications without much load.
  - https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner
  - https://github.com/kubernetes-csi/csi-driver-nfs
  - The reliability of NFS data depends on the external NFS server; enterprises often use a NAS such as a Synology as the NFS server.
  - If the external NFS server goes down, the applications using it will break.
- Object storage in the cloud, used directly: suitable for scenarios where you do not want to lose data and performance requirements are low.
  - Use https://github.com/rclone/rclone in mount mode to store data directly, or sync folder data to the cloud (some data loss is possible).
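As mentioned above, a sketch of deploying nfs-subdir-external-provisioner via its helm chart (the NFS server address and export path are placeholders for your own NFS server):

```bash
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm repo update
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
    --set nfs.server=<nfs-server-ip> \
    --set nfs.path=/exported/path
```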