I’ve been using k8s for a while now, and I’ve run into issues with iptables and other things that break the network between k8s nodes, so I wanted to look into how the k8s network works.
Docker Networking
Let’s start by looking at how Docker networking is implemented. Docker first creates a bridge called docker0.
By default, each container will have a separate netns, and then a veth pair is created, with one end left in the global netns and the other end placed in the container. The veth port in the global netns is added to docker0.
Inside the container, Docker assigns an address (e.g. 172.17.0.2) to the veth and sets the default route via 172.17.0.1. On the one hand, the container can reach the outside network by following the default route to 172.17.0.1 and then going through iptables NAT.
On the other hand, since the veths of the different containers are all attached to the same bridge, the containers are effectively in the same layer 2 network and can naturally access each other.
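To make the two paths concrete, here is a minimal sketch of the decision a container makes for each destination. It assumes Docker's default bridge network is 172.17.0.0/16 (the prefix length is my assumption, the text above only shows the addresses), and `next_hop` is a hypothetical helper, not anything Docker provides.

```python
import ipaddress

# Assumption: the default Docker bridge network is 172.17.0.0/16 and
# docker0 holds 172.17.0.1, matching the addresses used in the text.
BRIDGE_NET = ipaddress.ip_network("172.17.0.0/16")
GATEWAY = ipaddress.ip_address("172.17.0.1")


def next_hop(dst: str) -> str:
    """Describe how a container would reach dst (hypothetical helper)."""
    addr = ipaddress.ip_address(dst)
    if addr in BRIDGE_NET:
        # Same subnet: delivered directly over the docker0 bridge (layer 2).
        return f"on-link via docker0 to {addr}"
    # Anything else follows the default route and is NATed by iptables on the host.
    return f"default route via {GATEWAY}, then iptables NAT"


print(next_hop("172.17.0.3"))  # another container on the same bridge
print(next_hop("1.1.1.1"))     # an address outside the bridge network
```

The first call takes the layer 2 path between containers, the second takes the default route described above.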
K8s Network
In k8s, all pods are expected to be interconnected by IP address. One idea is to implement the pods on each node in a docker-like way, i.e. each netns connects to a bridge via veth, and then find a way to route the pods on the other nodes.
Since I built my k8s cluster with k3s, it uses flannel as the CNI. flannel uses vxlan to implement network communication between nodes.
First, let’s see how the pods in the node are networked.
flannel assigns each node a /24 subnet, e.g. the first node gets 10.42.0.0/24, the second gets 10.42.1.0/24, and so on. The pods on a node are then assigned addresses from that node's subnet, for example 10.42.0.50/24, with 10.42.0.1 as the default gateway. The pods' veths are added to a bridge called cni0. This part works the same way as in Docker, just with different names. There are also corresponding iptables rules.
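The per-node addressing can be sketched with Python's `ipaddress` module. This assumes the cluster CIDR is 10.42.0.0/16 (the k3s default, which matches the addresses above) and that subnets are handed out in order, which a real cluster does not guarantee. `node_subnet` and `gateway` are hypothetical helpers for illustration only.

```python
import ipaddress

# Assumption: cluster CIDR 10.42.0.0/16, split into one /24 per node.
CLUSTER_CIDR = ipaddress.ip_network("10.42.0.0/16")


def node_subnet(node_index: int) -> ipaddress.IPv4Network:
    """Return the /24 for the node at node_index (in-order allocation is assumed)."""
    return list(CLUSTER_CIDR.subnets(new_prefix=24))[node_index]


def gateway(subnet: ipaddress.IPv4Network) -> ipaddress.IPv4Address:
    """The first address of the subnet, held by cni0 on that node."""
    return subnet.network_address + 1


for i in range(2):
    subnet = node_subnet(i)
    print(f"node {i}: pods in {subnet}, default gateway {gateway(subnet)}")

# The example pod from the text belongs to the first node's subnet.
print(ipaddress.ip_address("10.42.0.50") in node_subnet(0))
```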
So, how is the inter-node network implemented? Suppose pod 10.42.0.50 on the first node wants to reach pod 10.42.1.51 on the second node. Following its default route, the pod sends the packet to 10.42.0.1, which is cni0 on the first node, and the node then consults its routing table.
The packet matches the route 10.42.1.0/24 via 10.42.1.0 dev flannel.1, where flannel.1 is a vxlan interface.
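The route selection on the first node can be sketched as a longest-prefix match. The flannel.1 and cni0 entries follow the text, while the default route (192.168.1.1 on eth0) is a made-up placeholder, and `lookup` is a hypothetical helper rather than how the kernel actually implements routing.

```python
import ipaddress

# (destination, via, device). The default route is a hypothetical placeholder.
ROUTES = [
    ("0.0.0.0/0", "192.168.1.1", "eth0"),
    ("10.42.0.0/24", None, "cni0"),              # local pods, directly connected
    ("10.42.1.0/24", "10.42.1.0", "flannel.1"),  # pods on the second node
]


def lookup(dst: str):
    """Return (via, device) of the most specific route that contains dst."""
    addr = ipaddress.ip_address(dst)
    candidates = [
        (ipaddress.ip_network(net), via, dev)
        for net, via, dev in ROUTES
        if addr in ipaddress.ip_network(net)
    ]
    # Longest prefix wins, as in a normal routing table lookup.
    _, via, dev = max(candidates, key=lambda route: route[0].prefixlen)
    return via, dev


print(lookup("10.42.1.51"))  # pod on the second node
print(lookup("10.42.0.60"))  # pod on the same node
```

A packet for 10.42.1.51 is handed to flannel.1, while a packet for a local pod stays on cni0.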
When this interface receives a packet, it looks up the destination in its forwarding database (fdb).
Each fdb entry is a (MAC address, IP address) pair. When flannel.1 receives an Ethernet frame whose destination MAC address matches an entry, it encapsulates the frame in UDP and sends it to that entry's IP address; otherwise, it sends the frame to every IP address in the table. Either way, the second node receives the packet, decapsulates it, and forwards it to the actual pod.
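As a rough illustration of the matching case, the sketch below looks up a frame's destination MAC in a toy fdb and prepends a VXLAN header. The MAC and node IP in `FDB` are invented, VNI 1 and UDP port 8472 are what I believe flannel uses by default (treat both as assumptions), and `encapsulate` is a hypothetical helper. Unknown destinations simply return `None` here instead of being flooded.

```python
import struct

VXLAN_PORT = 8472  # assumption: UDP port flannel's vxlan backend uses by default
VNI = 1            # assumption: flannel's default VXLAN network identifier

# Toy fdb: MAC of the remote flannel.1 -> IP of the node that owns it (made-up values).
FDB = {"aa:bb:cc:dd:ee:01": "192.168.1.12"}


def vxlan_header(vni: int) -> bytes:
    """8-byte VXLAN header: flags (0x08 = VNI present), reserved bits, 24-bit VNI."""
    return struct.pack("!II", 0x08 << 24, vni << 8)


def encapsulate(frame: bytes):
    """Return (remote_ip, udp_port, udp_payload) for a frame, or None if no fdb match."""
    dst_mac = ":".join(f"{b:02x}" for b in frame[:6])  # first 6 bytes = destination MAC
    remote_ip = FDB.get(dst_mac)
    if remote_ip is None:
        return None  # simplification: the flooding case is not modelled
    return remote_ip, VXLAN_PORT, vxlan_header(VNI) + frame


# Destination MAC, source MAC, EtherType (IPv4), then a dummy payload.
frame = bytes.fromhex("aabbccddee01" "aabbccddee00" "0800") + b"ip packet"
remote_ip, port, payload = encapsulate(frame)
print(remote_ip, port, payload[:8].hex())
```

The original Ethernet frame is carried unchanged after the 8-byte header, so the receiving node can strip the header and hand the inner frame to its own flannel.1.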
Summary
To summarize the implementation of the k8s network interconnection: the pods within a node are connected by a bridge (cni0), each node gets its own subnet, and traffic between subnets is routed through the flannel.1 interface on each node and carried between nodes over vxlan.