Technical Series: Kubernetes Networking
Kubernetes has become a significant part of my work life over the past year, so I am exploring how it is built under the hood. One of the major areas I am exploring is networking, because I want to understand how Kubernetes handles it.
Before I start, I would like to explain what pods are in Kubernetes. According to the official Kubernetes documentation, pods are “the smallest deployable computing units that you can create and manage in Kubernetes.” A pod contains one or more containers, and pods are scheduled onto Kubernetes nodes. A node is a virtual or physical machine that is part of the Kubernetes cluster, and a node can run one or more pods.
Now that we understand pods and their function, let us dive in.
One of the basic principles of Kubernetes networking is that every pod can communicate with every other pod by default, without NAT, and each pod has its own IP address. Every node has a network interface (e.g., eth0), which resides in the node’s network namespace.
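To make the “one IP per pod” rule concrete, here is a minimal Python sketch of how pod IPs could be handed out. It assumes, hypothetically, a cluster-wide pod range of 10.244.0.0/16 split into one /24 per node; real clusters delegate this to the network plugin’s IP address management, and the class and node names here are made up for illustration.

```python
import ipaddress

# Hypothetical cluster-wide pod range; real clusters configure their own.
CLUSTER_CIDR = ipaddress.ip_network("10.244.0.0/16")


class PodIPAllocator:
    """Toy IP allocator: one /24 subnet per node, one unique IP per pod."""

    def __init__(self, cluster_cidr=CLUSTER_CIDR, node_prefix=24):
        self._subnets = cluster_cidr.subnets(new_prefix=node_prefix)
        self.node_cidrs = {}  # node name -> that node's pod subnet
        self._free = {}       # node name -> iterator over unused host IPs

    def add_node(self, node):
        cidr = next(self._subnets)
        self.node_cidrs[node] = cidr
        hosts = cidr.hosts()
        next(hosts)  # reserve the first address (.1), e.g. for the node's bridge
        self._free[node] = hosts
        return cidr

    def allocate(self, node):
        # Raises StopIteration once the node's subnet is exhausted.
        return next(self._free[node])


if __name__ == "__main__":
    ipam = PodIPAllocator()
    ipam.add_node("node-a")
    ipam.add_node("node-b")
    print(ipam.allocate("node-a"))  # 10.244.0.2
    print(ipam.allocate("node-a"))  # 10.244.0.3
    print(ipam.allocate("node-b"))  # 10.244.1.2
```

Because the node subnets never overlap, every pod IP is unique across the cluster, which is what lets pods address one another directly.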
According to the Linux namespaces(7) man page, “A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.” In short, Linux namespaces determine a process’s view of system resources.
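On Linux you can see this directly: each process has a symbolic link at /proc/<pid>/ns/net that names the network namespace it belongs to, and processes that share a network namespace show the same inode number there. A small Python sketch that reads it (Linux only; the helper names are my own):

```python
import os
import re

_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target such as 'net:[4026531840]' into (type, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def net_namespace(pid="self"):
    """Return (type, inode) of the network namespace a process belongs to."""
    return parse_ns_link(os.readlink(f"/proc/{pid}/ns/net"))


if __name__ == "__main__":
    # Compare this value across processes: a shared inode means a shared namespace.
    if os.path.lexists("/proc/self/ns/net"):
        print(net_namespace())
```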
Each pod on a node also has its own network namespace. To enable communication between the pod’s network namespace and the node’s network namespace, we set up virtual Ethernet interfaces (veth). Virtual Ethernet interfaces come in pairs, and whatever traffic goes into one end of the pair comes out of the other end. We assign one end of the pair to the pod’s network namespace and the other end to the node’s network namespace, which enables incoming and outgoing traffic between the two namespaces.
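Here is roughly what that setup looks like with iproute2’s ip command, generated by a small Python helper rather than run directly (running the commands needs root on a Linux host). The namespace name, interface names, and the 10.244.0.2/24 address are hypothetical; container runtimes and CNI plugins do equivalent work, usually through netlink rather than by invoking ip.

```python
import shlex


def veth_setup_commands(pod_ns, host_if, pod_ip_cidr):
    """Return iproute2 commands that connect a pod namespace to the node.

    Nothing is executed here; the commands are only built and returned.
    """
    tmp_peer = f"{host_if}-p"  # temporary name for the pod-side end
    return [
        ["ip", "netns", "add", pod_ns],
        # Create the pair; both ends start out in the node's namespace.
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", tmp_peer],
        # Move one end into the pod's namespace and rename it eth0 there.
        ["ip", "link", "set", tmp_peer, "netns", pod_ns],
        ["ip", "netns", "exec", pod_ns, "ip", "link", "set", tmp_peer, "name", "eth0"],
        # Give the pod its IP and bring both ends up.
        ["ip", "netns", "exec", pod_ns, "ip", "addr", "add", pod_ip_cidr, "dev", "eth0"],
        ["ip", "netns", "exec", pod_ns, "ip", "link", "set", "eth0", "up"],
        ["ip", "link", "set", host_if, "up"],
    ]


if __name__ == "__main__":
    for cmd in veth_setup_commands("pod1", "veth-pod1", "10.244.0.2/24"):
        print(shlex.join(cmd))
```

The pod-side end is created under a temporary name and renamed only after it moves, because the node’s own namespace already has an interface called eth0.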
Now that the pod’s network namespace is connected to the node’s network namespace, how do pods communicate with one another? (Note: from here onwards, I will refer to a virtual Ethernet interface simply as a veth.)
For pods on the same node, the node-side ends of the pods’ veth pairs are all attached to a bridge in the node’s network namespace. So if the veth of Pod 1 in the node’s network namespace receives a packet destined for Pod 2, the bridge forwards it to the veth of Pod 2, which passes it to its peer, eth0, in Pod 2’s network namespace.
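A Linux bridge works at layer 2: it learns which port (here, which pod’s veth) each MAC address sits behind and forwards frames accordingly, flooding frames whose destination it has not learned yet. The toy model below illustrates that behavior only; it ignores ARP, VLANs, and entry aging, and the port names and MAC labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    payload: str


class LearningBridge:
    """Toy model of a learning bridge connecting the node-side pod veths."""

    def __init__(self):
        self.ports = {}  # port name -> frames sent out of that port
        self.fdb = {}    # forwarding database: MAC -> port it was learned on

    def attach(self, port):
        self.ports[port] = []

    def receive(self, in_port, frame):
        """Handle a frame arriving on in_port; return the ports it is sent out of."""
        self.fdb[frame.src_mac] = in_port  # learn where the sender lives
        out_port = self.fdb.get(frame.dst_mac)
        if out_port is None:
            # Unknown destination: flood to every other port.
            targets = [p for p in self.ports if p != in_port]
        elif out_port == in_port:
            targets = []  # destination is behind the same port; drop
        else:
            targets = [out_port]
        for port in targets:
            self.ports[port].append(frame)
        return targets


if __name__ == "__main__":
    br = LearningBridge()
    for port in ("veth-pod1", "veth-pod2", "veth-pod3"):
        br.attach(port)
    # Pod 2's MAC is not known yet, so the first frame is flooded.
    print(br.receive("veth-pod1", Frame("mac-pod1", "mac-pod2", "hello")))
    # ['veth-pod2', 'veth-pod3']
    # The reply teaches the bridge where Pod 2 is; Pod 1 is already known.
    print(br.receive("veth-pod2", Frame("mac-pod2", "mac-pod1", "hi")))
    # ['veth-pod1']
    print(br.receive("veth-pod1", Frame("mac-pod1", "mac-pod2", "again")))
    # ['veth-pod2']
```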
Communication between pods on different nodes is slightly different. Imagine Pod 1 sends a packet to Pod 4 on another node. The packet again arrives at the veth of Pod 1 in the node’s network namespace, but because Pod 4 is not attached to the local bridge, the packet leaves the node through the node’s network interface (eth0) instead. How packets are routed between nodes depends on the network, and Kubernetes leaves that to the cluster’s network configuration, typically a CNI plugin. Once the packet arrives at the right node, that node forwards it over its own bridge to the destination pod.
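As a sketch of one common approach, assume a simple direct-routing setup in which each node owns one pod subnet and every node has a route to the other nodes’ subnets via those nodes’ IPs. Choosing where a packet goes is then a longest-prefix match over the routing table, the same rule the kernel applies (ignoring metrics and policy routing). All addresses and the bridge name cbr0 are hypothetical.

```python
import ipaddress

# Hypothetical routing table on node A (node IP 192.168.1.10, pod subnet 10.244.0.0/24).
ROUTES = [
    ("10.244.0.0/24", "dev cbr0"),          # local pods: hand to the bridge
    ("10.244.1.0/24", "via 192.168.1.11"),  # pods on node B
    ("10.244.2.0/24", "via 192.168.1.12"),  # pods on node C
    ("0.0.0.0/0", "via 192.168.1.1"),       # everything else: default gateway
]


def next_hop(dst_ip, routes=ROUTES):
    """Pick the most specific (longest-prefix) route that contains dst_ip."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [
        (ipaddress.ip_network(cidr), hop)
        for cidr, hop in routes
        if dst in ipaddress.ip_network(cidr)
    ]
    if not matches:
        raise LookupError(f"no route to {dst_ip}")
    return max(matches, key=lambda m: m[0].prefixlen)[1]


if __name__ == "__main__":
    print(next_hop("10.244.0.7"))     # dev cbr0
    print(next_hop("10.244.1.4"))     # via 192.168.1.11 (e.g. Pod 4 on node B)
    print(next_hop("93.184.216.34"))  # via 192.168.1.1
```

This style of routing only works when nodes can reach each other directly at layer 2; overlay networks such as VXLAN instead encapsulate the pod’s packet inside a node-to-node packet.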
However, in the Kubernetes cluster model, pods come and go due to restarts, failures, rollouts, rollbacks, and more, and a replacement pod usually gets a different IP address. How can we reach an application reliably, no matter how many times its pods have been replaced? The solution is a Kubernetes Service. A Service is an abstraction over a set of pods that are tracked in an Endpoints object, a Kubernetes resource that holds the IPs of the pods matching the Service’s selector.
When a pod fails, its IP is removed from the Endpoints object, and when a new pod starts and becomes ready, its IP is added. The Service itself has a stable virtual IP (its ClusterIP), which does not change regardless of changes in the endpoints.
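The bookkeeping described above can be modeled in a few lines: the Service keeps its ClusterIP fixed while its endpoint list is recomputed from the ready pods whose labels match its selector. This is a toy sketch, not the actual endpoints controller (which watches the API server); the Service name, IPs, and labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict
    ready: bool = True


@dataclass
class Service:
    name: str
    cluster_ip: str  # stable virtual IP; nothing below ever changes it
    selector: dict
    endpoints: list = field(default_factory=list)

    def sync_endpoints(self, pods):
        """Recompute endpoints from the ready pods whose labels match the selector."""
        self.endpoints = sorted(
            p.ip
            for p in pods
            if p.ready and all(p.labels.get(k) == v for k, v in self.selector.items())
        )


if __name__ == "__main__":
    svc = Service("web", "10.96.0.50", {"app": "web"})
    pods = [
        Pod("web-1", "10.244.0.2", {"app": "web"}),
        Pod("web-2", "10.244.1.2", {"app": "web"}),
        Pod("db-1", "10.244.1.3", {"app": "db"}),
    ]
    svc.sync_endpoints(pods)
    print(svc.endpoints)   # ['10.244.0.2', '10.244.1.2']
    pods[0].ready = False  # web-1 fails...
    pods.append(Pod("web-3", "10.244.2.2", {"app": "web"}))  # ...and is replaced
    svc.sync_endpoints(pods)
    print(svc.endpoints)   # ['10.244.1.2', '10.244.2.2']
    print(svc.cluster_ip)  # 10.96.0.50
```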
When Pod 1 sends a packet destined for Service S and the packet arrives at the veth of Pod 1 in the node’s network namespace, iptables rules (programmed by kube-proxy in its iptables mode) change the destination IP from Service S’s IP address to the IP of a pod in the list of endpoints, chosen at random. From there, the packet travels as in pod-to-pod communication. Kubernetes relies on conntrack, a Linux kernel subsystem, to track connections: conntrack records the original destination IP, and when the response comes back, the kernel uses that record to rewrite the response’s source IP from the pod’s IP back to the Service’s IP.
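The DNAT-plus-conntrack flow can be modeled as follows. kube-proxy’s iptables mode achieves this with nat-table rules and the kernel’s conntrack table; the class below only imitates the idea in Python. It assumes the Service port equals the pods’ port, picks an endpoint at random for each new connection, and keeps that choice for the rest of the connection. All names and addresses are hypothetical.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    src_port: int
    dst_port: int


class ServiceNAT:
    """Toy model of Service DNAT with connection tracking (not real iptables)."""

    def __init__(self, vip, port, endpoints, rng=None):
        self.vip = vip
        self.port = port                  # assumed equal to the pods' port
        self.endpoints = list(endpoints)  # pod IPs behind the Service
        self.flows = {}                   # (src, sport, vip, vport) -> chosen pod IP
        self.rng = rng or random.Random()

    def outbound(self, pkt):
        """Pod -> Service: rewrite the destination to one endpoint (DNAT)."""
        if (pkt.dst, pkt.dst_port) != (self.vip, self.port):
            return pkt  # not addressed to this Service; leave untouched
        key = (pkt.src, pkt.src_port, pkt.dst, pkt.dst_port)
        if key not in self.flows:
            # First packet of a new connection: pick an endpoint at random.
            self.flows[key] = self.rng.choice(self.endpoints)
        return replace(pkt, dst=self.flows[key])

    def inbound(self, reply):
        """Endpoint -> client: restore the Service IP as the reply's source."""
        key = (reply.dst, reply.dst_port, self.vip, self.port)
        if self.flows.get(key) == reply.src and reply.src_port == self.port:
            return replace(reply, src=self.vip)
        return reply


if __name__ == "__main__":
    nat = ServiceNAT("10.96.0.50", 80, ["10.244.0.2", "10.244.1.2"])
    request = Packet("10.244.0.5", "10.96.0.50", 40000, 80)
    sent = nat.outbound(request)
    print(sent.dst)  # one of the endpoint IPs
    reply = Packet(sent.dst, request.src, 80, request.src_port)
    print(nat.inbound(reply).src)  # 10.96.0.50
```

Because the client only ever sees the Service IP as the source of replies, it never needs to know which pod actually answered.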
In conclusion, this article explains the basics of pod-to-pod and pod-to-service communication. Kubernetes networking is extensive, though, and I have only scratched the surface. Over the next few weeks, I will dive deeper and write about other aspects of Kubernetes networking.
Until then, happy holidays.