Cross-Node Pod Communication in Kubernetes: Plain IP Routing with Kindnet
Many CNI plugins wrap your packets in VXLAN or another overlay. Kindnet doesn’t. When a pod on one node pings a pod on another node, the packet travels as plain IP.
This raises a question: if there’s no overlay, how does the packet actually get from Node A to Node B? The destination IP is a pod address like 10.244.1.5, not a node address. The underlying network doesn’t know anything about pod CIDRs, so something has to bridge that gap.
In this article we’ll capture packets at every hop, inspect the routing decisions, and prove that pod IPs stay intact the entire way. By the end, you’ll understand how Kindnet moves traffic across nodes without touching the IP headers.
In the previous article, we traced intra-node pod traffic through veth pairs and /32 host routes. Now let’s cross node boundaries.
No Overlay: What Kindnet Actually Configures
Before tracing packets, we need to confirm what Kindnet is actually doing. Is it secretly running VXLAN? Let’s check the CNI config:
minikube -p multi ssh -n multi 'cat /etc/cni/net.d/10-kindnet.conflist'
{
  "cniVersion": "0.3.1",
  "name": "kindnet",
  "plugins": [
    {
      "type": "ptp",
      "ipMasq": false,
      "ipam": {
        "type": "host-local",
        "routes": [{ "dst": "0.0.0.0/0" }],
        "ranges": [[{ "subnet": "10.244.0.0/24" }]]
      }
    }
  ]
}
The plugin type is ptp—point-to-point. This creates the veth pairs we examined in the pod birth article. Notice what’s missing: no vxlan, no ipip, no overlay plugin. Packets leave the node as regular IP.
So if there’s no tunnel wrapping these packets, the nodes must be routing them directly. That means the routing tables are doing all the work.
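A quick sanity check: ask the node for tunnel interfaces. With Kindnet there should be none (these commands assume the same minikube profile and node names used throughout):
minikube -p multi ssh -n multi 'ip -d link show type vxlan'
minikube -p multi ssh -n multi 'ip -d link show type ipip'
Both should print nothing: no VXLAN or IPIP devices exist on the node.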
The Setup: Two Nodes, Two Pod CIDRs
We’re running a two-node Minikube cluster. Pod A lives on multi, Pod B lives on multi-m02:
kubectl get pods -o wide
NAME                         READY   STATUS    RESTARTS   AGE     IP           NODE
net-tools-6bdcf48868-8lzv7   1/1     Running   0          31s     10.244.0.4   multi
net-tools-6bdcf48868-vxql7   1/1     Running   0          3m42s   10.244.1.5   multi-m02
kubectl get nodes -o wide
NAME        STATUS   ROLES           AGE   VERSION   INTERNAL-IP
multi       Ready    control-plane   5m    v1.34.0   192.168.49.2
multi-m02   Ready    <none>          5m    v1.34.0   192.168.49.3
Pod A has IP 10.244.0.4 and Pod B has IP 10.244.1.5; they sit on different subnets: 10.244.0.0/24 vs 10.244.1.0/24. The nodes themselves are at 192.168.49.2 and 192.168.49.3.
The question is: how does a packet destined for 10.244.1.5 find its way from multi to multi-m02?
The answer is in the routing tables.
On multi:
minikube -p multi ssh -n multi 'ip route | grep 10.244'
10.244.0.2 dev veth83ba7a43 scope host
10.244.0.4 dev vethf837b274 scope host
10.244.1.0/24 via 192.168.49.3 dev eth0
The first two lines are /32 routes for local pods; we covered these in the intra-node article. The third line is new: to reach anything in 10.244.1.0/24, forward to 192.168.49.3 via eth0. Yes, that’s Node B’s IP.
On multi-m02:
minikube -p multi ssh -n multi-m02 'ip route | grep 10.244'
10.244.0.0/24 via 192.168.49.2 dev eth0
10.244.1.5 dev vetha7974668 scope host
To reach 10.244.0.0/24, forward to 192.168.49.2, and the /32 route handles local delivery to Pod B.
Each node knows how to reach the other node’s pod subnet by forwarding to that node’s IP.
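These routes mirror the podCIDR the control plane assigned to each node; Kindnet simply programs one route per remote node. You can see the mapping with a custom-columns query:
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR
You should see multi mapped to 10.244.0.0/24 and multi-m02 to 10.244.1.0/24, matching the via routes above.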
Now let’s watch real packets to confirm it.
Tracing the Packet: Three Capture Points
We’ll set up tcpdump at three locations: Pod A’s veth on the source node, eth0 on the destination node, and Pod B’s veth. If the packet’s source IP changes anywhere along the way, we’ll see it.
First, identify Pod A’s veth:
minikube -p multi ssh -n multi 'ip route | grep 10.244.0.4'
10.244.0.4 dev vethf837b274 scope host
Start capturing on that interface:
minikube -p multi ssh -n multi 'sudo tcpdump -i vethf837b274 -n icmp'
listening on vethf837b274, link-type EN10MB (Ethernet), snapshot length 262144 bytes
In a second terminal, capture on multi-m02’s eth0:
minikube -p multi ssh -n multi-m02 'sudo tcpdump -i eth0 -n icmp and host 10.244.1.5'
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
In a third terminal, identify Pod B’s veth and capture there:
minikube -p multi ssh -n multi-m02 'ip route | grep 10.244.1.5'
10.244.1.5 dev vetha7974668 scope host
minikube -p multi ssh -n multi-m02 'sudo tcpdump -i vetha7974668 -n icmp'
listening on vetha7974668, link-type EN10MB (Ethernet), snapshot length 262144 bytes
Now trigger the ping:
kubectl exec -it net-tools-6bdcf48868-8lzv7 -- ping -c1 10.244.1.5
PING 10.244.1.5 (10.244.1.5): 56 data bytes
64 bytes from 10.244.1.5: seq=0 ttl=62 time=0.234 ms
Check the captures starting with Pod A’s veth:
IP 10.244.0.4 > 10.244.1.5: ICMP echo request
IP 10.244.1.5 > 10.244.0.4: ICMP echo reply
On multi-m02’s eth0:
IP 10.244.0.4 > 10.244.1.5: ICMP echo request
IP 10.244.1.5 > 10.244.0.4: ICMP echo reply
On Pod B’s veth:
IP 10.244.0.4 > 10.244.1.5: ICMP echo request
IP 10.244.1.5 > 10.244.0.4: ICMP echo reply
We see the same IPs at every hop. The packet crossed two nodes and nobody rewrote anything.
But this raises another question. The pod doesn’t know anything about nodes or host routing tables.
So how did it decide where to send the packet in the first place?
Inside the Pod: The Default Gateway
Pod A has no route for 10.244.1.0/24. All it knows is: anything not local goes to the default gateway at 10.244.0.1. The pod hands off the packet and its job is done:
kubectl exec -it net-tools-6bdcf48868-8lzv7 -- ip route
default via 10.244.0.1 dev eth0
10.244.0.0/24 via 10.244.0.1 dev eth0 src 10.244.0.4
10.244.0.1 dev eth0 scope link src 10.244.0.4
That gateway address points to the host side of the veth pair. Once the packet crosses that boundary, the host kernel takes over.
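You can confirm this on the host. With the ptp plugin, the gateway address is typically assigned to the host end of each veth pair, which is why 10.244.0.1 answers at all (interface name taken from the /32 route we saw earlier):
minikube -p multi ssh -n multi 'ip addr show vethf837b274'
Look for 10.244.0.1 in the output.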
The packet is now in the host’s network stack. What does the kernel do with a packet destined for 10.244.1.5?
The Source Node’s Routing Decision
The kernel matches the /24 route and decides: send this out eth0, next hop 192.168.49.3:
minikube -p multi ssh -n multi 'ip route get 10.244.1.5'
10.244.1.5 via 192.168.49.3 dev eth0 src 192.168.49.2 uid 0
cache
This is where it’s easy to get confused.
The output says via 192.168.49.3, but does that mean the destination IP gets rewritten to 192.168.49.3?
No. The via only affects layer 2.
It tells the kernel which MAC address to put in the Ethernet frame header. The IP packet inside that frame still shows 10.244.0.4 → 10.244.1.5. The node IP never appears in the IP headers. So, it’s purely an Ethernet-level decision.
The packet leaves eth0 wrapped in a frame addressed to Node B’s MAC. Inside that frame, the original pod IPs remain untouched.
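You can watch both layers at once by adding -e to tcpdump on the source node’s eth0, which prints the Ethernet header next to the IP addresses. Run the ping again while this capture is active:
minikube -p multi ssh -n multi 'sudo tcpdump -i eth0 -e -n icmp and host 10.244.1.5'
Each echo request frame should show Node B’s MAC as the Ethernet destination, while the IP line still reads 10.244.0.4 > 10.244.1.5.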
The Destination Node’s Routing Decision
The packet arrives at multi-m02’s eth0. The kernel strips the Ethernet header and looks at the IP destination: 10.244.1.5. Route lookup:
minikube -p multi ssh -n multi-m02 'ip route get 10.244.1.5'
10.244.1.5 dev vetha7974668 src 10.244.1.1 uid 0
cache
The /32 route matches. The kernel delivers the packet directly to Pod B’s veth.
Same mechanism we saw in the intra-node article, just triggered by traffic from outside.
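One prerequisite is easy to overlook: the node accepts a packet on eth0 that isn’t addressed to its own IP and pushes it out a different interface. That only works with IP forwarding enabled. Kind and minikube nodes enable it by default, but it’s worth knowing where to check:
minikube -p multi ssh -n multi-m02 'sysctl net.ipv4.ip_forward'
This should report net.ipv4.ip_forward = 1; if it were 0, the kernel would drop the packet instead of forwarding it to the veth.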
The reply follows the reverse path: Pod B sends to its gateway, Node B forwards to Node A via the /24 route, Node A delivers to Pod A via the /32 route, and the round trip is complete.
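You can simulate the reply’s route lookup on multi-m02 with ip route get, telling the kernel which source address and ingress interface to assume (interface name from the capture setup above):
minikube -p multi ssh -n multi-m02 'sudo ip route get 10.244.0.4 from 10.244.1.5 iif vetha7974668'
The answer should point back via 192.168.49.2 on eth0, the mirror image of the forward path.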
Why No SNAT?
We’ve confirmed the IPs stay the same. But traffic leaving a node usually gets masqueraded, meaning the source IP gets rewritten to the node’s IP.
Why doesn’t that happen here? Let’s check the iptables rules:
minikube -p multi ssh -n multi 'sudo iptables -t nat -L -n -v | grep MASQUERADE'
MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
RETURN all -- * * 0.0.0.0/0 10.244.0.0/16 /* kind-masq-agent: local traffic is not subject to MASQUERADE */
MASQUERADE all -- * * 0.0.0.0/0 0.0.0.0/0 /* kind-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain) */
The second line is the key. Before any MASQUERADE rule can apply, there’s a RETURN rule: if the destination is in 10.244.0.0/16, skip masquerading entirely.
Pod-to-pod traffic stays untouched; only traffic leaving the cluster (destined for external IPs) gets SNAT’d.
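The grep above flattens rules from several chains. To see them in context, list POSTROUTING and the dedicated chain that kindnet’s masq agent maintains (the chain name KIND-MASQ-AGENT is an assumption based on kindnet’s usual setup; adjust it if your POSTROUTING listing shows a different name):
minikube -p multi ssh -n multi 'sudo iptables -t nat -L POSTROUTING -n -v'
minikube -p multi ssh -n multi 'sudo iptables -t nat -L KIND-MASQ-AGENT -n -v'
The RETURN for 10.244.0.0/16 sits above the catch-all MASQUERADE, which is exactly why rule order matters here.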
We can verify this from inside Pod B by starting a capture:
kubectl exec -it net-tools-6bdcf48868-vxql7 -- tcpdump -i eth0 -n icmp
Ping from Pod A:
kubectl exec -it net-tools-6bdcf48868-8lzv7 -- ping -c1 10.244.1.5
Output in Pod B:
IP 10.244.0.4 > 10.244.1.5: ICMP echo request, id 1, seq 0, length 64
IP 10.244.1.5 > 10.244.0.4: ICMP echo reply, id 1, seq 0, length 64
Pod B sees the real source IP.
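For contrast, traffic leaving the cluster should get masqueraded. In one terminal, capture on multi’s eth0; in another, ping an external address from Pod A (this assumes the cluster has outbound internet access; 1.1.1.1 is just an arbitrary external IP):
minikube -p multi ssh -n multi 'sudo tcpdump -i eth0 -n icmp and host 1.1.1.1'
kubectl exec -it net-tools-6bdcf48868-8lzv7 -- ping -c1 1.1.1.1
The capture should show the node IP, 192.168.49.2, as the source instead of 10.244.0.4: that’s the catch-all MASQUERADE rule doing its job.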
The Piece People Forget: ARP
Routing tables tell the kernel where to send packets, but routing deals with IP addresses. The actual transmission happens at layer 2, which uses MAC addresses.
So, something has to translate.
That’s ARP.
When the kernel decides to send a packet via 192.168.49.3, it needs Node B’s MAC address.
Let’s check the neighbor table:
minikube -p multi ssh -n multi 'ip neigh show'
192.168.49.3 dev eth0 lladdr 02:42:c0:a8:31:03 REACHABLE
The kernel has cached Node B’s (multi-m02’s) MAC. The state is REACHABLE, meaning the entry is fresh and valid.
What happens if this entry is missing?
The packet can’t be sent yet, because there’s no MAC address to put in the Ethernet frame.
The kernel will ARP first, wait for a reply, and only then transmit.
Let’s prove it:
minikube -p multi ssh -n multi 'sudo ip neigh flush dev eth0'
Ping again and check:
kubectl exec -it net-tools-6bdcf48868-8lzv7 -- ping -c1 10.244.1.5
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.390/0.390/0.390/0.000 ms
minikube -p multi ssh -n multi 'ip neigh show 192.168.49.3'
192.168.49.3 dev eth0 lladdr 02:42:c0:a8:31:03 REACHABLE
The entry is back.
The kernel ARPed for the MAC before sending the ICMP packet.
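To see the exchange itself, flush the entry again while a second capture watches for ARP on eth0, then repeat the ping:
minikube -p multi ssh -n multi 'sudo tcpdump -i eth0 -n arp and host 192.168.49.3'
You should see a who-has request for 192.168.49.3 and a reply carrying its MAC, immediately before the ICMP packets go out.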
But does ARP really matter that much? We can prove it.
Replace the real MAC with a fake one and watch what happens:
docker exec multi sudo ip neigh replace 192.168.49.3 lladdr 00:00:00:00:00:01 dev eth0 nud permanent
kubectl exec net-tools-6bdcf48868-8lzv7 -- ping -c2 -W1 10.244.1.5
2 packets transmitted, 0 received, 100% packet loss
The routing table didn’t change. The route still points to 192.168.49.3. But the packet never arrives because it’s being sent to a MAC address that doesn’t exist on the network.
Restore the correct MAC:
docker exec multi sudo ip neigh replace 192.168.49.3 lladdr 02:42:c0:a8:31:03 dev eth0 nud reachable
kubectl exec net-tools-6bdcf48868-8lzv7 -- ping -c2 10.244.1.5
2 packets transmitted, 2 received, 0% packet loss
Same route, different MAC, completely different outcome.
Routing says “where to go” but ARP says “how to get there.”
Key Takeaways
Kindnet uses plain IP routing, not overlays. Packets travel between nodes without VXLAN, IPIP, or any encapsulation.
The routing tables do all the work: /32 routes for local pods, /24 routes pointing to other nodes for remote subnets.
Pod IPs remain unchanged across the entire path. The “via” directive in routes only affects the Ethernet frame’s destination MAC, not the IP headers.
SNAT is explicitly skipped for pod CIDR traffic via an iptables RETURN rule.
ARP resolution is required before any packet can leave the node. We proved it: same route, fake MAC, 100% packet loss. Routing tables alone aren’t enough.
Next: Services and kube-proxy
We’ve traced pod-to-pod traffic in both directions: intra-node and cross-node. The next layer is Services.
When you curl a ClusterIP, you’re not hitting a pod directly. kube-proxy rewrites the destination IP to one of the backend pods using DNAT. That’s where the iptables chains get interesting.
Previously: Inside Intra-Node Pod Traffic in Kubernetes: How Kindnet with PTP Moves Packets, where we traced packets between pods on the same node.
Up next: Inside Kubernetes Services: ClusterIP, iptables, and kube-proxy DNAT, where we’ll trace traffic from pod to ClusterIP, watch the DNAT transformation in iptables chains, and follow the packet to the backend pod.
References
- https://github.com/kubernetes-sigs/kind/tree/main/images/kindnetd
- https://www.tkng.io/cni/kindnet/
- https://kubernetes.io/docs/concepts/cluster-administration/networking/
- https://man7.org/linux/man-pages/man8/ip-route.8.html
- https://man7.org/linux/man-pages/man8/tcpdump.8.html
- https://man7.org/linux/man-pages/man8/iptables.8.html
Till next time..
G.