Cross-Node Pod Communication in Kubernetes: Plain IP Routing with Kindnet

Many CNI plugins wrap pod traffic in VXLAN or some other overlay. Kindnet doesn’t. When a pod on one node pings a pod on another node, the packet travels as plain IP.

This raises a question: if there’s no overlay, how does the packet actually get from Node A to Node B? The destination IP is a pod address like 10.244.1.5, not a node address, and the underlying network knows nothing about pod CIDRs. Something has to bridge that gap.

In this article we’ll capture packets at every hop, inspect the routing decisions, and prove that pod IPs stay intact the entire way. By the end, you’ll understand how Kindnet moves traffic across nodes without touching the IP headers.

In the previous article, we traced intra-node pod traffic through veth pairs and /32 host routes. Now let’s cross node boundaries.

No Overlay: What Kindnet Actually Configures

Before tracing packets, we need to confirm what Kindnet is actually doing. Is it secretly running VXLAN? Let’s check the CNI config:

minikube -p multi ssh -n multi 'cat /etc/cni/net.d/10-kindnet.conflist'

{
  "cniVersion": "0.3.1",
  "name": "kindnet",
  "plugins": [
    {
      "type": "ptp",
      "ipMasq": false,
      "ipam": {
        "type": "host-local",
        "routes": [{ "dst": "0.0.0.0/0" }],
        "ranges": [[{ "subnet": "10.244.0.0/24" }]]
      }
    }
  ]
}

The plugin type is ptp—point-to-point. This creates the veth pairs we examined in the pod birth article. Notice what’s missing: no vxlan, no ipip, no overlay plugin. Packets leave the node as regular IP.

So if there’s no tunnel wrapping these packets, the nodes must be routing them directly. That means the routing tables are doing all the work.
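A quick way to double-check is to ask the node for tunnel devices. This is a small sketch, assuming the default Minikube/Kindnet setup: with an overlay CNI you would see vxlan or ipip interfaces listed; with Kindnet the commands should print nothing.

# An overlay CNI would create tunnel interfaces; Kindnet should have none
minikube -p multi ssh -n multi 'ip -details link show type vxlan'
minikube -p multi ssh -n multi 'ip -details link show type ipip'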

The Setup: Two Nodes, Two Pod CIDRs

We’re running a two-node Minikube cluster. Pod A lives on multi, Pod B lives on multi-m02:

kubectl get pods -o wide

NAME                         READY   STATUS    RESTARTS   AGE     IP           NODE        
net-tools-6bdcf48868-8lzv7   1/1     Running   0          31s     10.244.0.4   multi       
net-tools-6bdcf48868-vxql7   1/1     Running   0          3m42s   10.244.1.5   multi-m02   

kubectl get nodes -o wide

NAME        STATUS   ROLES           AGE   VERSION   INTERNAL-IP
multi       Ready    control-plane   5m    v1.34.0   192.168.49.2
multi-m02   Ready    <none>          5m    v1.34.0   192.168.49.3

Pod A has IP 10.244.0.4 and Pod B has IP 10.244.1.5; they are on different subnets: 10.244.0.0/24 vs 10.244.1.0/24. The nodes themselves are at 192.168.49.2 and 192.168.49.3.

The question is: how does a packet destined for 10.244.1.5 find its way from multi to multi-m02?

The answer is in the routing tables.

On multi:

minikube -p multi ssh -n multi 'ip route | grep 10.244'

10.244.0.2 dev veth83ba7a43 scope host
10.244.0.4 dev vethf837b274 scope host
10.244.1.0/24 via 192.168.49.3 dev eth0

The first two lines are /32 routes for local pods; we covered these in the intra-node article. The third line is new: to reach anything in 10.244.1.0/24, forward to 192.168.49.3 via eth0. Yes, that’s Node B’s IP.

On multi-m02:

minikube -p multi ssh -n multi-m02 'ip route | grep 10.244'

10.244.0.0/24 via 192.168.49.2 dev eth0
10.244.1.5 dev vetha7974668 scope host

The first line is the mirror image: to reach 10.244.0.0/24, forward to 192.168.49.2. The second line is the /32 route that handles local delivery to Pod B.

Each node knows how to reach the other node’s pod subnet by forwarding to that node’s IP.
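Kindnet’s daemon is what installs those cross-node routes: it watches the Node objects, reads each node’s pod CIDR, and programs a route per remote node. As a rough, hypothetical sketch of the equivalent manual commands (Kindnet does this programmatically; you never run these yourself):

# On multi: send multi-m02's pod CIDR to multi-m02's node IP
sudo ip route add 10.244.1.0/24 via 192.168.49.3 dev eth0

# On multi-m02: the mirror-image route back to multi's pod CIDR
sudo ip route add 10.244.0.0/24 via 192.168.49.2 dev eth0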

Now let’s watch real packets.

Tracing the Packet: Three Capture Points

We’ll set up tcpdump at three locations: Pod A’s veth on the source node, eth0 on the destination node, and Pod B’s veth. If the packet’s source IP changes anywhere along the way, we’ll see it.

First, identify Pod A’s veth:

minikube -p multi ssh -n multi 'ip route | grep 10.244.0.4'

10.244.0.4 dev vethf837b274 scope host

Start capturing on that interface:

minikube -p multi ssh -n multi 'sudo tcpdump -i vethf837b274 -n icmp'
listening on vethf837b274, link-type EN10MB (Ethernet), snapshot length 262144 bytes

In a second terminal, capture on multi-m02’s eth0:

minikube -p multi ssh -n multi-m02 'sudo tcpdump -i eth0 -n icmp and host 10.244.1.5'
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes

In a third terminal, identify Pod B’s veth and capture there:

minikube -p multi ssh -n multi-m02 'ip route | grep 10.244.1.5'

10.244.1.5 dev vetha7974668 scope host

minikube -p multi ssh -n multi-m02 'sudo tcpdump -i vetha7974668 -n icmp'
listening on vetha7974668, link-type EN10MB (Ethernet), snapshot length 262144 bytes

Now trigger the ping:

kubectl exec -it net-tools-6bdcf48868-8lzv7 -- ping -c1 10.244.1.5

PING 10.244.1.5 (10.244.1.5): 56 data bytes
64 bytes from 10.244.1.5: seq=0 ttl=62 time=0.234 ms

Check the captures starting with Pod A’s veth:

IP 10.244.0.4 > 10.244.1.5: ICMP echo request
IP 10.244.1.5 > 10.244.0.4: ICMP echo reply

On multi-m02’s eth0:

IP 10.244.0.4 > 10.244.1.5: ICMP echo request
IP 10.244.1.5 > 10.244.0.4: ICMP echo reply

On Pod B’s veth:

IP 10.244.0.4 > 10.244.1.5: ICMP echo request
IP 10.244.1.5 > 10.244.0.4: ICMP echo reply

We see the same IPs at every hop. The packet crossed two nodes and nobody rewrote anything.

But this raises another question: the pod doesn’t know anything about nodes or routing tables.

How did it decide where to send the packet in the first place?

Inside the Pod: The Default Gateway

Pod A has no route for 10.244.1.0/24. All it knows is: anything not local goes to the default gateway at 10.244.0.1. The pod hands off the packet and its job is done:

kubectl exec -it net-tools-6bdcf48868-8lzv7 -- ip route

default via 10.244.0.1 dev eth0
10.244.0.0/24 via 10.244.0.1 dev eth0 src 10.244.0.4
10.244.0.1 dev eth0 scope link src 10.244.0.4

That gateway address points to the host side of the veth pair. Once the packet crosses that boundary, the host kernel takes over.
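Where does 10.244.0.1 actually live? With the ptp plugin, the gateway address typically sits on the host end of each veth pair. A quick check, assuming the veth name from the earlier route lookup (the exact address layout can vary by CNI version):

# The host side of Pod A's veth should carry the 10.244.0.1 gateway address
minikube -p multi ssh -n multi 'ip addr show vethf837b274'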

The packet is now in the host’s network stack. What does the kernel do with a packet destined for 10.244.1.5?

The Source Node’s Routing Decision

The kernel matches the /24 route and decides: send this out eth0, next hop 192.168.49.3:

minikube -p multi ssh -n multi 'ip route get 10.244.1.5'

10.244.1.5 via 192.168.49.3 dev eth0 src 192.168.49.2 uid 0
    cache

This is where people usually get confused.

The output says via 192.168.49.3, but does that mean the destination IP gets rewritten to 192.168.49.3?

No. The via only affects layer 2.

It tells the kernel which MAC address to put in the Ethernet frame header. The IP packet inside that frame still reads 10.244.0.4 → 10.244.1.5; the node IP never appears in the IP headers. It’s purely an Ethernet-level decision.

The packet leaves eth0 wrapped in a frame addressed to Node B’s MAC. Inside that frame, the original pod IPs remain untouched.
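You can watch both layers at once by asking tcpdump to print link-level headers too. A sketch, assuming the same ping as before (MAC values will differ in your cluster):

# -e prints the Ethernet header: the destination MAC is multi-m02's eth0,
# while the IP addresses inside stay 10.244.0.4 -> 10.244.1.5
minikube -p multi ssh -n multi 'sudo tcpdump -i eth0 -e -n icmp and host 10.244.1.5'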

The Destination Node’s Routing Decision

The packet arrives at multi-m02’s eth0. The kernel strips the Ethernet header and looks at the IP destination: 10.244.1.5. Route lookup:

minikube -p multi ssh -n multi-m02 'ip route get 10.244.1.5'

10.244.1.5 dev vetha7974668 src 10.244.1.1 uid 0
    cache

The /32 route matches. The kernel delivers the packet directly to Pod B’s veth.

Same mechanism we saw in the intra-node article, just triggered by traffic from outside.

The reply follows the reverse path: Pod B sends to its gateway, Node B forwards to Node A via the /24 route, Node A delivers to Pod A via the /32 route, and the round trip is complete.
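You can confirm the reverse-path lookup on multi-m02 the same way we checked the forward path, with ip route get (a quick sketch):

# multi-m02's routing decision for the ICMP reply back to Pod A:
# expect the /24 route via 192.168.49.2, multi's node IP
minikube -p multi ssh -n multi-m02 'ip route get 10.244.0.4'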

Why No SNAT?

We’ve confirmed the IPs stay the same. But traffic leaving a node usually gets masqueraded, meaning the source IP is rewritten to the node’s IP.

Why doesn’t that happen here? Let’s check the iptables rules:

minikube -p multi ssh -n multi 'sudo iptables -t nat -L -n -v | grep MASQUERADE'

MASQUERADE  all  --  *     !docker0  172.17.0.0/16  0.0.0.0/0
RETURN      all  --  *     *         0.0.0.0/0      10.244.0.0/16  /* kind-masq-agent: local traffic is not subject to MASQUERADE */
MASQUERADE  all  --  *     *         0.0.0.0/0      0.0.0.0/0      /* kind-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain) */

The second line is the key. Before any MASQUERADE rule can apply, there’s a RETURN rule: if the destination is in 10.244.0.0/16, skip masquerading entirely.

Pod-to-pod traffic stays untouched; only traffic leaving the cluster (destined for external IPs) gets SNAT’d.

We can verify this from inside Pod B by starting a capture:

kubectl exec -it net-tools-6bdcf48868-vxql7 -- tcpdump -i eth0 -n icmp

Ping from Pod A:

kubectl exec -it net-tools-6bdcf48868-8lzv7 -- ping -c1 10.244.1.5

Output in Pod B:

IP 10.244.0.4 > 10.244.1.5: ICMP echo request, id 1, seq 0, length 64
IP 10.244.1.5 > 10.244.0.4: ICMP echo reply, id 1, seq 0, length 64

Pod B sees the real source IP.
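For contrast, you can watch the last MASQUERADE rule kick in for traffic that leaves the cluster. A sketch, assuming the pod can reach an external address such as 1.1.1.1 (an assumption about your environment; any external IP works):

# Capture on the source node's eth0 while Pod A pings an external IP.
# This destination is outside 10.244.0.0/16, so it falls through to MASQUERADE:
# the captured source should be the node IP (192.168.49.2), not 10.244.0.4.
minikube -p multi ssh -n multi 'sudo tcpdump -i eth0 -n icmp and host 1.1.1.1'

kubectl exec -it net-tools-6bdcf48868-8lzv7 -- ping -c1 1.1.1.1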

The Piece People Forget: ARP

Routing tables tell the kernel where to send packets, but routing deals with IP addresses. The actual transmission happens at layer 2, which uses MAC addresses.

So, something has to translate.

That’s ARP.

When the kernel decides to send a packet via 192.168.49.3, it needs Node B’s MAC address.

Let’s check the neighbor table:

minikube -p multi ssh -n multi 'ip neigh show'

192.168.49.3 dev eth0 lladdr 02:42:c0:a8:31:03 REACHABLE

The kernel has cached Node B’s (multi-m02’s) MAC. The state is REACHABLE, meaning the entry is fresh and valid.

What happens if this entry is missing?

The packet can’t be sent yet; there’s no MAC address to put in the Ethernet frame.

The kernel will ARP first, wait for a reply, and only then transmit.

Let’s prove it:

minikube -p multi ssh -n multi 'sudo ip neigh flush dev eth0'

Ping again and check:

kubectl exec -it net-tools-6bdcf48868-8lzv7 -- ping -c1 10.244.1.5
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.390/0.390/0.390/0.000 ms

minikube -p multi ssh -n multi 'ip neigh show 192.168.49.3'

192.168.49.3 dev eth0 lladdr 02:42:c0:a8:31:03 REACHABLE

The entry is back.

The kernel ARPed for the MAC before sending the ICMP packet.
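If you want to see the resolution itself, capture ARP on eth0 while repeating the flush-and-ping. A sketch; the who-has / is-at exchange should show up just before the ICMP packets go out:

# Flush the neighbor entry, watch the ARP request/reply for 192.168.49.3,
# then ping from Pod A in another terminal
minikube -p multi ssh -n multi 'sudo ip neigh flush dev eth0'
minikube -p multi ssh -n multi 'sudo tcpdump -i eth0 -n arp and host 192.168.49.3'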

But does ARP really matter that much? We can prove it.

Replace the real MAC with a fake one and watch what happens:

docker exec multi sudo ip neigh replace 192.168.49.3 lladdr 00:00:00:00:00:01 dev eth0 nud permanent

kubectl exec net-tools-6bdcf48868-8lzv7 -- ping -c2 -W1 10.244.1.5

2 packets transmitted, 0 received, 100% packet loss

The routing table didn’t change. The route still points to 192.168.49.3. But the packet never arrives because it’s being sent to a MAC address that doesn’t exist on the network.

Restore the correct MAC:

docker exec multi sudo ip neigh replace 192.168.49.3 lladdr 02:42:c0:a8:31:03 dev eth0 nud reachable

kubectl exec net-tools-6bdcf48868-8lzv7 -- ping -c2 10.244.1.5

2 packets transmitted, 2 received, 0% packet loss

Same route, different MAC, completely different outcome.

Routing says “where to go” but ARP says “how to get there.”

Key Takeaways

Kindnet uses plain IP routing, not overlays. Packets travel between nodes without VXLAN, IPIP, or any encapsulation.

The routing tables do all the work: /32 routes for local pods, /24 routes pointing to other nodes for remote subnets.

Pod IPs remain unchanged across the entire path. The “via” directive in routes only affects the Ethernet frame’s destination MAC, not the IP headers.

SNAT is explicitly skipped for pod CIDR traffic via an iptables RETURN rule.

ARP resolution is required before any packet can leave the node. We proved it: same route, fake MAC, 100% packet loss. Routing tables alone aren’t enough.

Next: Services and kube-proxy

We’ve traced pod-to-pod traffic in both directions: intra-node and cross-node. The next layer is Services.

When you curl a ClusterIP, you’re not hitting a pod directly. kube-proxy rewrites the destination IP to one of the backend pods using DNAT. That’s where the iptables chains get interesting.

Previously: Inside Intra-Node Pod Traffic in Kubernetes: How Kindnet with PTP Moves Packets, where we traced packets between pods on the same node.

Up next: Inside Kubernetes Services: ClusterIP, iptables, and kube-proxy DNAT, where we’ll trace traffic from pod to ClusterIP, watch the DNAT transformation in iptables chains, and follow the packet to the backend pod.

Till next time..

G.