A few months ago, I found myself wondering how a command like ping 1.1.1.1 works from within a private network.
In most private networks, multiple hosts connect to the Internet through a router. For IPv4, the router performs network address translation (NAT) by rewriting the original host’s source address to the router’s public IP address. The router can look up the correct host for a reply packet based on the packet’s port field, at least for protocols like TCP and UDP.
But a command like ping doesn’t use TCP or UDP; it uses ICMP, and ICMP packets do not have a port field. So how does NAT work for ICMP packets?
This led me down a deep rabbit hole: running experiments in network namespaces, capturing packets, reading RFCs, and tracing through the Linux source code. This post summarizes what I did and learned along the way.1
Before these experiments, I hadn’t spent much time in the Linux networking code – this is something new I’m learning. If I’ve made any mistakes please let me know so I can correct them.
Table of contents
Experiment setup
One of the best ways to understand Linux networking is through experimentation. These days, it’s easy to run experiments using network namespaces to simulate multiple devices on a single Linux machine.
This is the setup I wanted to test:
There are two clients (client1 and client2) connected to a router (natbox) performing NAT from the private network 192.168.99.0/24 to the public network 10.0.100.0/24. The clients, natbox, and server are each separate network namespaces. Once everything is ready, a ping from either client to the server at 10.0.100.2 should get a reply!
For these experiments, I used a Fedora 38 Server VM running version 6.2.9 of the Linux kernel. Most of the commands below (ip, iptables, tcpdump, etc.) were run as the root user.2
Step 1: Connect two clients to a bridge
The first step is to create two clients connected to a bridge, like this:
To set it up:
# Create a network namespace for each client.
ip netns add "client1"
ip netns add "client2"
# Create a virtual bridge.
ip link add name "br0" type bridge
ip link set dev "br0" up
# Disable iptables processing for bridges so rules don't block traffic over br0.
sysctl -w net.bridge.bridge-nf-call-iptables=0
# Connect client1 to the bridge with a veth pair and assign IP address 192.168.99.1
ip link add dev "vethclient1" type veth peer name "eth0" netns "client1"
ip link set "vethclient1" master "br0"
ip link set "vethclient1" up
ip -n "client1" addr add dev "eth0" "192.168.99.1/24"
ip -n "client1" link set dev "eth0" up
# Same for client2, with IP address 192.168.99.2
ip link add dev "vethclient2" type veth peer name "eth0" netns "client2"
ip link set "vethclient2" master "br0"
ip link set "vethclient2" up
ip -n "client2" addr add dev "eth0" "192.168.99.2/24"
ip -n "client2" link set dev "eth0" up
If this worked, then:
- ip netns should show client1 and client2.
- ip -n client1 addr and ip -n client2 addr should show 192.168.99.1 and 192.168.99.2 respectively, and the eth0 interface should show “state UP”.
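As an extra check (not strictly necessary), you can also confirm that both veth interfaces are attached to the bridge:
# Optional: both vethclient1 and vethclient2 should be listed as ports of br0.
ip link show master "br0"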
Now the two clients can ping each other over the bridge:
# ping client1 -> client2
ip netns exec client1 ping 192.168.99.2
# ping client2 -> client1
ip netns exec client2 ping 192.168.99.1
Step 2: Connect natbox and server
Next, create network namespaces for the natbox and server:
ip netns add "natbox"
ip netns add "server"
Then connect the natbox to the bridge:
ip link add dev "vethnatbox" type veth peer name "eth0" netns "natbox"
ip link set "vethnatbox" master "br0"
ip link set "vethnatbox" up
ip -n "natbox" addr add dev "eth0" "192.168.99.3/24"
ip -n "natbox" link set dev "eth0" up
The natbox needs a second interface in the 10.0.100.0/24 network, so add that and call it “eth1”. Since there’s only one server, there’s no need for a bridge – just connect the natbox and server directly with a veth pair:
ip -n "natbox" link add "eth1" type veth peer name "eth1" netns "server"
ip -n "natbox" addr add dev "eth1" "10.0.100.1/24"
ip -n "natbox" link set dev "eth1" up
ip -n "server" addr add dev "eth1" "10.0.100.2/24"
ip -n "server" link set dev "eth1" up
Now the natbox can reach both clients and the server. Test it with ping:
# ping natbox -> client1
ip netns exec natbox ping 192.168.99.1
# ping natbox -> client2
ip netns exec natbox ping 192.168.99.2
# ping natbox -> server
ip netns exec natbox ping 10.0.100.2
At this point, every network namespace, interface, and veth pair has been created:
However, the clients cannot yet ping the server because the natbox isn’t forwarding traffic between its interfaces or performing NAT.
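You can confirm this yourself: at this stage, a ping from client1 to the server should fail (most likely with a “network unreachable” error, since the clients don’t yet have a route to 10.0.100.0/24). The flags below are just one way to run a short, bounded test:
# Expected to fail for now: 2 pings with a 1-second timeout each.
ip netns exec client1 ping -c 2 -W 1 10.0.100.2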
Step 3: Configure routing and NAT
Add a default route in each client to send traffic to the natbox:
ip -n client1 route add 0.0.0.0/0 via 192.168.99.3
ip -n client2 route add 0.0.0.0/0 via 192.168.99.3
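To verify the routes took effect, each client’s routing table should now show the natbox as the default gateway (optional check):
# Optional: expect a "default via 192.168.99.3" entry in each client.
ip -n client1 route
ip -n client2 route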
For security reasons, Linux does not forward packets between interfaces unless specifically enabled. So configure the natbox to forward traffic by setting net.ipv4.ip_forward:
ip netns exec natbox sysctl "net.ipv4.ip_forward=1"
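Each network namespace has its own copy of this sysctl, so if you want to confirm the setting is active, read it back from inside the natbox namespace:
# Optional: should print net.ipv4.ip_forward = 1
ip netns exec natbox sysctl net.ipv4.ip_forward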
At this point, packets from a client will reach the server. However, these packets retain the original source IP in the 192.168.99.0/24 network, so replies from the server back to this IP will go… nowhere. Fix it by configuring the natbox to NAT traffic from a client IP (in network 192.168.99.0/24) to the natbox’s public IP (10.0.100.1). The easiest way to do this is to add a MASQUERADE rule to the POSTROUTING chain of the iptables “nat” table:
ip netns exec natbox iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE
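To confirm the rule is in place (and, later, to see whether packets are actually hitting it), list the POSTROUTING chain with counters; the counter values will depend on how much traffic has flowed so far:
# Optional: show the MASQUERADE rule with packet/byte counters.
ip netns exec natbox iptables -t nat -L POSTROUTING -n -v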
At last, clients can reach the server through the natbox! Test it with ping:
# ping client1 -> server via natbox
ip netns exec client1 ping 10.0.100.2
# ping client2 -> server via natbox
ip netns exec client2 ping 10.0.100.2
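While the pings are running, the natbox’s connection tracking table is worth a look, since NAT in Linux is built on top of conntrack. This assumes the conntrack-tools package is installed in the VM; if it isn’t, the command won’t exist:
# Optional: list tracked ICMP flows on the natbox.
ip netns exec natbox conntrack -L -p icmp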
Packet capture
Now, with a ping from each client to the server running, capture ICMP packets in the client1 and server network namespaces:
ip netns exec client1 tcpdump -n icmp
ip netns exec server tcpdump -n icmp
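It can also be interesting to capture on the natbox’s public-facing interface, which shows the traffic after translation (this capture isn’t part of the output below):
# Optional: capture post-NAT ICMP traffic leaving the natbox.
ip netns exec natbox tcpdump -n -i eth1 icmp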
This is the tcpdump for client1:
08:01:33.549598 IP 192.168.99.1 > 10.0.100.2: ICMP echo request, id 31428, seq 1, length 64
08:01:33.549661 IP 10.0.100.2 > 192.168.99.1: ICMP echo reply, id 31428, seq 1, length 64
08:01:34.610605 IP 192.168.99.1 > 10.0.100.2: ICMP echo request, id 31428, seq 2, length 64
08:01:34.610654 IP 10.0.100.2 > 192.168.99.1: ICMP echo reply, id 31428, seq 2, length 64
… and the corresponding tcpdump for the server:
08:01:33.549643 IP 10.0.100.1 > 10.0.100.2: ICMP echo request, id 31428, seq 1, length 64
08:01:33.549654 IP 10.0.100.2 > 10.0.100.1: ICMP echo reply, id 31428, seq 1, length 64
08:01:34.446611 IP 10.0.100.1 > 10.0.100.2: ICMP echo request, id 33391, seq 1, length 64
08:01:34.446619 IP 10.0.100.2 > 10.0.100.1: ICMP echo reply, id 33391, seq 1, length 64
08:01:34.610635 IP 10.0.100.1 > 10.0.100.2: ICMP echo request, id 31428, seq 2, length 64
08:01:34.610646 IP 10.0.100.2 > 10.0.100.1: ICMP echo reply, id 31428, seq 2, length 64
08:01:35.506411 IP 10.0.100.1 > 10.0.100.2: ICMP echo request, id 33391, seq 2, length 64
08:01:35.506423 IP 10.0.100.2 > 10.0.100.1: ICMP echo reply, id 33391, seq 2, length 64
These captures show that:
- Traffic is being NAT’d. By the time an ICMP echo request reaches the server (10.0.100.2), its source IP has been rewritten to the IP of the natbox (10.0.100.1).
- Each client has a different “id” field (in the capture above, client1 has ID 31428 and client2 has ID 33391).
The “id” field seemed like it might allow the natbox to distinguish reply packets destined for each client. But what does the “id” field mean, and how is it chosen?
RFC 792
ICMP is a very, very old protocol. It is defined in RFC 792, which was published in 1981. The RFC specifies the exact structure of an ICMP echo and echo reply message:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Code | Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data ...
+-+-+-+-+-
The “type” field distinguishes an echo request (8) from an echo reply (0). The code is always 0 for echo messages (I guess it isn’t used for anything?). What about the identifier?