Writing your own TCP/IP stack may seem like a daunting task. Indeed, TCP has accumulated many specifications over its lifetime of more than thirty years. The core specification, however, is seemingly compact[1] – the important parts being TCP header parsing, the state machine, congestion control and retransmission timeout computation.
The most common layer 2 and layer 3 protocols, Ethernet and IP respectively, pale in comparison to TCP’s complexity. In this blog series, we will implement a minimal userspace TCP/IP stack for Linux.
The purpose of these posts and the resulting software is purely educational – to learn network and system programming at a deeper level.
- TUN/TAP devices
- Ethernet Frame Format
- Ethernet Frame Parsing
- Address Resolution Protocol
- Address Resolution Algorithm
- Conclusion
- Sources
To intercept low-level network traffic from the Linux kernel, we will use a Linux TAP device. In short, TUN/TAP devices are often used by userspace networking applications to manipulate traffic directly: a TUN device carries layer 3 (IP) packets, while a TAP device carries layer 2 (Ethernet) frames. A popular example is tunneling, where a packet is wrapped inside the payload of another packet.
The advantage of TUN/TAP devices is that they’re easy to set up in a userspace program and they are already being used in a multitude of programs, such as OpenVPN.
As we want to build the networking stack from the layer 2 up, we need a TAP device. We instantiate it like so:
/*
 * Taken from Kernel Documentation/networking/tuntap.txt
 */
int tun_alloc(char *dev)
{
    struct ifreq ifr;
    int fd, err;

    if( (fd = open("/dev/net/tap", O_RDWR)) < 0 ) {
6 Comments
revskill
I appreciate that the article explains things without assuming prior knowledge. Well done.
dang
Related:
Let's code a TCP/IP stack (2016) – https://news.ycombinator.com/item?id=27654182 – June 2021 (49 comments)
Let's code a TCP/IP stack, 1: Ethernet & ARP (2016) – https://news.ycombinator.com/item?id=17316487 – June 2018 (47 comments)
Let's Code a TCP/IP Stack: TCP Retransmission – https://news.ycombinator.com/item?id=14701199 – July 2017 (30 comments)
Let's code a TCP/IP stack, 1: Ethernet and ARP – https://news.ycombinator.com/item?id=11234229 – March 2016 (49 comments)
globular-toast
I did a similar thing in Python[0]. Probably not as well written and, to be honest, I just made up the address resolution algorithm. I got as far as pinging an internet host with ICMP. I like that mine is completely contained in a (short) notebook, though (the OP article misses many details that are in the larger source code that is referenced).
I hadn't seen this article and did mine all from Wikipedia! There is a huge jump in complexity for TCP, though, and I lost interest a bit. Part 3 of this covers that so maybe one day I'll read that and finish mine.
I found it very rewarding and it's definitely something that is doable by any level of programmer if you're interested in networking.
[0] https://github.com/georgek/notebooks/blob/master/internet.ip…
p4bl0
I don't get where the author gets the 10.0.0.4 IP address from, the one used to test ARP resolution. What is it supposed to be the address of? A fake device accessible to the made-up Ethernet device programmed here? Or is it an actual device on the author's network?
Can someone explain that?
kbouck
If you disable ARP, you can have a group of servers on the same network configured with the same IP! If a server acting as a routing frontend can forward packets to a backend server's network interface by MAC address (this trickery needs a kernel extension), the backend server will recognize itself as the destination, swap the source and destination IPs, and respond directly to the client without going back through the routing frontend.
Alternatively, you can accomplish the same without disabling ARP and by just adding the common IP address as an alias to the loopback interface, which allows the backend to recognize itself as the destination, but avoids ARP conflicts.
This was a trick used by IBM's WebSphere software load balancer back in the 1990s and 2000s.
zoobab
If you compile a minimal Linux kernel without a TCP/IP stack -> 400KB.
If you add a TCP/IP stack -> 800KB.
For a project where I only had to send a temperature reading, I just wrote a small C program in userspace that sent the value in a hand-crafted UDP message, which saved a lot of space (and complexity) :-).