Objective¶
The objective of this project was to implement a UDP transmitter for the RP2040 that consumes zero CPU time. The system described on this webpage is implemented with 3 PIO state machines, 12 DMA channels, and 1 PWM channel. It generates the normal link pulse (NLP), performs packet checksum calculations, and carries out packet transmissions with no CPU interaction beyond specifying the data to be transmitted. It uses the unique ID of the flash chip to generate a MAC address for the RP2040.
For applications which don’t require additional DMA channels (since they’re all consumed by the UDP machine described below), the exclusive use of peripherals for UDP transactions should make for particularly simple integration into application code, and maximizes the data rate out of the RP2040. This was constructed as a piece of course infrastructure for ECE 4760 at Cornell, for use in student projects.
TLDR, how do I use it?¶
Please find some demo code here, and see the video below for a demonstration.
1. Navigate to udp_tx_parameters.h and modify the UDP payload size, ethernet source/destination addresses, IP source/destination addresses, and source/destination ports, as necessary.
2. In main for your application code, overclock to 240MHz before you do anything else. This is required to get the PIO state machine timing right.
    set_sys_clock_khz(240000, true) ;
3. Call initUDP(unsigned int txminus_in, irq_handler_t handler). The first argument specifies the GPIO number which should be associated with the TX- line (the TX+ line will be mapped to this GPIO number, plus one). The second argument is the name of the interrupt service routine which should be called upon transmission completion.
4. Call SEND_PACKET ; to send a packet! You can modify the UDP data by modifying the values in the udp_payload array, and then calling SEND_PACKET again will transmit this new data. Each call to SEND_PACKET is non-blocking. Transmit complete will be signalled by entering the ISR that you specified in step 3.
5. Plug TX+ into pin 1 of an RJ45 connector, and TX- into pin 2. Plug an ethernet cable into this connector, and connect the other end to a device or switch.
6. For testing on your own computer, you can use this Python code (just make sure to change the IP and port number to whatever you specified in step 1, and the argument to sock.recv to whatever you specified as the DEF_UDP_PAYLOAD_SIZE in step 1)
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('169.254.177.93', 1024))  # (host, port) because AF_INET
print("Listening...")
while True:
    print(sock.recv(18))  # buffer size
System overview¶
Though the UDP transmitter described on this webpage supports dynamic modification of the source and destination addresses for transactions, it is optimized for static source address, destination address, and payload size. The user configures these parameters, and the only portion of the packet which is modified by the user application code during runtime is the UDP data (which can be done as quickly as the system can modify the values in an array). The user application code modifies this data, and starts DMA channel 2. This initiates a sequence of DMA events that will:
- Stop the PWM channel, so that no normal link pulses are generated during the transaction.
- Zero the PWM counter, so that it counts back up from zero when re-enabled after the transaction (like a watchdog).
- Initialize the DMA sniffer data register.
- Reset the read pointer for the DMA channel which will send the UDP Ethernet packet to PIO state machine 0.
- Send the preamble/SFD to PIO state machine 0, which serializes it out to the TX+ and TX- pins.
- Send the UDP Ethernet packet to PIO state machine 0, which serializes it out to the TX+/TX- pins. Note that the DMA sniffer is attached to this channel, so the Ethernet checksum is automatically computed during this transfer and stored in the sniffer data register.
- Move the checksum (32 bits) to a buffer character array.
- Send this checksum character array to PIO state machine 0, which serializes it out to the TX+/TX- pins.
- Send a delay time to PIO state machine 2, which waits for the specified amount of time (for the checksum transaction to complete) and then generates the TP_IDL signal. This PIO state machine also generates an interrupt to signal to the processors that the transaction is complete.
- Re-enable the PWM channel to resume generating the normal link pulse.
This sequence of events is illustrated in the diagram below. Each of the red arrows reads as “chains to.” So, you can follow the sequence of events by following the red arrows. Note that DMA channels 0 and 1, which are responsible for interacting with the PIO state machine which generates the normal link pulse, are independent from the other channels. They only see the DREQ generated by the PWM channel. So, this PWM channel is the sole mechanism for interaction between the NLP state machine and the UDP ethernet state machines.
The advantage of all this DMA footwork is that the ethernet transactions are completely non-blocking. For long payloads, this is a really nice feature! User application code can generate the next packet (gather sensor data, perform computations, etc.) while the previous packet is being transmitted. This maximizes the data rate out of the RP2040.

Initializing the UDP packet information¶
There’s a lot of information in a UDP packet, most of which does not need to change from one transmission to the next. The parameters that the user will need to modify are consolidated in udp_tx_parameters.h. The rest of the parameters (which some users may wish to modify for niche applications) are at the very top of udp_tx.h. During initialization, a helper function uses all these parameters to populate an array which includes the ethernet information, IP information, UDP information, and UDP payload in the correct order. The full packet includes:
- Destination MAC address (user specified) – 6 bytes
- Source MAC address (user specified or generated from flash ID) – 6 bytes
- Ethernet type (set to “IP”) – 2 bytes
- IP version (v4) and header length (5, for 20 bytes, which is 5 32-bit increments) – 1 byte
- IP type of service – 1 byte
- IP total length – 2 bytes
- IP identifier – 2 bytes
- IP flags and fragmentation settings – 2 bytes
- IP time to live – 1 byte
- IP protocol – 1 byte
- IP checksum – 2 bytes
- IP source address (user specified) – 4 bytes
- IP destination address (user specified) – 4 bytes
- UDP source port (user specified) – 2 bytes
- UDP destination port (user specified) – 2 bytes
- UDP length (8-byte UDP header plus the user-specified payload length) – 2 bytes
- UDP checksum (unused, set to 0) – 2 bytes
- UDP payload (user specified data and length)
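The field order listed above can be sketched as a host-side C helper. This is a minimal illustration under stated assumptions, not the project's actual helper function: the names build_udp_frame, put_be16, and ip_checksum are hypothetical, and the values used for the IP identifier (0), flags (0x4000, "don't fragment"), and time to live (64) are assumptions, since this section does not state them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    ETH_HDR_LEN = 14,  /* 6 + 6 + 2 bytes */
    IP_HDR_LEN  = 20,  /* header length field of 5 32-bit words */
    UDP_HDR_LEN = 8,
    HEADERS_LEN = ETH_HDR_LEN + IP_HDR_LEN + UDP_HDR_LEN
};

/* Store a 16-bit value in network (big-endian) byte order. */
void put_be16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

/* Internet checksum: one's complement of the one's-complement sum
   of the 16-bit words. The checksum field must be zero on entry. */
uint16_t ip_checksum(const uint8_t *hdr, size_t len) {
    assert(len % 2 == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical helper: fills buf with the Ethernet, IP, and UDP headers
   followed by the payload, and returns the number of bytes written.
   buf must hold at least HEADERS_LEN + payload_len bytes. */
size_t build_udp_frame(uint8_t *buf,
                       const uint8_t dst_mac[6], const uint8_t src_mac[6],
                       const uint8_t src_ip[4], const uint8_t dst_ip[4],
                       uint16_t src_port, uint16_t dst_port,
                       const uint8_t *payload, uint16_t payload_len) {
    /* Ethernet header */
    memcpy(buf, dst_mac, 6);
    memcpy(buf + 6, src_mac, 6);
    put_be16(buf + 12, 0x0800);             /* Ethernet type: IPv4 */

    /* IP header */
    uint8_t *ip = buf + ETH_HDR_LEN;
    ip[0] = 0x45;                           /* version 4, header length 5 */
    ip[1] = 0x00;                           /* type of service */
    put_be16(ip + 2, (uint16_t)(IP_HDR_LEN + UDP_HDR_LEN + payload_len));
    put_be16(ip + 4, 0x0000);               /* identifier (assumed) */
    put_be16(ip + 6, 0x4000);               /* flags: don't fragment (assumed) */
    ip[8] = 64;                             /* time to live (assumed) */
    ip[9] = 17;                             /* protocol: UDP */
    put_be16(ip + 10, 0x0000);              /* checksum placeholder */
    memcpy(ip + 12, src_ip, 4);
    memcpy(ip + 16, dst_ip, 4);
    put_be16(ip + 10, ip_checksum(ip, IP_HDR_LEN));

    /* UDP header */
    uint8_t *udp = ip + IP_HDR_LEN;
    put_be16(udp, src_port);
    put_be16(udp + 2, dst_port);
    put_be16(udp + 4, (uint16_t)(UDP_HDR_LEN + payload_len));
    put_be16(udp + 6, 0x0000);              /* UDP checksum unused */

    memcpy(udp + UDP_HDR_LEN, payload, payload_len);
    return (size_t)HEADERS_LEN + payload_len;
}
```

Because the IP checksum covers only the IP header and the UDP checksum is left at 0, changing the payload bytes (without changing the payload length) leaves every header field valid. With an 18-byte payload, the size suggested by the sock.recv(18) call in the Python receiver, the headers plus payload come to 60 bytes, which is the minimum Ethernet frame length before the 4-byte checksum is appended.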
The Ethernet checksum (the frame check sequence) is 4 bytes, and is computed over the entire packet above. A DMA sniffer computes this checksum at runtime, without CPU interaction. In the user application code, the UDP payload can be changed and the checksum will automatically be recomputed during transmission with no CPU overhead.
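For reference, the standard Ethernet frame check sequence is a CRC-32, and a software version can be handy for verifying captured packets on a host machine. The sketch below is the ordinary bitwise reflected CRC-32; it is not the sniffer configuration used in this project (which is not shown in this section), and the function names ethernet_crc32 and append_fcs are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 as used for the Ethernet frame check sequence:
   reflected polynomial 0xEDB88320, initial value 0xFFFFFFFF,
   final result inverted. */
uint32_t ethernet_crc32(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* XOR in the polynomial only when the low bit is set */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}

/* Hypothetical helper: appends the 4 checksum bytes, least-significant
   byte first, after the first len bytes of frame. The buffer must have
   room for len + 4 bytes. Returns the new length. */
size_t append_fcs(uint8_t *frame, size_t len) {
    uint32_t fcs = ethernet_crc32(frame, len);
    for (int i = 0; i < 4; i++)
        frame[len + i] = (uint8_t)(fcs >> (8 * i));
    return len + 4;
}
```

A useful property for debugging: running the same CRC over a frame that already includes a correct checksum always produces the same constant, so a receiver-side check does not need to separate the checksum bytes from the rest of the frame.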
Generating the Normal Link Pulse¶
For a 10BASE-T connection, in the absence of network traffic, a pulse must be sent every 16ms +/- 8ms to keep the link alive. To implement this, we want something like a watchdog timer: a timer that counts down from 16ms and triggers an NLP if it reaches 0, but that is reset every time we finish sending a packet.
We will use a PIO state machine to generate these pulses, and the state machine will stall on an out command (with autopull enabled) until a DMA channel moves data into its TX FIFO. The watchdog timer peripheral does not have a DREQ signal visible to the DMA channels, so we’ll use a PWM channel as a watchdog!
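The watchdog-like behaviour described above can be modelled on a host machine in a few lines of C. This is only a behavioural sketch for reasoning about the timing, not code that runs on the RP2040; the type nlp_timer_t and its functions are hypothetical names, and the model counts up to the interval rather than down from it, which is equivalent.

```c
#include <stdint.h>

/* Hypothetical host-side model of the NLP timer. */
typedef struct {
    uint32_t period_us;   /* NLP interval, e.g. 16000 for 16 ms */
    uint32_t elapsed_us;  /* time since the last pulse or packet */
} nlp_timer_t;

void nlp_timer_init(nlp_timer_t *t, uint32_t period_us) {
    t->period_us = period_us;
    t->elapsed_us = 0;
}

/* Called when a packet finishes: restart the interval from zero. */
void nlp_timer_reset(nlp_timer_t *t) {
    t->elapsed_us = 0;
}

/* Advance time by dt_us and return how many link pulses fall due. */
uint32_t nlp_timer_advance(nlp_timer_t *t, uint32_t dt_us) {
    t->elapsed_us += dt_us;
    uint32_t pulses = t->elapsed_us / t->period_us;
    t->elapsed_us %= t->period_us;
    return pulses;
}
```

In this model, a packet that completes just before the interval expires pushes the next pulse out by a full interval, which is the behaviour we want from the hardware timer.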
PWM-based watchdog¶
The PWM channel is configured with a clock divider of 64 and a wrapval of 60,000. With the system overclocked to 240MHz, this gives a period of 16ms. A DMA channel is configured with DREQ_PWM_WRAP as its DREQ, and it writes to the TX FIFO of the PIO state machine that generates the NLP. This state machine stalls on an out command (with autopull enabled) until the DMA channel puts some dummy data into the FIFO. A second DMA channel is configured to re-enable the first.
To stop the NLP for a transaction, another DMA channel need only disable the PWM channel. As soon as it is re-enabled and the DREQ starts again, the DMA channel will start triggering NLPs as before.
// Wrapval and clock div for 16ms PWM period
#define WRAPVAL 60000
#define CLKDIV 64

// Slice number chosen arbitrarily
int slice_num = 7 ;

// Experimentation shows we don't need to map this to a GPIO
// or configure a particular duty cycle. Configured for a wraptime
// of 16ms (NLP interval)
pwm_set_wrap(slice_num, WRAPVAL) ;
pwm_set_clkdiv(slice_num, CLKDIV) ;
pwm_set_enabled(slice_num, true) ;

///////////////////////////////////////////////////////////////////
////////////////////////// DMA NLP SETUP //////////////////////////
///////////////////////////////////////////////////////////////////
// Triggers the NLP machine, started by PWM watchdog channel
dma_channel_config c0 = dma_channel_get_default_config(chan_0); // default configs
channel_config_set_transfer_data_size(&c0, DMA_SIZE_32);        // 32-bit txfers
channel_config_set_read_increment(&c0, false);                  // no read incrementing
channel_config_set_write_increment(&c0, false);                 // no write incrementing
channel_config_set_dreq(&c0, DREQ_PWM_WRAP7) ;                  // DREQ_PWM_WRAP7 pacing
channel_config_set_chain_to(&c0, chan_1);                       // chain to chan 1
dma_channel_configure(
    chan_0,              // Channel to be configured
    &c0,                 // The configuration we just created
    &pio->txf[sm_nlp],   // write address (NLP PIO TX FIFO)
    &nlp_dummy,          // The initial read address (dummy value)
    1,                   // Number of transfers; in this case each is 4 byte.
    false                // Don't start immediately.
);

// Channel One (resets NLP pulse machine)
dma_channel_config c1 = dma_channel_get_default_config(chan_1); // default configs
channel_config_set_transfer_data_size(&c1, DMA_SIZE_32);        // 32-bit txfers
channel_config_set_read_increment(&c1, false);                  // no read incrementing
channel_config_set_write_increment(&c1, false);                 // no write incrementing
channel_config_set_chain_to(&c1, chan_0);                       // chain back to chan 0
dma_channel_configure(
    chan_1,              // Channel to be configured
    &c1,                 // The configuration we just created
    &dummy_dest,         // write address (dummy)
    &dummy_source,       // The initial read address (dummy)
    1,                   // Number of transfers; in this case each is 4 byte.
    false                // Don't start immediately.
);
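As a sanity check on the WRAPVAL and CLKDIV values above, the wrap period can be computed on a host machine. This is a small sketch with a hypothetical function name, pwm_period_us; it relies on the RP2040 PWM counter taking wrap + 1 counts per period when free-running in its default (non-phase-correct) mode.

```c
#include <stdint.h>

/* Period, in microseconds, of a free-running PWM counter.
   The counter runs at sys_clk_hz / clkdiv and takes (wrap + 1)
   counts to go from 0 through wrap and back to 0. */
double pwm_period_us(double sys_clk_hz, double clkdiv, uint32_t wrap) {
    double counter_hz = sys_clk_hz / clkdiv;
    return ((double)wrap + 1.0) * 1e6 / counter_hz;
}
```

With a 240 MHz system clock and a divider of 64, the counter runs at 3.75 MHz, so a wrap value of 60000 gives just over 16 ms, comfortably inside the 16ms +/- 8ms window.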
PIO state machine¶
The NLP PIO state machine simply stalls on an out command, then sets the TX+/TX- pins for 100ns, and puts the lines back to idle. irq 0 is used to prevent the packet serializing state machine from attempting to take control of the data lines in the middle of a normal link pulse.
out x, 32 ; 32 bits from OSR to x scratch (autopull enabled, stalls here)
irq 0 ; Assert interrupt 0
set pins, 2 [5] ; Pulse for 100 ns
set pins, 0 [5] ; End pulse (both lines idle)
irq clear 0 ; Clear interrupt 0
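The pulse width follows from the delay field in the program above: a PIO instruction with a delay of [5] occupies one cycle for the instruction plus five delay cycles. The helper below is a host-side sketch with hypothetical function names. The state machine clock is not stated in this section; working backwards, 100 ns over 6 cycles would correspond to a 60 MHz state machine clock (a divider of 4 from 240 MHz), which is an inference rather than a value taken from the source.

```c
#include <stdint.h>

/* Cycles occupied by one PIO instruction with the given delay field:
   one cycle to execute plus `delay` idle cycles. */
uint32_t pio_instr_cycles(uint32_t delay) {
    return 1u + delay;
}

/* Duration in nanoseconds of `cycles` state machine cycles at the
   given state machine clock frequency (system clock / PIO divider). */
double pio_duration_ns(uint32_t cycles, double sm_clk_hz) {
    return (double)cycles * 1e9 / sm_clk_hz;
}
```

For example, pio_duration_ns(pio_instr_cycles(5), 60e6) evaluates to 100 ns under the assumed 60 MHz clock.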
Transmitting the UDP packet¶
When the user initiates a UDP transfer, a sequence of DMA events occurs which moves the preamble, SFD, ethernet information, IP information, UDP information, UDP data, and ethernet checksum from memory to a PIO state machine, which Manchester-encodes each bit and puts it out onto the TX+ and TX- pins. All of this happens separately from the ARM processors (i.e., it’s non-blocking) so that the user’s application code can start computing the next packet while the previou