Userland NIC driver design on Linux

Guillaume Pagnoux

2019-03-12 Tue

Introduction

  • I wanted to learn more about networking
  • So I tried to create a switch

What's a NIC driver?

  • Tom talked about it last time.
  • We can divide it into two parts:
    • The NIC driver itself (hardware specific)
    • Some abstraction interface (for the user to use)

We generally find two kinds of implementations:

  • Userland interface backed by a kernel module
    • Netmap
    • XDP
  • Full userland implementation
    • DPDK
    • Snabb
    • Ixy

Implementation details

Sending commands to the device

The kernel exposes PCI devices in sysfs:

  • /sys/bus/pci/devices/<domain_nb>:<pci_bus>:<pci_slot>.<pci_function>
  • Divided into multiple files
file                  function
config                PCI config space
enable                Whether the device is enabled
resource              PCI resource host addresses
resource0..N          PCI resource N, if present
resource0_wc..N_wc    PCI WC map resource N, if prefetchable
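
As an illustration of the layout above, here is a minimal sketch that builds the sysfs path of one of these files. The helper name pci_sysfs_path is hypothetical; the address format it produces (for example 0000:00:1f.2, with the function number after the dot) is the one used for the directory names under /sys/bus/pci/devices.

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: format the sysfs path of one file of a PCI device.
// Returns what snprintf returns: the length the full path needs
// (excluding the terminating NUL), or a negative value on error.
int pci_sysfs_path(char* out, size_t len, uint16_t domain, uint8_t bus,
                   uint8_t slot, uint8_t function, const char* file) {
    // Domain is 4 hex digits, bus and slot 2 hex digits, function 1 digit.
    return snprintf(out, len, "/sys/bus/pci/devices/%04x:%02x:%02x.%x/%s",
                    domain, bus, slot, function, file);
}
```

Calling it with domain 0, bus 0, slot 0x1f, function 2 and "config" yields /sys/bus/pci/devices/0000:00:1f.2/config.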

Device detection

  • /sys/bus/pci/devices/<device_addr>/config
  • Class ID 2 (network controller)
  • Check vendor_id and device_id
int config = pci_open_resource(pci_addr, "config");
uint16_t vendor_id = read_io16(config, 0);
uint16_t device_id = read_io16(config, 2);
uint32_t class_id = read_io32(config, 8) >> 24;
close(config);

if (class_id != 2) {
    error("Device %s is not a NIC", pci_addr);
}

if (vendor_id == 0x1af4 && device_id >= 0x1000) {
    return virtio_init(pci_addr, rx_queues, tx_queues);
} else {
    // Our best guess is to try ixgbe
    return ixgbe_init(pci_addr, rx_queues, tx_queues);
}
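
To make the offsets used above explicit, here is a small sketch that decodes the same three fields from a raw copy of the config space. The names pci_ids, parse_pci_config and is_virtio_nic are hypothetical and not part of ixy. The sketch relies only on the standard PCI header layout: little-endian vendor ID at offset 0, device ID at offset 2, and the base class code in the top byte of the 32-bit word at offset 8.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical container for the fields the detection code looks at.
struct pci_ids {
    uint16_t vendor_id;
    uint16_t device_id;
    uint8_t class_id;
};

// Read a little-endian 16-bit value from a byte buffer.
static uint16_t le16(const uint8_t* p) {
    return (uint16_t) (p[0] | (p[1] << 8));
}

// cfg must point to at least the first 12 bytes of the PCI config space.
struct pci_ids parse_pci_config(const uint8_t* cfg) {
    struct pci_ids ids;
    ids.vendor_id = le16(cfg + 0);
    ids.device_id = le16(cfg + 2);
    // Byte 11 is the base class code, i.e. (32-bit word at offset 8) >> 24.
    ids.class_id = cfg[11];
    return ids;
}

// Same test as in the detection code: class 2 and the virtIO vendor ID.
bool is_virtio_nic(const struct pci_ids* ids) {
    return ids->class_id == 2 && ids->vendor_id == 0x1af4 &&
           ids->device_id >= 0x1000;
}
```

Working on a byte buffer instead of a file descriptor keeps the decoding logic easy to test without real hardware.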

ixgbe & virtIO

  • ixgbe uses memIO
  • virtIO has three modes: transitional, modern, and legacy
  • ixy only supports legacy virtIO

memIO

  • Simple. Just mmap(2):
uint8_t* pci_map_resource(const char* pci_addr) {
    char path[PATH_MAX];
    snprintf(path, PATH_MAX, "/sys/bus/pci/devices/%s/resource0",
             pci_addr);
    remove_driver(pci_addr);
    enable_dma(pci_addr);
    int fd = check_err(open(path, O_RDWR), "open pci resource");
    struct stat stat;
    check_err(fstat(fd, &stat), "stat pci resource");
    return (uint8_t*) check_err(
        mmap(NULL, stat.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0),
        "mmap pci resource");
}

portIO

  • ixy only supports legacy virtIO NICs.
  • This means we need portIO.

Actually, it's almost the same thing!

int pci_open_resource(const char* pci_addr, const char* resource) {
    char path[PATH_MAX];
    snprintf(path, PATH_MAX, "/sys/bus/pci/devices/%s/%s", pci_addr,
             resource);
    debug("Opening PCI resource at %s", path);
    int fd = check_err(open(path, O_RDWR), "open pci resource");
    return fd;
}
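
Once the resource file is open, registers behind an I/O port BAR can be accessed with positioned reads and writes on the file descriptor rather than through a mapping. The following is a minimal sketch of that idea; the helper names io_read16 and io_write16 are hypothetical, and it assumes the resource file accepts pread(2) and pwrite(2) at the register offset, which the kernel documents for I/O port regions.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: read a 16-bit register at `offset` in the resource.
// Returns 0 on success, -1 on an error or a short read.
int io_read16(int fd, off_t offset, uint16_t* value) {
    ssize_t n = pread(fd, value, sizeof(*value), offset);
    return n == (ssize_t) sizeof(*value) ? 0 : -1;
}

// Hypothetical helper: write a 16-bit register at `offset` in the resource.
// Returns 0 on success, -1 on an error or a short write.
int io_write16(int fd, off_t offset, uint16_t value) {
    ssize_t n = pwrite(fd, &value, sizeof(value), offset);
    return n == (ssize_t) sizeof(value) ? 0 : -1;
}
```

With fd obtained from pci_open_resource(pci_addr, "resource0"), io_read16(fd, 0x12, &v) would read the 16-bit register at offset 0x12; the offset here is only an example value.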

DMA in Userland

  • We want to use DMA
  • But we are in userland…with virtual addresses

How can we get physical addresses?

  • Open /proc/self/pagemap
  • Profit!

https://www.kernel.org/doc/Documentation/vm/pagemap.txt
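
A minimal sketch of the lookup, based on the pagemap format described at the link above: one 64-bit entry per virtual page, with the page frame number in bits 0 to 54 and a "present" flag in bit 63. The function names are hypothetical. Note that since Linux 4.2 the kernel reports a zero frame number to processes without CAP_SYS_ADMIN, so this only gives useful results when run as root.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGEMAP_PRESENT  (1ULL << 63)        // bit 63: page present in RAM
#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)  // bits 0-54: page frame number

// Combine a pagemap entry with the offset of `virt` inside its page.
// Returns 0 if the page is not present in RAM.
uint64_t pagemap_entry_to_phys(uint64_t entry, uintptr_t virt,
                               uint64_t page_size) {
    if (!(entry & PAGEMAP_PRESENT)) {
        return 0;
    }
    uint64_t pfn = entry & PAGEMAP_PFN_MASK;
    return pfn * page_size + (virt % page_size);
}

// Look up the physical address behind a virtual address of this process.
// Returns 0 on any failure (a sketch, so errors are not distinguished).
uint64_t virt_to_phys(const void* virt) {
    long page_size = sysconf(_SC_PAGESIZE);
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) {
        return 0;
    }
    uint64_t entry = 0;
    // The entry for page number N is stored at byte offset N * 8.
    off_t offset = (off_t) ((uintptr_t) virt / (uintptr_t) page_size
                            * sizeof(entry));
    ssize_t n = pread(fd, &entry, sizeof(entry), offset);
    close(fd);
    if (n != (ssize_t) sizeof(entry)) {
        return 0;
    }
    return pagemap_entry_to_phys(entry, (uintptr_t) virt,
                                 (uint64_t) page_size);
}
```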

Not that simple...

  • The kernel can relocate pages
  • It can also swap them out to disk

We can prevent swapping with mlock(2):

mlock(virt_addr, size);

How about page relocation?

Let's use huge pages!

  • Multiple advantages of using them:
    • The kernel won't relocate them
    • Each huge page is physically contiguous in memory

https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
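
As a rough sketch of how such memory could be obtained, the code below maps an anonymous huge-page region with mmap(2) and MAP_HUGETLB and then locks it. This is an assumption made for illustration: it is not necessarily how ixy allocates its DMA memory (a file on a hugetlbfs mount is another common route), the 2 MiB page size is the usual x86-64 default rather than something stated here, and the function names are hypothetical. The call fails unless huge pages have been reserved on the system.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

// Assumption: 2 MiB huge pages (the common x86-64 default).
#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)

// Round a size up to a whole number of huge pages.
size_t round_up_to_huge_page(size_t size) {
    return (size + HUGE_PAGE_SIZE - 1) / HUGE_PAGE_SIZE * HUGE_PAGE_SIZE;
}

// Allocate `size` bytes backed by huge pages and lock them in RAM.
// Returns NULL if no huge pages are available or locking fails.
void* alloc_dma_memory(size_t size) {
    size_t len = round_up_to_huge_page(size);
    void* addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (addr == MAP_FAILED) {
        return NULL;
    }
    if (mlock(addr, len) != 0) {
        munmap(addr, len);
        return NULL;
    }
    return addr;
}
```

Physical contiguity is only guaranteed inside a single huge page, so a buffer handed to the NIC should not straddle two of them.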

virtIO driver

virtIO is a bit special...

Let's take a look at ixy's packet buffers:

struct pkt_buf {
    // physical address to pass a buffer to a nic
    uintptr_t buf_addr_phy;
    struct mempool* mempool;
    uint32_t mempool_idx;
    uint32_t size;
    uint8_t head_room[SIZE_PKT_BUF_HEADROOM];
    uint8_t data[] __attribute__((aligned(64)));
};
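
One property of this layout worth spelling out: on a 64-bit system, and assuming SIZE_PKT_BUF_HEADROOM is 40 (an assumption, the value is not given here), the fields before data add up to exactly 64 bytes, so head_room ends where data begins. A header written at the tail of head_room is therefore contiguous with the packet, which is presumably why the struct matters for virtIO, where a per-packet header precedes the frame. The sketch below shows this; pkt_buf_prepend is a hypothetical helper, not an ixy function.

```c
#include <stddef.h>
#include <stdint.h>

// Assumption: 40 bytes of headroom, so the metadata fills one 64-byte line.
#define SIZE_PKT_BUF_HEADROOM 40

struct mempool; // opaque here, only used through a pointer

struct pkt_buf {
    // physical address to pass a buffer to a nic
    uintptr_t buf_addr_phy;
    struct mempool* mempool;
    uint32_t mempool_idx;
    uint32_t size;
    uint8_t head_room[SIZE_PKT_BUF_HEADROOM];
    uint8_t data[] __attribute__((aligned(64)));
};

// Hypothetical helper: return a pointer to the last `hdr_len` bytes of
// head_room, i.e. the spot for a header placed directly in front of data.
// Returns NULL if the header does not fit in the headroom.
uint8_t* pkt_buf_prepend(struct pkt_buf* buf, size_t hdr_len) {
    if (hdr_len > SIZE_PKT_BUF_HEADROOM) {
        return NULL;
    }
    return buf->head_room + SIZE_PKT_BUF_HEADROOM - hdr_len;
}
```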

Conclusion

Potential improvements

  • ixy currently only supports polling
  • Interrupts could be supported through uio_pci_generic

Questions?