A Python-based EBPF code generator

This library facilitates the generation of EBPF code. Instead of compiling code, we generate it on the fly. This is done entirely in Python, without the need for an external compiler. It also allows us to mix user-space and EBPF-space code within the same program.

The code generator is designed so that the code looks mostly like Python. It is important to remember, however, that the Python code does not actually do anything when executed; it only generates code which will later be executed by the kernel.
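To make this principle concrete, here is a toy sketch in plain Python. It is not this library's implementation; the classes Program and Var and the textual pseudo-instructions are hypothetical. Executing `count += 1` does not add anything: the overloaded operator only records an instruction for later execution.

```python
class Program:
    """Collects pseudo-instructions; a stand-in for a real code generator."""

    def __init__(self):
        self.instructions = []

    def emit(self, text):
        self.instructions.append(text)


class Var:
    """A hypothetical variable living in kernel memory, identified by name."""

    def __init__(self, program, name):
        self.program = program
        self.name = name

    def __iadd__(self, value):
        # Nothing is added here: we only record what the kernel should do later.
        self.program.emit(f"add [{self.name}], {value}")
        return self


prog = Program()
count = Var(prog, "count")
count += 1
count += 2
print(prog.instructions)  # ['add [count], 1', 'add [count], 2']
```

The real generator produces EBPF instructions rather than strings, but the division of labor is the same: Python runs once, at generation time, and the kernel runs the result.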

Getting started

As a simple example of EBPF, we write an XDP program which simply counts incoming packets.

We start with declaring the variables that we want to see both in the XDP program and in user space:

from ebpfcat.arraymap import ArrayMap
from ebpfcat.xdp import XDP, XDPExitCode

class Count(XDP):
    license = "GPL"  # the Linux kernel wants to know that...

    userspace = ArrayMap()
    count = userspace.globalVar()  # declare a variable in the map

Next comes the program that we want to run in the kernel. Note that although this program looks like plain Python code, it is not: instead, it generates EBPF code that we can later load into the kernel:

def program(self):
    self.count += 1
    self.exit(XDPExitCode.PASS)  # pass packet on to network stack

Now we can attach this program to a network interface. We use asyncio for synchronization:

async def main():
    c = Count()
    await c.attach("eth0")

Once attached, our little program will be executed each time a packet arrives on the interface. We can read the result in a loop:

for i in range(10):
    await sleep(0.1)
    print("packets arrived so far:", c.count)

With xdp.XDP.attach() the program stays attached to the interface indefinitely, even beyond the end of the program. Use xdp.XDP.detach() to detach it, or use the async context manager xdp.XDP.run() to detach automatically, as in:

async with c.run("eth0"):
    await sleep(1)
    print("packets arrived so far:", c.count)

Note that here we access the member variable count from user space. While generating EBPF, the code generator knows it needs to emit instructions that access the variable from within EBPF; once the variable is accessed outside of the generation context, it is read from the user-space side.

Both xdp.XDP.attach() and xdp.XDP.detach() take an additional parameter flags to choose the mode in which the program is attached. Use xdp.XDPFlags.SKB_MODE (the default) to use the generic kernel driver, or xdp.XDPFlags.DRV_MODE to let the interface's device driver run the program.

For reference, this is the full example:

from asyncio import run, sleep
from ebpfcat.arraymap import ArrayMap
from ebpfcat.xdp import XDP, XDPExitCode, XDPFlags

class Count(XDP):
    license = "GPL"

    userspace = ArrayMap()
    count = userspace.globalVar()

    def program(self):
        self.count += 1
        self.exit(XDPExitCode.PASS)


async def main():
    c = Count()

    async with c.run("eth0", XDPFlags.DRV_MODE):
        for i in range(10):
            await sleep(0.1)
            print("packets arrived so far:", c.count)


if __name__ == "__main__":
    run(main())  # asyncio.run replaces the deprecated get_event_loop() pattern

Maps

Maps are used to communicate with the outside world. They look like instance variables. They may be used from within the EBPF program and, once it is loaded, also from Python code. It is possible to write out the maps to a bpf file system using :meth:`

There are two flavors: arraymap.ArrayMap and hashmap.HashMap. They have different use cases:

Array Maps

Array maps are shared memory between EBPF programs and user space. All programs, as well as user space, access the memory at the same time, so concurrent access may lead to problems. An exception is the in-place addition operator +=, which works under a lock, but only if the variable is 4 or 8 bytes in size.

Otherwise, variables may be declared in any size. The declaration looks like this:

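from ebpfcat.arraymap import ArrayMap
from ebpfcat.ebpf import EBPF

class MyProgram(EBPF):
    array_map = ArrayMap()
    a_byte_variable = array_map.globalVar("B")  # 1 byte, unsigned
    an_integer_variable = array_map.globalVar("i")  # 4 bytes, signed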

Those variables can be accessed both from within the EBPF program and from outside. Both sides access the same memory, so be aware of race conditions.
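As a rough illustration of the size rule for +=, the helper below reports whether a variable declared with a given format letter is 4 or 8 bytes long. It is a sketch that assumes the format letters follow Python's struct module with standard sizes; the helper name supports_locked_add is hypothetical and not part of this library.

```python
import struct


def supports_locked_add(fmt):
    """Return True if a variable of struct format letter `fmt` is 4 or 8 bytes long.

    Hypothetical helper: the "=" prefix selects standard sizes without
    alignment, so the result does not depend on the host platform.
    """
    return struct.calcsize("=" + fmt) in (4, 8)


for fmt in "BHIQiq":
    # Prints the letter, its size in bytes, and whether += would be locked.
    print(fmt, struct.calcsize("=" + fmt), supports_locked_add(fmt))
```

With standard sizes, "B" and "H" (1 and 2 bytes) fall outside the rule, while "I", "i", "Q" and "q" (4 and 8 bytes) fall inside it.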

Hash Maps

All hash map variables have a fixed size of 8 bytes. Accessing them is rather slow, but it is done with proper locking, so concurrent access is possible. When accessed from user space, they are read anew from the kernel each time. They are declared as follows:

from ebpfcat.ebpf import EBPF
from ebpfcat.hashmap import HashMap

class MyProgram(EBPF):
    hash_map = HashMap()
    a_variable = hash_map.globalVar()  # always 8 bytes

They are used as normal variables, like in self.a_variable = 5, both in EBPF and from user space once loaded.

Accessing the packet

The entire point of XDP is to react to arriving network packets. The EBPF program is checked statically to ensure that it only accesses the contents of the packet and nothing beyond. This means that a with statement (acting as an if) must be added which checks that the packet is large enough for every access to stay within the packet. To facilitate this, a special variable packetSize is defined which, when compared against, generates code that the static checker understands, like so:

with self.packetSize > 100 as p:  # assure packet has at least 100 bytes
    self.some_variable = p.pH[22]  # read word at position 22

In this code, the variable p returned by the with statement also gives access to the contents of the packet. There are six access modes for reading values of different sizes from the packet, named following the Python struct module, as indicated by the letters “BHIQiq”.
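The rule the static checker enforces can be mimicked in plain user-space Python on a bytes object: a read of a given format at a byte offset is only allowed if the whole value lies within the packet. This is a sketch only; read_packet is a hypothetical name, and little-endian order ("<") stands in for the host byte order.

```python
import struct


def read_packet(packet, fmt, offset):
    """Read one value of struct format `fmt` (one of "BHIQiq") at byte `offset`.

    Mirrors the checker's rule: the read must end within the packet.
    """
    size = struct.calcsize("<" + fmt)
    if offset + size > len(packet):
        raise IndexError(f"{size}-byte read at {offset} exceeds packet of {len(packet)} bytes")
    return struct.unpack_from("<" + fmt, packet, offset)[0]


packet = bytes(range(100))
print(read_packet(packet, "H", 22))  # bytes 22 and 23: 22 + 23 * 256 = 5910
```

An 8-byte read with "Q" at offset 96 would raise an IndexError here, which corresponds to the case the kernel's checker rejects before the program is ever loaded.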

Knowing this, we can modify the above example code to only count IP packets:

def program(self):
    with self.packetSize > 16 as p:
        # position 12 is the EtherType
        # the IPv4 EtherType 0x0800 reads as 8 on a little-endian host
        with p.pH[12] == 8:
            self.count += 1
    self.exit(XDPExitCode.PASS)

As a simplification, if the class attribute minimumPacketSize is set, the program is called within a with statement like the one above, and all the packet variables appear as attributes of the object. The class attribute defaultExitCode then gives the exit code in case the packet is too small (XDPExitCode.PASS by default). So the above example becomes:

class Program(XDP):
    minimumPacketSize = 16
    userspace = ArrayMap()
    count = userspace.globalVar()

    def program(self):
        with self.pH[12] == 8:
            self.count += 1

With the xdp.PacketVar descriptor it is possible to declare certain positions in the packet as variables. As parameters it takes the position within the packet and the data format, following the conventions of the Python struct module, including the endianness markers <>!. So the above example simplifies to:

class Program(XDP):
    minimumPacketSize = 16
    userspace = ArrayMap()
    count = userspace.globalVar()
    etherType = PacketVar(12, "!H")  # use network byte order

    def program(self):
        with self.etherType == 0x800:
            self.count += 1
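Why the earlier example compares p.pH[12] with 8 while this one compares with 0x800 can be checked in plain Python: the IPv4 EtherType 0x0800 is stored big-endian on the wire, so reading it in little-endian order (assumed here to be the host order, as on x86) yields 8, while the "!H" format yields 0x800. The frame below is made up for illustration.

```python
import struct

# A made-up Ethernet header: destination MAC, source MAC, EtherType 0x0800 (IPv4).
frame = bytes(6) + bytes.fromhex("020000000001") + b"\x08\x00"

native_le = struct.unpack_from("<H", frame, 12)[0]  # how a little-endian host reads it
network = struct.unpack_from("!H", frame, 12)[0]    # network byte order, the "!H" format

print(native_le, hex(network))  # 8 0x800
```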

Programming

The actual XDP program is a class that inherits from xdp.XDP. The class body contains all variable declarations, and a method program which is the program proper. This method is executed by Python, and while it executes, an EBPF program is generated which can then be loaded into the Linux kernel.

Expressions

Once a variable is declared, it can be used with syntax very close to normal Python. Standard arithmetic works, like self.distance = self.speed * self.time, given that all of them are declared variables. Note that you cannot use ordinary Python variables, as accessing them does not generate any EBPF code. Use local variables for that.

Local variables

Local variables are seen by only one EBPF program; they cannot be seen by other programs or by user space. They are declared in the class body like this:

class Program(XDP):
    local_variable = LocalVar("I")

Conditional statements

During code generation, all code needs to be executed. This means that we cannot use a Python if statement, because then one branch would not be executed and no code would be generated for it. So we replace if statements with Python with statements, like so:

with self.some_variable > 6 as Else:
    self.other_variable = 1  # code for the "if" branch
with Else:
    self.other_variable = 2  # code for the "else" branch

Of course, the Else block may be omitted if not needed.
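Why a with statement works where an if does not can be shown with a toy generator: Python always runs the body of a with block, so its instructions are always emitted, and the context manager wraps them in a conditional jump. This is a sketch of the idea only; Generator, when, and the textual instructions are hypothetical, do not reflect this library's internals, and the Else part is left out.

```python
from contextlib import contextmanager


class Generator:
    """Toy code generator: a with block emits a guarded branch, no Python `if`."""

    def __init__(self):
        self.code = []
        self.labels = 0

    def emit(self, line):
        self.code.append(line)

    @contextmanager
    def when(self, condition):
        # Jump past the block if the condition is false at run time. The block
        # body itself is always executed by Python, so it is always generated.
        self.labels += 1
        label = f"L{self.labels}"
        self.emit(f"if not ({condition}) goto {label}")
        yield
        self.emit(f"{label}:")


gen = Generator()
with gen.when("some_variable > 6"):
    gen.emit("other_variable = 1")
print("\n".join(gen.code))
```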

No loops

There is no way to declare a loop, simply because EBPF does not allow it (only newer kernels accept bounded loops). You may write a for loop in Python as long as everything can be calculated at generation time, but this just means that the code will show up in the EBPF program as many times as the loop iterates at generation time.
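The unrolling effect can be sketched with a toy emitter: the Python loop runs at generation time, and the generated code contains no loop at all, just one copy of the loop body per iteration. The function name, the register r0, and the array notation are hypothetical pseudo-instructions, not real EBPF output.

```python
def unrolled_sum(n):
    """Emit pseudo-instructions that add up elements 0..n-1 of a hypothetical array."""
    code = ["r0 = 0"]
    for i in range(n):
        # One emitted instruction per generation-time iteration.
        code.append(f"r0 += array[{i}]")
    return code


print(unrolled_sum(3))  # ['r0 = 0', 'r0 += array[0]', 'r0 += array[1]', 'r0 += array[2]']
```

The cost of this approach is program size: the generated code grows linearly with the iteration count.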

Fixed-point arithmetic

As a bonus beyond standard EBPF, we support fixed-point values as a type x. Within EBPF they are stored as integers scaled by 100000, so 0.2 is represented as 20000. From the outside, the variables appear to be doubles. Loosely following Python, all true divisions / produce a fixed-point result, while all floor divisions // produce a standard integer. Some examples:

class FixedPoint(EBPF):
    array_map = ArrayMap()
    fixed_var = array_map.globalVar("x")  # declare a fixed-point variable
    normal_var = array_map.globalVar("i")

    def program(self):
        self.fixed_var = 3.5  # automatically converted to fixed
        self.normal_var = self.fixed_var  # automatically truncated
        self.fixed_var = self.normal_var / 5  # keep decimals
        self.fixed_var = self.normal_var // 5  # floor division
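The conversions can be sketched in plain Python, assuming a scaling factor of 100000 (consistent with 0.2 being stored as 20000). The helper names and FIXED_BASE constant are hypothetical; the library performs these conversions itself. The division helpers, written for non-negative integers, show how integer arithmetic on scaled values yields the two kinds of results.

```python
FIXED_BASE = 100000  # assumed scaling factor: 0.2 <-> 20000


def to_fixed(value):
    """Convert a Python number to its scaled integer representation."""
    return round(value * FIXED_BASE)


def from_fixed(raw):
    """Convert a scaled integer back to a float, as seen from user space."""
    return raw / FIXED_BASE


def true_div(a, b):
    """a / b for non-negative integers, returning a fixed-point raw value."""
    return a * FIXED_BASE // b


def floor_div(a, b):
    """a // b for non-negative integers, returning an ordinary integer."""
    return a // b


print(to_fixed(0.2), to_fixed(3.5))  # 20000 350000
print(from_fixed(true_div(7, 5)))    # 1.4
print(floor_div(7, 5))               # 1
```

Scaling the dividend before dividing is what keeps the decimals in the true-division case; dividing first would lose them.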

Reference Documentation

The ebpf module contains the core ebpf code generation

class ebpfcat.ebpf.EBPF(prog_type=0, license=None, kern_version=0, name=None, load_maps=None, subprograms=())

The base class for all EBPF programs

Usually this class is subclassed, and the actual program is defined in the overridden program method. Then the program may be loaded into the kernel. Alternatively, this class may be instantiated directly, in which case you issue the program's instructions yourself before it is loaded.

After a program is loaded, its maps may be written to a bpf file system using pin_maps(). Those maps may be used at a later time, even from a different task, if the parameter load_maps is given; in that case we assume the program has already been loaded.

Parameters:

load_maps – a prefix to load pinned maps from. It must exist in a bpf file system and usually ends in a “/”.

assemble()

return the assembled program

call(no)

call the kernel function no from enum FuncId

exit(no=None)

Exit the program with return value no

get_fd(fd)

return the file descriptor fd of a map

jump()

unconditionally jump to a later defined target

jumpIf(comp)

jump if comp is true to a later defined target

load(log_level=0, log_size=40960)

load the program into the kernel

pin_maps(path)

pin the maps of this program to files with prefix path

This path must be in a bpf file system, and all parent directories must already exist, while the individual files must not exist.

program()

override this method with your program when subclassing

class ebpfcat.ebpf.LocalVar(fmt='I')

variables on the stack

class ebpfcat.ebpf.ktime(ebpf)

a function that returns the current ktime in ns

calculate(dst, long, force=False)

issue the code that calculates the value of this expression

this method returns three values:

  • the number of the register with the result

  • a boolean indicating whether this is a 64 bit value

this method is a contextmanager to be used in a with statement. At the end of the with block the result is freed again, i.e. the register will not be reserved for the result anymore.

the default implementation calls get_address for values which actually are in memory and moves that into a register.

Parameters:
  • dst – the number of the register to put the result in, or None if that does not matter.

  • long – True if the result is supposed to be 64 bit. None if it does not matter.

  • force – if true, dst must be respected, otherwise this is optional.
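The context-manager pattern described for calculate(), where leaving the with block frees the result register, can be sketched with a toy allocator. This is not the library's implementation; Registers and reserve are hypothetical, and unlike calculate() the sketch only handles the choice of dst, not code emission or the 64-bit flag. It assumes the EBPF convention that r0 to r9 are general-purpose, while r10 is the read-only frame pointer.

```python
from contextlib import contextmanager


class Registers:
    """Toy allocator for EBPF's general-purpose registers r0 to r9."""

    def __init__(self):
        self.free = set(range(10))

    @contextmanager
    def reserve(self, dst=None):
        # Use the requested register if it is free, otherwise the lowest free one.
        reg = dst if dst in self.free else min(self.free)
        self.free.remove(reg)
        try:
            yield reg
        finally:
            # Leaving the with block releases the register again.
            self.free.add(reg)


regs = Registers()
with regs.reserve(3) as r:
    print("result in register", r)  # result in register 3
    print(3 in regs.free)           # False
print(3 in regs.free)               # True
```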

class ebpfcat.ebpf.prandom(ebpf)

a function that returns a pseudo-random number

calculate(dst, long, force=False)

issue the code that calculates the value of this expression

this method returns three values:

  • the number of the register with the result

  • a boolean indicating whether this is a 64 bit value

this method is a contextmanager to be used in a with statement. At the end of the with block the result is freed again, i.e. the register will not be reserved for the result anymore.

the default implementation calls get_address for values which actually are in memory and moves that into a register.

Parameters:
  • dst – the number of the register to put the result in, or None if that does not matter.

  • long – True if the result is supposed to be 64 bit. None if it does not matter.

  • force – if true, dst must be respected, otherwise this is optional.

The xdp module provides support for XDP programs.

class ebpfcat.xdp.PacketVar(address, fmt)

descriptor to access packet data from an XDP program

Declare packet variables as such:

class Program(XDP):
    etherType = PacketVar(12, "!H")

Parameters:
  • address – the start address within the packet

  • fmt – the data type of the variable, following the conventions of the Python struct module.

class ebpfcat.xdp.XDP(**kwargs)

the base class for XDP programs

XDP programs inherit from this class and define a program() which contains the actual EBPF program. In the class body, variables are declared using ebpf.LocalVar, PacketVar and arraymap.ArrayMap.

minimumPacketSize

set this to an integer value to declare the minimum size of a packet. You will only be able to access that many bytes in the packet. If you need something dynamic, use packetSize instead.

defaultExitCode

The default exit code should the packet be smaller than minimumPacketSize. Defaults to XDPExitCode.PASS.

packetSize

compare this value to a number in your program to allow at least that many bytes to be read. For example, to ensure that at least 20 bytes may be read, one would write:

with self.packetSize > 20:
    pass

async attach(network, flags=XDPFlags.SKB_MODE)

attach this program to a network interface

Parameters:
  • network – the name of the network interface, like "eth0"

  • flags – one of the XDPFlags

async detach(network, flags=XDPFlags.SKB_MODE)

detach this program from a network interface

Parameters:
  • network – the name of the network interface, like "eth0"

  • flags – one of the XDPFlags

program()

override this method with your program when subclassing

run(network, flags=XDPFlags.SKB_MODE)

attach this program to a network interface for the duration of the context

attach this program to the network while the context manager is running, and detach it afterwards.

Parameters:
  • network – the name of the network interface, like "eth0"

  • flags – one of the XDPFlags

class ebpfcat.xdp.XDPExitCode

class ebpfcat.xdp.XDPFlags
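For orientation, the Linux kernel's UAPI headers define the numeric values behind XDP return codes (enum xdp_action in linux/bpf.h) and attach flags (XDP_FLAGS_* in linux/if_link.h). The sketch below mirrors them with hypothetical class names; it assumes, but does not verify, that XDPExitCode and XDPFlags use the same values, and it is not a substitute for them.

```python
from enum import IntEnum, IntFlag


class XdpAction(IntEnum):
    """Return codes of an XDP program, mirroring enum xdp_action."""
    ABORTED = 0
    DROP = 1
    PASS = 2
    TX = 3
    REDIRECT = 4


class XdpAttachFlags(IntFlag):
    """Attach flags, mirroring XDP_FLAGS_* from linux/if_link.h."""
    UPDATE_IF_NOEXIST = 1 << 0
    SKB_MODE = 1 << 1
    DRV_MODE = 1 << 2
    HW_MODE = 1 << 3


print(int(XdpAction.PASS), int(XdpAttachFlags.DRV_MODE))  # 2 4
```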

The arraymap module defines array maps, usually used for global variables in EBPF programs

class ebpfcat.arraymap.ArrayMap

A descriptor for an array map

init(ebpf, fd)

create the map and initialize its values