A Python-based EBPF code generator
This library facilitates the generation of EBPF code. Instead of compiling code, we generate it on the fly. This is done entirely in Python, without the need for an external compiler. It also allows us to entangle user-space and EBPF-space code within the same program.
The code generator is designed so that the code looks mostly like Python. It is important to remember, however, that the Python code does not do the actual work when it is executed; it only generates code which will later be executed by the kernel.
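To make this idea concrete, here is a toy sketch in plain Python. It does not use this library and makes no claim about how the library is implemented; the class `Variable` and the instruction strings are invented for illustration only. It shows how an ordinary-looking statement such as `count += 1` can record an instruction instead of computing anything.

```python
class Variable:
    """Hypothetical stand-in for a declared variable: it records, it does not compute."""

    def __init__(self, name, program):
        self.name = name
        self.program = program  # shared list of generated "instructions"

    def __iadd__(self, value):
        # Nothing is added here; we only note that an addition must happen later.
        self.program.append(f"add {self.name}, {value}")
        return self


program = []
count = Variable("count", program)
count += 1  # executed by Python now, but it only emits an instruction
count += 5

print(program)  # ['add count, 1', 'add count, 5']
```

The list `program` plays the role of the generated code: it is the product of running the Python statements, not their side effect on any counter.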
Getting started
As a simple example for EBPF we write an XDP program which simply counts incoming packets.
We start with declaring the variables that we want to see both in the XDP program and in user space:
from ebpfcat.arraymap import ArrayMap
from ebpfcat.xdp import XDP, XDPExitCode

class Count(XDP):
    license = "GPL"  # the Linux kernel wants to know that...

    userspace = ArrayMap()
    count = userspace.globalVar()  # declare a variable in the map
Next comes the program that we want to run in the kernel. Note that this program looks like ordinary Python code, but it is not. Instead, it generates EBPF code that we can later load into the kernel:
    def program(self):
        self.count += 1
        self.exit(XDPExitCode.PASS)  # pass packet on to network stack
Now we can attach this program to a network interface. We use asyncio for synchronization:
async def main():
    c = Count()
    await c.attach("eth0")
Once attached, our little program will be executed each time a packet arrives on the interface. We can read the result in a loop:
    for i in range(10):
        await sleep(0.1)
        print("packets arrived so far:", c.count)
With xdp.XDP.attach() the program is attached indefinitely on the interface, even beyond the end of the program. Use xdp.XDP.detach() to detach it, or you may use the async contextmanager xdp.XDP.run() to detach automatically, as in:
    async with c.run("eth0"):
        await sleep(1)
        print("packets arrived so far:", c.count)
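The attach/detach pairing behind such a context manager can be sketched with the standard library alone. The class `FakeProgram` below is a hypothetical stand-in that only records what would happen; whether the library implements `run()` this way is an assumption. The point is that the `finally` clause guarantees the detach step even if the body raises.

```python
import asyncio
from contextlib import asynccontextmanager

events = []


class FakeProgram:
    """Hypothetical stand-in: records calls instead of touching a network interface."""

    async def attach(self, network):
        events.append(f"attach {network}")

    async def detach(self, network):
        events.append(f"detach {network}")

    @asynccontextmanager
    async def run(self, network):
        await self.attach(network)
        try:
            yield
        finally:
            # runs on normal exit and on exceptions alike
            await self.detach(network)


async def main():
    async with FakeProgram().run("eth0"):
        events.append("working")


asyncio.run(main())
print(events)  # ['attach eth0', 'working', 'detach eth0']
```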
Note that here we access the member variable count from user space. While generating EBPF, the code generator knows it needs to write out commands to access that variable from EBPF; once it is accessed outside of the generation context, we access it from the user side.
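One way such a dual behavior can be expressed in Python is a descriptor that checks whether code is currently being generated. The sketch below is purely illustrative: the names `DualVar`, `generating`, `code` and `memory` are invented here and are not part of this library.

```python
class DualVar:
    """Hypothetical descriptor: emits code while generating, reads memory otherwise."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if instance.generating:
            # generation context: record an instruction, return a placeholder
            instance.code.append(f"load {self.name}")
            return f"<value of {self.name} at run time>"
        # user-space context: return the actual current value
        return instance.memory[self.name]


class Program:
    count = DualVar()

    def __init__(self):
        self.generating = False
        self.code = []
        self.memory = {"count": 42}  # stands in for the shared map


p = Program()
p.generating = True
placeholder = p.count  # emits "load count"
p.generating = False
value = p.count        # reads 42 from memory

print(p.code, value)  # ['load count'] 42
```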
Both xdp.XDP.attach() and xdp.XDP.detach() have an additional parameter flags to choose the mode in which the program is attached: use xdp.XDPFlags.SKB_MODE (the default) to use the generic kernel driver, or xdp.XDPFlags.DRV_MODE to let the interface device driver run the program.
For reference, this is the full example:
from asyncio import get_event_loop, sleep
from ebpfcat.arraymap import ArrayMap
from ebpfcat.xdp import XDP, XDPExitCode, XDPFlags

class Count(XDP):
    license = "GPL"

    userspace = ArrayMap()
    count = userspace.globalVar()

    def program(self):
        self.count += 1
        self.exit(XDPExitCode.PASS)

async def main():
    c = Count()

    async with c.run("eth0", XDPFlags.DRV_MODE):
        for i in range(10):
            await sleep(0.1)
            print("packets arrived so far:", c.count)

if __name__ == "__main__":
    get_event_loop().run_until_complete(main())
Maps
Maps are used to communicate to the outside world. They look like instance variables. They may be used from within the EBPF program, and once it is loaded also from Python code. It is possible to write out the maps to a bpf file system using :meth:`
There are two flavors: arraymap.ArrayMap and hashmap.HashMap. They have different use cases:
Array Maps
Array maps are memory shared between EBPF programs and user space. All programs as well as user space access the memory at the same time, so concurrent access may lead to problems. An exception is the in-place addition operator +=, which works under a lock, but only if the variable has a size of 4 or 8 bytes.
Otherwise variables may be declared in all sizes. The declaration is like so:
class MyProgram(EBPF):
    array_map = ArrayMap()
    a_byte_variable = array_map.globalVar("B")
    an_integer_variable = array_map.globalVar("i")
Those variables can be accessed both from within the EBPF program and from outside. Both sides are actually accessing the same memory, so be aware of race conditions.
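To see which declarations fall under the 4 or 8 byte rule for the locked `+=`, one can look up the size of a format code with the standard `struct` module. This assumes that the format strings passed to `globalVar` follow `struct` conventions with standard sizes, as the "B" and "i" examples suggest; the helper names below are hypothetical.

```python
import struct


def size_of(fmt):
    # "=" selects standard sizes, independent of the platform's C compiler
    return struct.calcsize("=" + fmt)


def supports_locked_add(fmt):
    """True if a variable of this format has a size of 4 or 8 bytes."""
    return size_of(fmt) in (4, 8)


for fmt in "BHIQiq":
    print(fmt, size_of(fmt), supports_locked_add(fmt))
```

Under these assumptions, the byte variable declared with "B" above would not get the locked addition, while the integer declared with "i" would.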
Hash Maps
All hash map variables have a fixed size of 8 bytes. Accessing them is rather slow, but is done with proper locking, so concurrent access is possible. When accessing them from user space, they are read from the kernel anew each time. They are declared as follows:
class MyProgram(EBPF):
    hash_map = HashMap()
    a_variable = hash_map.globalVar()
They are used like normal variables, as in self.a_variable = 5, both in EBPF and from user space once loaded.
Accessing the packet
The entire point of XDP is to react to arriving network packets. The EBPF program is checked statically to make sure that it only accesses the contents of the packet, and nothing beyond. This means a with statement (acting as an if) needs to be added which checks that the packet is large enough, so that every packet access lies within the packet. To facilitate this, a special variable packetSize is defined which, when compared to a number, generates code that the static code checker understands, like so:
with self.packetSize > 100 as p:  # assure packet has at least 100 bytes
    self.some_variable = p.pH[22]  # read word at position 22
In this code, the variable p returned by the with statement also gives access to the content of the packet. There are six access modes for the different sizes in the packet. Their naming follows the Python struct module, indicated by the letters “BHIQiq”.
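The same two ideas, a size check before every access and `struct` letters for the field width, can be tried out in user space on a plain `bytes` object. This is only an analogy written with the standard library; the helper `read_field` is hypothetical, and little-endian byte order is chosen here merely to make the output reproducible.

```python
import struct


def read_field(packet, code, position):
    """Read one field, but only if it lies entirely within the packet."""
    size = struct.calcsize("<" + code)
    if position + size > len(packet):
        return None  # the access would reach beyond the packet
    return struct.unpack_from("<" + code, packet, position)[0]


packet = bytes(range(32))  # a 32 byte dummy packet: 0, 1, 2, ...

print(read_field(packet, "H", 22))  # bytes 22 and 23 as a 16 bit value: 5910
print(read_field(packet, "B", 31))  # last byte: 31
print(read_field(packet, "Q", 28))  # needs 8 bytes, only 4 left: None
```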
Knowing this, we can modify the above example code to only count IP packets:
    def program(self):
        with self.packetSize > 16 as p:
            # position 12 is the EtherType
            # 8 is the EtherType for IP, in network byte order
            with p.pH[12] == 8:
                self.count += 1
        self.exit(XDPExitCode.PASS)
As a simplification, if the class attribute minimumPacketSize is set, the program is called within a with statement like the one above, and all the packet variables appear as variables of the object. The class attribute defaultExitCode then gives the exit code in case the packet is too small (by default XDPExitCode.PASS). So the above example becomes:
class Program(XDP):
    minimumPacketSize = 16
    userspace = ArrayMap()
    count = userspace.globalVar()

    def program(self):
        with self.pH[12] == 8:
            self.count += 1
With the xdp.PacketVar descriptor it is possible to declare certain positions in the packet as variables. As parameters it takes the position within the packet and the data format, following the conventions of the Python struct package, including the endianness markers <>!. So the above example simplifies to:
class Program(XDP):
    minimumPacketSize = 16
    userspace = ArrayMap()
    count = userspace.globalVar()
    etherType = PacketVar(12, "!H")  # use network byte order

    def program(self):
        with self.etherType == 0x800:
            self.count += 1
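The earlier versions compare against 8 while this one compares against 0x800. Both describe the same two bytes in the packet, read with different byte orders. The following standard-library sketch shows this on a hand-made Ethernet header, assuming a little-endian host (such as x86) for the unconverted read.

```python
import struct

# A minimal Ethernet header: 6 bytes destination, 6 bytes source, then EtherType
frame = bytes(6) + bytes(6) + bytes([0x08, 0x00])

# Read the EtherType without conversion, as a little-endian host would see it
as_little_endian, = struct.unpack_from("<H", frame, 12)
# Read it with the "!" marker, i.e. converted from network byte order
as_network_order, = struct.unpack_from("!H", frame, 12)

print(as_little_endian, hex(as_network_order))  # 8 0x800
```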
Programming
The actual XDP program is a class that inherits from xdp.XDP. The class body contains all variable declarations, and a method program which is the program proper. It is executed by Python, and while executing, an EBPF program is created, which can then be loaded into the Linux kernel.
Expressions
Once a variable is declared, it can be used with syntax very close to normal Python. Standard arithmetic works, like self.distance = self.speed * self.time, given that all are declared variables. Note that you cannot use ordinary Python variables, as accessing them does not generate any EBPF code. Use local variables for that.
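The difference between declared variables and ordinary Python variables can be illustrated with a toy recorder. Again this is not this library's implementation; the class `Var` and the instruction strings are made up. An operation on recorder objects emits an instruction, while an operation on plain Python values is simply evaluated at generation time and leaves no trace in the generated code.

```python
program = []  # the generated "instructions"


class Var:
    """Hypothetical declared variable that records multiplications."""

    def __init__(self, name):
        self.name = name

    def __mul__(self, other):
        other_name = other.name if isinstance(other, Var) else repr(other)
        program.append(f"mul {self.name}, {other_name}")
        return Var(f"({self.name}*{other_name})")


speed, time = Var("speed"), Var("time")
distance = speed * time  # emits one instruction

factor = 3 * 4           # plain Python: computed right now, emits nothing

print(program)  # ['mul speed, time']
print(factor)   # 12
```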
Local variables
Local variables are seen only by one EBPF program; they cannot be seen by other programs or user space. They are declared in the class body like this:
class Program(XDP):
    local_variable = LocalVar("I")
Conditional statements
During code generation, all code needs to be executed. This means that we cannot use a Python if statement, as the code in a branch that is not taken would not be executed, so no code would be generated for it. So we replace if statements by Python with statements like so:
with self.some_variable > 6 as Else:
    do_something
with Else:
    do_something_else
The Else block may of course be omitted if it is not needed.
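How a with statement can stand in for an if during code generation is sketched below in plain Python. This is a toy, not this library's implementation: the classes `If` and `Else` and the textual instructions are invented. Both bodies are always executed by Python; the context managers merely surround the emitted instructions with jumps whose targets are filled in afterwards.

```python
class If:
    """Hypothetical condition usable as `with If(...) as otherwise:`."""

    def __init__(self, program, condition):
        self.program = program
        self.condition = condition

    def __enter__(self):
        self.jump_at = len(self.program)
        self.program.append(None)  # placeholder for the conditional jump
        return Else(self)

    def __exit__(self, *exc):
        self.patch(len(self.program))  # by default, jump right behind the body
        return False

    def patch(self, target):
        self.program[self.jump_at] = f"unless {self.condition} goto {target}"


class Else:
    def __init__(self, if_block):
        self.if_block = if_block
        self.program = if_block.program

    def __enter__(self):
        self.jump_at = len(self.program)
        self.program.append(None)  # placeholder: skip the else body
        # a failed condition must now land behind that skip
        self.if_block.patch(len(self.program))

    def __exit__(self, *exc):
        self.program[self.jump_at] = f"goto {len(self.program)}"
        return False


program = []
with If(program, "x > 6") as otherwise:
    program.append("do_something")
with otherwise:
    program.append("do_something_else")

for number, instruction in enumerate(program):
    print(number, instruction)
```

If the second with block is left out, the conditional jump keeps its default target right behind the first body, which matches the remark that the Else part is optional.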
No loops
There is no way to declare a loop, simply because EBPF does not allow it. You may write a for loop in Python as long as everything can be calculated at generation time, but this just means that the code will show up in the EBPF program as often as the loop is iterated at generation time.
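The effect of a generation-time loop can be shown with a plain list standing in for the generated program. The function `emit` and the instruction text are hypothetical; the point is only that the loop itself disappears and its body appears once per iteration.

```python
program = []  # stands in for the generated program


def emit(instruction):
    program.append(instruction)


# The loop runs in Python at generation time ...
for offset in range(4):
    emit(f"copy packet[{offset}] to buffer[{offset}]")

# ... so the generated program contains four straight-line instructions
for instruction in program:
    print(instruction)
```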
Fixed-point arithmetic
As a bonus beyond standard EBPF, we support fixed-point values as a type x. Within EBPF they are calculated as per-100000, so 0.2 is represented as 20000. From outside, the variables appear to be doubles. Vaguely following Python, all true divisions / result in a fixed-point result, while all floor divisions // result in a standard integer. Some examples:
class FixedPoint(EBPF):
    array_map = ArrayMap()
    fixed_var = array_map.globalVar("x")  # declare a fixed-point variable
    normal_var = array_map.globalVar("i")

    def program(self):
        self.fixed_var = 3.5  # automatically converted to fixed
        self.normal_var = self.fixed_var  # automatically truncated
        self.fixed_var = self.normal_var / 5  # keep decimals
        self.fixed_var = self.normal_var // 5  # floor division
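The arithmetic behind such fixed-point values can be tried out with ordinary Python integers. The sketch below is an illustration only: the scale factor is an assumption, chosen so that 0.2 is stored as 20000 as stated above, and all function names are hypothetical.

```python
SCALE = 100_000  # assumption: chosen so that 0.2 is stored as 20000


def to_fixed(value):
    """Convert a Python number to its scaled integer representation."""
    return round(value * SCALE)


def to_float(fixed):
    """What the value looks like from outside."""
    return fixed / SCALE


def truncate(fixed):
    """Assigning a fixed-point value to an integer drops the decimals."""
    return fixed // SCALE


def true_divide(int_a, int_b):
    """integer / integer with a fixed-point result that keeps decimals."""
    return int_a * SCALE // int_b


def floor_divide(int_a, int_b):
    """integer // integer with a plain integer result."""
    return int_a // int_b


fixed = to_fixed(3.5)              # stored as 350000
normal = truncate(fixed)           # 3
kept = true_divide(normal, 5)      # 60000, seen from outside as 0.6
floored = floor_divide(normal, 5)  # 0

print(fixed, normal, to_float(kept), floored)  # 350000 3 0.6 0
```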
Reference Documentation
The ebpf module contains the core EBPF code generation.
- class ebpfcat.ebpf.EBPF(prog_type=0, license=None, kern_version=0, name=None, load_maps=None, subprograms=())
  The base class for all EBPF programs.
  Usually this class is subclassed, and the actual program is defined in the overridden program method. Then the program may be loaded into the kernel. Alternatively, this class may be instantiated directly, in which case you can just issue the program before it is loaded.
  After a program is loaded, its maps may be written to a bpf file system using pin_maps(). Those maps may be used at a later time, in particular also in a different task, if the parameter load_maps is given, in which case we assume the program has already been loaded.
  Parameters:
  load_maps – a prefix to load pinned maps from. Must exist in a bpf file system, and usually ends in a “/”.
  - assemble()
    return the assembled program
  - call(no)
    call the kernel function no from enum FuncId
  - exit(no=None)
    exit the program with return value no
  - get_fd(fd)
    return the file descriptor fd of a map
  - jump()
    unconditionally jump to a later defined target
  - jumpIf(comp)
    jump if comp is true to a later defined target
  - load(log_level=0, log_size=40960)
    load the program into the kernel
  - pin_maps(path)
    pin the maps of this program to files with prefix path
    This path must be in a bpf file system, and all parent directories must already exist, while the individual files must not exist.
  - program()
    override this method with your program while subclassing
- class ebpfcat.ebpf.LocalVar(fmt='I')
  variables on the stack
- class ebpfcat.ebpf.ktime(ebpf)
  a function that returns the current ktime in ns
  - calculate(dst, long, force=False)
    issue the code that calculates the value of this expression
    this method returns three values:
    - the number of the register with the result
    - a boolean indicating whether this is a 64 bit value
    this method is a contextmanager to be used in a with statement. At the end of the with block the result is freed again, i.e. the register will not be reserved for the result anymore.
    the default implementation calls get_address for values which actually are in memory and moves that into a register.
    Parameters:
    dst – the number of the register to put the result in, or None if that does not matter.
    long – True if the result is supposed to be 64 bit. None if it does not matter.
    force – if true, dst must be respected, otherwise this is optional.
- class ebpfcat.ebpf.prandom(ebpf)
  a function that returns a pseudo-random number
  - calculate(dst, long, force=False)
    issue the code that calculates the value of this expression
    this method returns three values:
    - the number of the register with the result
    - a boolean indicating whether this is a 64 bit value
    this method is a contextmanager to be used in a with statement. At the end of the with block the result is freed again, i.e. the register will not be reserved for the result anymore.
    the default implementation calls get_address for values which actually are in memory and moves that into a register.
    Parameters:
    dst – the number of the register to put the result in, or None if that does not matter.
    long – True if the result is supposed to be 64 bit. None if it does not matter.
    force – if true, dst must be respected, otherwise this is optional.
support for XDP programs
- class ebpfcat.xdp.PacketVar(address, fmt)
  descriptor to access packet data from an XDP program
  Declare packet variables as such:
  class Program(XDP):
      etherType = PacketVar(12, "!H")
  Parameters:
  address – the start address within the packet
  fmt – the data type of the variable, following the conventions from the struct module.
- class ebpfcat.xdp.XDP(**kwargs)
  the base class for XDP programs
  XDP programs inherit from this class and define a program() which contains the actual EBPF program. In the class body, variables are declared using ebpf.LocalVar, PacketVar and arraymap.ArrayMap.
  - minimumPacketSize
    set this to an integer value to declare the minimum size of a packet. You will only be able to access that many bytes in the packet. If you need something dynamic, use packetSize instead.
  - defaultExitCode
    The default exit code should the packet be smaller than minimumPacketSize. Defaults to XDPExitCode.PASS.
  - packetSize
    compare this value to a number in your program to allow at least that many bytes being read. As an example, to assure at least 20 bytes may be read one would write:
    with self.packetSize > 20:
        pass
  - async attach(network, flags=XDPFlags.SKB_MODE)
    attach this program to a network
    Parameters:
    network – the name of the network interface, like "eth0"
    flags – one of the XDPFlags
  - async detach(network, flags=XDPFlags.SKB_MODE)
    detach this program from a network
    Parameters:
    network – the name of the network interface, like "eth0"
    flags – one of the XDPFlags
  - program()
    override this method with your program while subclassing
- class ebpfcat.xdp.XDPExitCode(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)
- class ebpfcat.xdp.XDPFlags(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)
The arraymap module defines array maps, usually used for global variables in EBPF programs