Higher order functions and lifetimes
Rust is great for writing safe bindings between itself and other languages. It handles C FFI natively, and there are lots of really great runtimes and binding libraries. Two examples I loved playing with:
For the first one, the goal was to slowly migrate a big computer vision application from Python to Rust. We started remodeling everything in terms of Rust structures and traits. We also added pyo3 bindings to key components so that we could still use the new code in the old codebase while refactoring it.
For this project, performance was important, so the less you copy stuff from one environment to another, the better you feel. Especially if you handle multiple uncompressed image streams in parallel.
If there is something painful about FFI, it is sharing memory. One day you will forget that a reference to an object on one side will be invalidated, or you will leak memory if you don't handle reference counts properly. You could also forget to acquire a lock before accessing the memory... (It happens all the time, even without FFI, and sometimes it's not even your fault but a lack of documentation. Yes, I did a lot of GStreamer too...)
Ownership to the rescue
To help you with all these footguns, Rust FFI libraries use the same construct as std::sync::Mutex. Let's see how it helps with Python.
When you want to access a value from Python's environment, you have to acquire a lock called the Global Interpreter Lock, or GIL (although CPython has been working on making it optional since 3.13). This greatly reduces bad race conditions. To make sure we always lock the GIL before handling Python memory, pyo3's only type that can own a Python reference is Py, and it only allows us to access the pointed value if we provide it with a GIL handle (named Python).
The only way to get a GIL handle is by using the function Python::with_gil. Here is its type:
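To the best of my knowledge, pyo3 documents this function with a higher-ranked bound on the closure. The sketch below reproduces that shape without depending on the crate: Python here is a local stand-in type (an assumption made so the snippet compiles on its own), answer is a hypothetical usage helper, and no real lock is taken.

```rust
use std::marker::PhantomData;

/// Local stand-in for pyo3's GIL token (assumption: only its lifetime matters here).
#[derive(Clone, Copy)]
pub struct Python<'py>(PhantomData<&'py ()>);

/// `f` must accept a token of *any* lifetime 'py,
/// while the result type R is chosen outside of 'py.
pub fn with_gil<F, R>(f: F) -> R
where
    F: for<'py> FnOnce(Python<'py>) -> R,
{
    // A real implementation would acquire the GIL before calling `f`
    // and release it afterwards.
    f(Python(PhantomData))
}

/// Hypothetical usage: the closure may return any value that does not borrow from `py`.
pub fn answer() -> i32 {
    with_gil(|_py| 40 + 2)
}

// Trying to smuggle the token out is rejected at compile time,
// because R cannot mention 'py:
// let escaped = with_gil(|py| py);
```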
This function will acquire a GIL handle and pass it to a function you provide. You can then do whatever you want inside the provided function: the GIL stays locked for its whole duration.
Note the 'py lifetime associated with the handle. This will prevent anyone from trying to take it out of the closure (by cloning maybe). You can still clone it, but since it cannot live longer than the with_gil scope, using it afterwards will raise a compilation error.
Extracting references
Let's say you have a pipeline consisting of 3 important steps:
Image extraction -> Image computation -> Result analysis
The part that changes the most is the middle one, because it's the core. This is where we test new algorithms, tweak some values... So of course the new Rust code will mostly be here. There is a problem, though: we have to be able to call Rust from Python (easy) in the "Result analysis" part, but also call Python from Rust (harder) in the "Image extraction" part.
Why not just create a Python wrapper that takes extracted images and feeds them to a bound Rust function? Well, because the pipeline can be much more complicated than just streaming images into a simple function, and we may want to handle different images at different stages of the processing pipeline. It would lead us to create big Rust structures that expose a lot of methods, bind everything in Python, and double our codebase (because we must implement the pipeline in Python even if it's just calling functions in the right order).
Instead of doing that, we can just make our pipeline generic over the image provider (which is already useful even without the binding problem, since we can have multiple types of providers), and implement the provider trait for a structure representing a Python-side provider object.
The trait could be written as:
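A minimal sketch of such a trait, under a few assumptions: Image is a placeholder type, lookup is done by an index argument, and InMemoryProvider and pixel_count are hypothetical names added to show a pure-Rust provider and a generic pipeline step.

```rust
/// Placeholder image type (assumption: the real one is richer).
#[derive(Clone, Debug, PartialEq)]
pub struct Image {
    pub width: usize,
    pub height: usize,
    pub pixels: Vec<u8>,
}

/// A source of images that keeps ownership and hands out views.
pub trait ImageProvider {
    /// Borrow the image at `index`; the provider stays the owner.
    fn get_image(&self, index: usize) -> &Image;
}

/// Hypothetical pure-Rust provider backed by a vector.
pub struct InMemoryProvider {
    pub images: Vec<Image>,
}

impl ImageProvider for InMemoryProvider {
    fn get_image(&self, index: usize) -> &Image {
        // Panics when `index` is out of bounds, like slice indexing.
        &self.images[index]
    }
}

/// A pipeline step that is generic over the provider and never copies the image.
pub fn pixel_count<P: ImageProvider>(provider: &P, index: usize) -> usize {
    provider.get_image(index).pixels.len()
}
```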
It looks like the std::ops::Index trait. The good thing here is that we don't have to create a new Image each time we want one: the provider owns it and gives us a view on it, so we don't have to pollute the pipeline with caching logic (if we want it). The image will only be cloned if absolutely needed.
Let's try to implement it with our Python provider. I said earlier that you can only hold references to Python objects through the Py pointer type (in detail, we will use a Py<PyAny>, which is a pointer to a Python object whatever its real type is). So the code should look like this:
See how, every time we try to use the object, we must pass py (the proof that we locked the GIL) to its methods. This code could work in C++, but it won't in Rust. To see why, let's look at the extract function:
The value produced by the extract function can live at most as long as 'py. This is ensured by the constraint 'py: 'a, meaning that 'py (the lifetime of the GIL handle) must outlive 'a (the lifetime of our extracted value).
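Since the trait function get_image returns the &Image to its caller, the reference would have to outlive the with_gil closure in which the GIL handle lives, and that breaks this constraint.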
Higher order functions
Higher order functions can be described simply as "functions that take functions as parameters or produce functions". The first example of such a function you may encounter is the map function. In most languages, it's a function that:
- takes a function of type A -> B (takes a value of type A and spits out a value of type B),
- takes a list of values of type A,
- produces a list of values of type B (by applying the provided function to the input list's elements).
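As an illustration, here is a hand-written map over vectors. It is only a sketch of the idea (in everyday Rust you would rather write items.into_iter().map(f).collect()), and digit_counts is a hypothetical usage helper.

```rust
/// Applies `f` to every element of `items` and collects the results.
pub fn map<A, B>(f: impl Fn(A) -> B, items: Vec<A>) -> Vec<B> {
    let mut out = Vec::with_capacity(items.len());
    for item in items {
        out.push(f(item));
    }
    out
}

/// Hypothetical usage: A is u32, B is usize (number of decimal digits).
pub fn digit_counts(numbers: Vec<u32>) -> Vec<usize> {
    map(|n: u32| n.to_string().len(), numbers)
}
```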
Using higher order functions to constrain lifetimes
We already saw such a function in this article: the function with_gil! It takes a function that takes a GIL handle and produces some value using it. This is a clever way of constraining the GIL handle's lifetime. with_gil could be implemented like this pseudocode:
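A runnable approximation of that idea, with loud assumptions: the interpreter lock is modelled by a single global std::sync::Mutex, Handle is a local stand-in for the GIL handle, and lock_is_held_inside is a hypothetical helper used to observe the lock. This is not pyo3's implementation; in particular, as far as I know the real function can be nested, while this sketch would deadlock on a nested call.

```rust
use std::marker::PhantomData;
use std::sync::Mutex;

/// Stand-in for the interpreter lock (assumption: one global mutex).
pub static GIL: Mutex<()> = Mutex::new(());

/// Stand-in for the GIL handle; 'py ties it to the locked scope.
#[derive(Clone, Copy)]
pub struct Handle<'py>(PhantomData<&'py ()>);

pub fn with_gil<F, R>(f: F) -> R
where
    F: for<'py> FnOnce(Handle<'py>) -> R,
{
    // Acquire the lock; it stays held for as long as `guard` is alive.
    let guard = GIL.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    let handle = Handle(PhantomData);
    // `f` runs while the lock is held.
    let result = f(handle);
    // Release the lock. R cannot mention 'py, so `result` is safe to keep.
    drop(guard);
    result
}

/// Hypothetical check: from inside the closure, the lock cannot be taken again.
pub fn lock_is_held_inside() -> bool {
    with_gil(|_handle| GIL.try_lock().is_err())
}
```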
The handle will only live inside with_gil, and f can then safely assume the GIL lock is acquired while it's executing. We can also see why result cannot borrow from handle: result outlives it.
We could do the same with our trait: instead of returning a &Image, take a function that produces a value given a &Image, and return the produced value!
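One way this could look is sketched below. The method name with_image, the index argument, the placeholder Image type, and the InMemoryProvider and brightest helpers are all assumptions made for illustration.

```rust
/// Placeholder image type (assumption).
#[derive(Clone, Debug, PartialEq)]
pub struct Image {
    pub pixels: Vec<u8>,
}

pub trait ImageProvider {
    /// Runs `f` on a borrowed image and returns whatever `f` produced.
    /// The reference itself never leaves this call.
    fn with_image<R>(&self, index: usize, f: impl FnOnce(&Image) -> R) -> R;
}

/// Hypothetical pure-Rust provider backed by a vector.
pub struct InMemoryProvider {
    pub images: Vec<Image>,
}

impl ImageProvider for InMemoryProvider {
    fn with_image<R>(&self, index: usize, f: impl FnOnce(&Image) -> R) -> R {
        f(&self.images[index])
    }
}

/// A pipeline step: computes a value from the image without copying it.
pub fn brightest<P: ImageProvider>(provider: &P, index: usize) -> Option<u8> {
    provider.with_image(index, |image| image.pixels.iter().copied().max())
}
```

The generic method means the trait can no longer be used as a trait object, which is fine as long as the pipeline stays generic over the provider.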
Now we can implement ImageProvider with Python objects!
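A real implementation would go through pyo3's Py<PyAny> and Python::with_gil. To keep the example compilable without the crate, the sketch below swaps them for local stand-ins: Gil, with_gil, PyObject and its image accessor are assumptions that only mimic the lifetime rules described above, and with_image is a hypothetical method name.

```rust
use std::marker::PhantomData;

/// Stand-in for the GIL handle.
#[derive(Clone, Copy)]
pub struct Gil<'py>(PhantomData<&'py ()>);

/// Stand-in for with_gil (no real locking here).
pub fn with_gil<F, R>(f: F) -> R
where
    F: for<'py> FnOnce(Gil<'py>) -> R,
{
    f(Gil(PhantomData))
}

pub struct Image {
    pub pixels: Vec<u8>,
}

/// Stand-in for a pointer to a Python-side provider object.
pub struct PyObject {
    pub images: Vec<Image>,
}

impl PyObject {
    /// Mimics the extract constraint: the view ('a) cannot outlive the handle ('py).
    pub fn image<'a, 'py: 'a>(&'a self, _py: Gil<'py>, index: usize) -> &'a Image {
        &self.images[index]
    }
}

pub trait ImageProvider {
    fn with_image<R>(&self, index: usize, f: impl FnOnce(&Image) -> R) -> R;
}

pub struct PythonProvider {
    pub object: PyObject,
}

impl ImageProvider for PythonProvider {
    fn with_image<R>(&self, index: usize, f: impl FnOnce(&Image) -> R) -> R {
        with_gil(|py| {
            // `image` is only valid while the handle is.
            let image = self.object.image(py, index);
            // Only the value computed by `f` leaves the closure.
            f(image)
        })
    }
}
```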
This way image never leaves with_gil, and we can still use it however we want.
Still want to get an owned Image?
We can even improve our trait with a provided function get_image:
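A possible shape for it, assuming the callback-taking method is called with_image (a hypothetical name) and that Image implements Clone. InMemoryProvider is again an illustrative pure-Rust provider.

```rust
/// Placeholder image type (assumption).
#[derive(Clone, Debug, PartialEq)]
pub struct Image {
    pub pixels: Vec<u8>,
}

pub trait ImageProvider {
    /// Required method: lend the image to `f` and return what `f` produced.
    fn with_image<R>(&self, index: usize, f: impl FnOnce(&Image) -> R) -> R;

    /// Provided method: clone the image when an owned copy is really needed.
    fn get_image(&self, index: usize) -> Image {
        self.with_image(index, |image| image.clone())
    }
}

/// Implementors only write `with_image` and get `get_image` for free.
pub struct InMemoryProvider {
    pub images: Vec<Image>,
}

impl ImageProvider for InMemoryProvider {
    fn with_image<R>(&self, index: usize, f: impl FnOnce(&Image) -> R) -> R {
        f(&self.images[index])
    }
}
```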
And now we have the best of both worlds: we can use an image without cloning it, and also clone it when needed!