The Python-Rust Connection

The Ultimate Python Speedup

Jun 22, 2023

I was uninterested in learning Rust for the longest time. Although my MS is in Computer Engineering and I spent several early career years programming in assembly language, my belief is that we are trying to claw our way up from the low levels of programming. I’m an adult now and the idea of using a language without a garbage collector is … well, why would you?

During the 25 years that I’ve used Python, I have occasionally ventured into the realm of performance. I’ve watched numerous projects attempt to make it easier to write compiled extensions, and none have ever looked simple enough for me to try.

Then my friend Jeremy had a performance problem, profiled it down to a little bit of Python, and replaced that Python with a Rust extension using the PyO3 system. He’s good at wrestling with things, so when he said it was pretty easy I didn’t quite believe him. And besides, I would have to learn Rust with its weird lifetime stuff.

During the research for my Rethinking Objects presentation for PyCon 2023, I started looking at the way Rust worked with objects, and was very impressed: the Rust designers had learned from decades of language design experiments and had produced something exceptionally elegant. And they weren’t shackled by either backwards compatibility or the limitations of an existing virtual machine, so they didn’t have to make compromises.

As I got further sucked into learning Rust, I kept coming across features that they just did right, and this made me realize that every other language I’ve learned has had places where I said “well, they had to make a compromise here, so I guess it’s OK.” There’s something deeply satisfying about discovering features that just feel right. As the field continues to evolve I’m sure we’ll discover new and better concepts, but Rust seems to have anticipated that as well — the language tries to be as unbiased as possible to allow for this. For example, the concurrency that’s built into Rust is only what’s essential to make it work (including on bare hardware with no OS!), which allows third-party packages (called crates in Rust) to implement all manner of different concurrency strategies.

Rust’s fundamental design principles are safety and performance. The “safety” part is reflected in the most rigorous and complete type system that I have experienced. As I have learned about Rust’s lifetimes, I have come to view that as part of the type system as well. I say this because lifetimes appear to enable abilities in Rust that greatly simplify other aspects of the language. For example, lifetimes appear to be at least partly responsible for the fact that Rust has no variance in its generics (covariance, contravariance), thus dramatically simplifying the creation and use of generics.

The “safety” maxim is also responsible for the significant use of macros. These are real syntactic macros in which you can do just about anything, the difference being that they seem much more understandable than those I’ve heard about in other languages (notably Lisp). Rust macros are roughly like pattern matching (which Rust also has, full pattern matching as in Scala), and one of the reasons they are used so much is that they provide variable argument lists without any loss of compile-time type checking.
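To make the variable-argument point concrete, here is a minimal sketch of a declarative macro. The names `sum_all!` and `sum_three` are hypothetical, made up for this illustration. The macro matches its arguments against patterns, accepts any number of them, and every expansion is still type-checked at compile time.

```rust
// Hypothetical macro for illustration: sums any number of expressions.
macro_rules! sum_all {
    // Base case: a single expression.
    ($x:expr) => { $x };
    // Recursive case: the first expression plus the sum of the rest.
    ($x:expr, $($rest:expr),+) => { $x + sum_all!($($rest),+) };
}

// Ordinary function that uses the macro with three arguments.
fn sum_three(a: i32, b: i32, c: i32) -> i32 {
    sum_all!(a, b, c)
}

fn main() {
    let ints: i32 = sum_all!(1, 2, 3, 4);
    let floats: f64 = sum_all!(1.5, 2.5);
    println!("{ints} {floats} {}", sum_three(1, 2, 3));
    // sum_all!(1, "two") would be rejected by the compiler,
    // because adding an integer and a string slice is not defined.
}
```

The macro expands into plain `+` expressions before type checking, so a mismatched argument is caught the same way it would be in hand-written code.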

The “performance” maxim is one of the reasons for lifetimes. Because the compiler knows exactly who owns each value, and thus how long that value should exist, and because it enforces all this at compile time, Rust can reclaim memory deterministically and avoid most memory leaks without a garbage collector.
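A small sketch of what ownership looks like in practice. The `Buffer` type, its `log` field, and the `consume` and `run` functions are all hypothetical, invented only to record when values are dropped. The point is that each value is released at a spot the compiler can determine statically: when its owner goes out of scope.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type that records the moment it is dropped.
struct Buffer {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>, // shared log, used only for this demo
}

impl Drop for Buffer {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("dropped {}", self.name));
    }
}

// Taking `Buffer` by value moves ownership into this function,
// so the buffer is dropped when the function returns.
fn consume(b: Buffer) {
    b.log.borrow_mut().push(format!("consumed {}", b.name));
}

fn run() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let a = Buffer { name: "a", log: Rc::clone(&log) };
    {
        let _b = Buffer { name: "b", log: Rc::clone(&log) };
        // `_b` goes out of scope at the closing brace and is dropped there.
    }
    consume(a);
    // Using `a` past this point would be a compile-time error: it was moved.
    let result = log.borrow().clone();
    result
}

fn main() {
    for line in run() {
        println!("{line}");
    }
}
```

No collector decides when `a` or `_b` goes away; the drop points are fixed when the program is compiled.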

Because Rust produces compiled executables (using LLVM), it can do all kinds of things that a language with a runtime cannot. For example, it was one of the first languages to compile to WASM, and has a mature ecosystem of support libraries (check out Yew, for example). And of course, distribution and installation become much easier for end-users who don’t have to jump through the (big) hoops of installing Java or Docker, or the (easier) process of installing Python.

I could go on, but I’m basically hooked, and I’ve spent several months going through the (unsurprisingly well-done) Rust docs, books and multitudinous YouTube videos to the point where I feel comfortable reading Rust code and even starting to write it.

I still love Python, but see the two going hand-in-hand. Python is the king of rapid iteration, but once you get the design down and discover things aren’t running fast enough, enter Rust via PyO3. In particular, when you have all your Python code type-annotated, you can take your bottleneck function and ask ChatGPT to translate it to Rust; if you specify PyO3, it walks you through the details of setting it up. It won’t necessarily be perfect the first time, but one of the things you’ll notice is that the Rust version of the function looks like the Python function, so it’s a lot easier to verify the translation and catch any places the LLM might have slightly hallucinated.

I’ve done a set of experiments that you can find on my GitHub repo. This started as an exploration of Python’s ProcessPoolExecutor by creating a function that saturates the core it is running on (called cpu_intensive()), and duplicating that function across all available cores. Then I wondered if this would be a candidate for a Rust extension, which showed a dramatic speedup. After that, I wrote the entire application in Rust, using Rust native threads, to see just how fast it could go. Here are the results on my new desktop machine:

No Concurrency: 57.44s
Python Concurrency with ProcessPoolExecutor: 5.52s
Python with cpu_intensive() as a Rust extension: 0.31s
Completely Written in Rust: 0.11s
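
For readers curious what the native-threads variant might look like, here is a rough sketch. It is not the code from the repo: the body of `cpu_intensive()` is a placeholder workload, and `run_on_all_cores` and the `multiplier` parameter are names assumed for this illustration. It starts one thread per available core, runs the workload in each, and collects the results.

```rust
use std::thread;

// Placeholder CPU-bound workload; an assumption, not the function from the repo.
fn cpu_intensive(n: u64, multiplier: u64) -> u64 {
    let mut result: u64 = 0;
    for i in 0..n {
        result = result.wrapping_add(i.wrapping_mul(multiplier) % 7);
    }
    result
}

// Run one copy of the workload per available core and collect the results.
fn run_on_all_cores(n: u64) -> Vec<u64> {
    // Fall back to a single worker if the core count cannot be determined.
    let cores = thread::available_parallelism().map(|c| c.get()).unwrap_or(1);
    let handles: Vec<_> = (0..cores as u64)
        .map(|id| thread::spawn(move || cpu_intensive(n, id + 1)))
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    let results = run_on_all_cores(10_000_000);
    println!("{} workers finished: {:?}", results.len(), results);
}
```

Because each thread owns its own inputs and returns its result through `join()`, no locks or shared mutable state are needed in this sketch.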

I also tried using Cython, which appears to be the friendliest of the Python compiled-extension strategies. I could get it to compile and run, but it produced incorrect results. I tried several approaches (which you can find here) but couldn’t crack the problem, and Cython certainly didn’t have the kind of helpful error messages that guided me through the Rust process. Instead it reminded me of using old C, where you are kind of on your own. The timing for the incorrect output was 2.58s, suggesting that a working Cython extension would still fall significantly short of Rust performance. At this point I can’t see a benefit to using anything other than Rust for a Python extension.

One of the things you might have noticed recently is the trend of writing Python tools in Rust. The code-checker Ruff, for example, made a big splash because of its remarkable speed. If you start looking, there are all kinds of crossover tools between Python and Rust — for example, PyOxidizer packages a Python interpreter inside a Rust application so you can distribute a native executable of your Python program without requiring the end-user to install Python. There’s even a version of Python that is written in Rust, rather than C. The new language Mojo looks like someone took Python and Rust and mashed them together (at this writing that language is still in early days and cannot yet work with a significant number of Python language constructs).

My new view is: “Python for rapid development, Rust extensions for performance.” But I’m also intrigued by the idea of taking a Python program after working out the design and converting it to Rust, or distributing an executable using PyOxidizer. Rust breaks Python into areas where it was previously constrained by performance or inconvenience.
