Teaching: 10 min
Exercises: 5 min
Questions
Why use Cython?
How do you install Cython?
What are some ways you can use Cython?
Objectives
Install Cython on your own laptop using conda
Write functions that can be cythonized in the notebook
Profile functions with ipython magic functions, and measure speedup due to cythonization
Cython is a technology that allows us to easily bridge between Python and the underlying C representations. Its main purpose is to take code written in Python and, given some additional (mostly type) information, translate it to C, compile the C code, and bundle the resulting C objects into Python extensions that can then be imported directly into Python.
You can install Cython from the command line using conda:
conda install cython
To demonstrate the usefulness of Cython, we’ll start with an atypical usage pattern: in the Jupyter notebook, we will use the cython extension to demonstrate why and how to use Cython.
Later, we will also look at how to use Cython in the context of modules and libraries. But for now, let’s load the cython extension. This allows us to mark cells as Cython cells by starting them with %%cython:
%load_ext cython
Let’s see what this is good for. Consider a very simple function in Python:
def my_poly(a, b):
    return 10.5 * a + 3 * (b**2)
The equivalent Cython function is defined in a %%cython cell:
%%cython
def my_polyx(double a, double b):
    return 10.5 * a + 3 * (b**2)
What are the differences?
The only difference is that we tell the function to treat these variables as double-precision numbers. Why is that important? Cython is a ‘dialect’ of Python, but it is not exactly like Python: if this code were written in a regular Python cell, it would produce a syntax error. In fact, Cython is a superset of Python. That means that (almost) any Python code is syntactically valid Cython code, but not the other way around.
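One way to see this relationship concretely is to ask the plain Python parser to read both definitions. The sketch below uses only the built-in compile function; the helper name is_valid_python is made up for this illustration.

```python
python_source = """
def my_poly(a, b):
    return 10.5 * a + 3 * (b**2)
"""

cython_source = """
def my_polyx(double a, double b):
    return 10.5 * a + 3 * (b**2)
"""

def is_valid_python(source):
    """Return True if the plain Python parser accepts the source text."""
    try:
        compile(source, "<cell>", "exec")
    except SyntaxError:
        return False
    return True

print(is_valid_python(python_source))  # True
print(is_valid_python(cython_source))  # False: typed arguments are Cython-only syntax
```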
To time the performance of Python/Cython code, we can use the IPython %timeit magic:
%timeit my_poly(10, 2)
%timeit my_polyx(10, 2)
For even a trivial piece of code, we can already gain an approximately 3-fold speedup (the exact figure will vary from machine to machine).
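The %timeit magic is only available inside IPython and Jupyter. Outside the notebook, a roughly comparable measurement can be made with the standard-library timeit module. This is a minimal sketch: the number of calls and repeats are arbitrary choices, and wrapping the call in a lambda adds a little overhead of its own.

```python
import timeit

def my_poly(a, b):
    return 10.5 * a + 3 * (b**2)

# Call the function 100,000 times per run, do five runs, and keep the fastest run.
n_calls = 100_000
best_total = min(timeit.repeat(lambda: my_poly(10, 2), number=n_calls, repeat=5))

# Convert the total time of the fastest run into nanoseconds per call.
per_call_ns = best_total / n_calls * 1e9
print(f"my_poly: about {per_call_ns:.0f} ns per call")
```

Taking the minimum over several runs reduces the influence of other processes that happen to be running at the same time.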
Let’s consider an (only slightly) more interesting example, the calculation of the Fibonacci series.
The Fibonacci series
The Fibonacci series is defined by the rule: F[n] = F[n-1] + F[n-2]
This series has many interesting properties, but for our purposes it has one particularly interesting property: the item in the n-th location cannot be calculated in a vectorized fashion (that is, without first calculating item n-1, and so on until n-1 = 0). This means that we expect a naive computation to be rather slow.
def fib(n):
    a, b = 1, 1
    for i in range(n):
        a, b = a + b, a
    return a
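To check the function against the rule above, we can list its first few values and confirm that each one is the sum of the two before it. Note the indexing convention this implementation happens to use: fib(0) returns 1 and fib(1) returns 2.

```python
def fib(n):
    a, b = 1, 1
    for i in range(n):
        a, b = a + b, a
    return a

values = [fib(n) for n in range(8)]
print(values)  # [1, 2, 3, 5, 8, 13, 21, 34]

# Every item from the third onward is the sum of the two items before it.
follows_rule = all(
    values[k] == values[k - 1] + values[k - 2] for k in range(2, len(values))
)
print(follows_rule)  # True
```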
For the Cython version of the function, we will use the cdef keyword (a Cython language construct) to define local variables (integers used only within the function):
%%cython
def fibx(int n):
    cdef int i, a, b
    a, b = 1, 1
    for i in range(n):
        a, b = a + b, a
    return a
Compare the two using %timeit:
%timeit fib(10)
%timeit fibx(10)
In this case, we are already in the realm of a 10X speedup!
Let’s pause to consider the implications of this. The C code required to perform the same calculation as fibx might look something like this:
int fib(int n){
    int tmp, i, a, b;
    a = b = 1;
    for(i=0; i<n; i++){
        tmp = a;
        a += b;
        b = tmp;
    }
    return a;
}
In and of itself, that’s not too terrible, but it can get unpleasant if you write more than this trivial function. The main issue is that integrating this code into a Python program is not trivial and requires writing extension code (think mex, if you’ve used these in Matlab). This also has overhead that is hard to optimize. Cython writes highly optimized Python extension code, making it easy to separate out performance bottlenecks and compile them, while you keep using the functions in your Python code.
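One caveat about the typed version: the cdef int variables in fibx are fixed-width C integers, whereas Python integers grow as needed, so for large enough n the two functions can disagree. The sketch below is plain Python, not Cython. It imitates a 32-bit signed integer with ctypes.c_int32; the 32-bit width and the wraparound behaviour are assumptions about a typical platform, and fib_int32 is a name made up for this illustration.

```python
import ctypes

def fib(n):
    a, b = 1, 1
    for i in range(n):
        a, b = a + b, a
    return a

def fib_int32(n):
    """Like fib, but every sum is squeezed into a 32-bit signed integer."""
    a, b = 1, 1
    for i in range(n):
        # c_int32 keeps only the low 32 bits, imitating fixed-width arithmetic.
        a, b = ctypes.c_int32(a + b).value, a
    return a

print(fib(10) == fib_int32(10))  # True: small values fit in 32 bits
print(fib(50) == fib_int32(50))  # False: the true value no longer fits
```

If large values matter, a wider C type such as long long pushes the limit further out, at which point it is worth checking the results against the pure Python function.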
Speeding up recursion
Recursive functions are functions that call themselves during their execution. Another interesting property of the Fibonacci series is that it can be written as a recursive function. That’s because each item depends on the values of other items (namely item n-1 and item n-2).
Rewrite the fib function using recursion. Is it faster than the non-recursive version? Does Cythonizing it give even more of an advantage?
Speeding up recursion
Here is a version of the Fibonacci series written using recursion:
def fib_r(n):
    if n <= 1:
        return n
    else:
        return fib_r(n-1) + fib_r(n-2)
Is it better? Well, it turns out that recursion looks clever, but performs much worse (why is that?). Even worse for this case, Cythonizing the recursive version of Fibonacci doesn’t do much for us either. Why do you think that is? Later, we’ll see how we can diagnose these situations.
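One way to start answering the first question is to count how many function calls the recursive version makes. The sketch below adds a call counter to the recursive function; fib_r_counted and its counter argument are additions for this illustration and are not part of the lesson’s code.

```python
def fib_r_counted(n, counter):
    """Recursive Fibonacci that also counts how many times it is called."""
    counter[0] += 1
    if n <= 1:
        return n
    return fib_r_counted(n - 1, counter) + fib_r_counted(n - 2, counter)

calls = [0]  # a one-item list, so that the recursive calls can update it in place
result = fib_r_counted(20, calls)
print(result, calls[0])
```

For n = 20 the loop in fib runs 20 times, while the recursive version makes more than twenty thousand calls, because the same sub-problems are recomputed over and over.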
One of the major challenges in using Cython is that it requires compiling the code for all the platforms (and architectures) on which you want to run the code. This often means that you will distribute the Cython source code and ask users to compile it themselves. If this fails, however, you might still want the code to do what it’s supposed to do, albeit slower.
The following is perfectly valid Python that can also be compiled using Cython. The declarations are now done as calls to functions in the Cython library, instead of with cdef. If all else fails, this code would still work.
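%%cython
import cython
# Type declarations written as ordinary Python calls into the cython module
@cython.locals(n=cython.int)
def fib_pure_python(n):
    cython.declare(a=cython.int, b=cython.int, i=cython.int)
    a, b = 1, 1
    for i in range(n):
        a, b = a + b, a
    return a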
Try running this code with the %%cython magic removed, and witness the slowdown back to Python speed.
Key Points
Cython can speed up some computations dramatically