High performance Python

Writing code in Python is easy: because it is dynamically typed, we don’t have to worry too much about declaring variable types (e.g. integers vs. floating-point numbers). Python is also interpreted, rather than compiled ahead of time. Taken together, this means that we can avoid a lot of the boilerplate that makes compiled, statically typed languages hard to read. However, this comes with a major drawback: some operations, particularly explicit loops over many elements, can be quite slow.

Whenever possible, expressing a computation as operations on whole NumPy arrays saves time, because the looping then happens in compiled code rather than in the interpreter. But not all operations can be vectorized.
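As a rough illustration of what vectorizing means, the sketch below computes the same sum of squares twice: once with an explicit Python loop and once with a single NumPy call. The function names and the test data are made up for this example; on large arrays the vectorized version is typically much faster, though the exact speedup depends on the machine.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Sum of squares with an explicit Python loop (one interpreter step per element)."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    """Same computation expressed as one NumPy call (the loop runs in compiled code)."""
    return float(np.dot(values, values))


# Hypothetical input data, chosen only for illustration.
data = np.arange(1000, dtype=np.float64)

loop_result = sum_of_squares_loop(data)
vectorized_result = sum_of_squares_vectorized(data)
```

Both functions return the same value; timing them (for example with the standard `timeit` module) shows the difference in speed.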

Here, we introduce three new technologies that can substantially speed up computations in Python: Cython, Numba and Dask.

Cython takes code that is written in Python and, given some additional (mostly type) information, translates it to C, compiles the C code, and bundles the resulting objects into Python extension modules that can then be imported directly into Python. Materials on Cython are based partially on a tutorial given by Kurt Smith (Enthought) at the SciPy conference, 2013.

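Numba takes a different approach: just-in-time compilation, in which a function is compiled to machine code the first time it is called. Numba is surprisingly simple to use, often requiring no more than a decorator, and the speedups can be substantial.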
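A minimal sketch of the idea, using a recurrence (an exponential moving average) as an example of a loop that is awkward to vectorize. The function name, the input values, and the smoothing factor are assumptions made for illustration. The `try`/`except` is only there so that the sketch still runs, uncompiled, on a machine without Numba installed.

```python
import numpy as np

try:
    from numba import njit  # just-in-time compiler decorator
except ImportError:
    # Fallback for this sketch only: run the function as plain Python.
    def njit(func):
        return func


@njit
def exponential_moving_average(values, alpha):
    """Each output element depends on the previous one, so the loop stays explicit."""
    out = np.empty_like(values)
    out[0] = values[0]
    for i in range(1, values.shape[0]):
        out[i] = alpha * values[i] + (1.0 - alpha) * out[i - 1]
    return out


# Hypothetical input, chosen only for illustration.
data = np.array([1.0, 2.0, 3.0])
smoothed = exponential_moving_average(data, 0.5)
```

With Numba available, the first call includes the compilation time; later calls with the same argument types reuse the compiled code.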

Dask parallelizes computations across threads, across the cores available on the machine we are using, or across a cluster.
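One way to get a feel for this is `dask.delayed`, which wraps ordinary function calls into a lazy task graph. The sketch below is an assumption-laden toy: the `square` function and the input range are invented for illustration, and the work is far too small to benefit from parallelism in practice.

```python
from dask import delayed


def square(x):
    """Stand-in for a more expensive, independent piece of work."""
    return x * x


# Build a lazy task graph; nothing is computed at this point.
tasks = [delayed(square)(i) for i in range(10)]
total = delayed(sum)(tasks)

# compute() hands the graph to a scheduler, which can run independent tasks in parallel.
result = total.compute()
```

The same graph can be run by a different scheduler (for example one backed by processes or by a cluster) without changing how the tasks are defined.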

The template we use for these lesson websites is based on the lesson template used in Data Carpentry and Software Carpentry workshops.

Schedule

09:00  Introduction
       Why is Python slow?
       Why is Python fast?
       What are some ways you can speed Python up even more?
09:15  Introduction to Cython
       Why use Cython?
       How do you install Cython?
       What are some ways you can use Cython?
09:30  Compiling Cython code
       How do we compile Cython code in a typical project?
       How do we create Cython objects with no Python overhead?
09:40  Using annotations to improve performance even more
       How do we diagnose performance bottlenecks?
       How can we improve these bottlenecks even more?
09:55  Numba
       What other options do we have to speed up Python code?
10:10  Dask
       Can we calculate things in parallel?
10:25  Wrap-Up
       What have we learned?
10:27  Finish