Monday, November 24, 2008

Pyrex for dummies

Pyrex is a very powerful tool if you need to speed up some Python code or connect it with some C code. My main motivation for using it was the ability to distribute only the compiled files, without also having to distribute a C compiler etc.

Unfortunately, the learning curve is a bit steep if you want to do anything non-trivial (unless you are a skilled programmer, that is). I find it very difficult to read and understand just about anything about the Python C API and related topics. So any attempt to learn new tricks means spending hours with Google, trying to find a usage example which is simple enough for me to understand.

What follows are a few steps which I generally follow when rewriting Python into Pyrex code.
  • Profile your code. Start by rewriting the slowest parts of your code (unless you want to know how fast you can make everything).

  • Try to compile the unchanged Python code with Pyrex. This way you start with working code, and if something goes wrong later you know it was one of your changes that broke it.

  • Declare loop control variables as C types:
    cdef int i
    and change Python for loops into Pyrex for loops
    for i from 0 <= i < 10:

  • Numpy array data should be accessed directly. It is fairly easy now that I finally know how to do it (see below).

  • Additional speedup may be achieved by eliminating or minimising Python function calls and/or replacing them with C function calls.

  • To make your functions easier to understand, add at least one line to the docstring such as:
    """myfunc(N, array x) -> array y"""
    Unlike with pure Python functions, help(myfunc) does not show the parameters of Pyrex (or C) functions, so you must write them into the docstring explicitly.
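
For the first step in the list, profiling, the standard library's cProfile module is one option. This is only a minimal sketch in plain Python: slow_sum, fast_part and main are made-up stand-ins for your own functions, not anything from my modules.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical hot spot: a pure Python loop doing float maths."""
    total = 0.0
    for i in range(n):
        total += i * i
    return total


def fast_part(n):
    """Hypothetical cheap function, not worth rewriting."""
    return n * 2


def main():
    slow_sum(200000)
    fast_part(10)


# Collect timing data only while main() runs.
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Print the five entries with the largest cumulative time into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Functions near the top of the report (here slow_sum) are the first candidates for a Pyrex rewrite.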

These steps have so far done the trick for my purposes. I essentially only need fast maths code, so I have no idea about other areas. What I may have to learn later is some string stuff, but so far I have had neither the guts nor a reason to try it. This means I'm using pure Python strings in my Pyrex modules.

Now some selected details as promised:
To use a standard C function you have to declare it before use. So if I need, for example, asin() from math.h, I put this at the beginning of the Pyrex module:
cdef extern from "math.h":
    double asin(double)
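
The same "declare it before you use it" idea can be tried out in plain Python with the standard ctypes module. This is only an analogy and not what Pyrex does internally. The sketch assumes the C maths library can be found under the name "m" (usual on Unix-like systems) and quietly falls back to math.asin otherwise; load_c_asin is a made-up helper name.

```python
import ctypes
import ctypes.util
import math


def load_c_asin():
    """Return asin() from the C maths library, or math.asin as a fallback."""
    # Assumption: the library is registered as "m"; on some systems
    # (e.g. Windows) this lookup returns None.
    path = ctypes.util.find_library("m")
    if path is None:
        return math.asin
    try:
        c_func = ctypes.CDLL(path).asin
    except (OSError, AttributeError):
        return math.asin
    # ctypes counterpart of "double asin(double)": without these two lines
    # ctypes would assume an int return value and the result would be wrong.
    c_func.restype = ctypes.c_double
    c_func.argtypes = [ctypes.c_double]
    return c_func


c_asin = load_c_asin()
print(c_asin(0.5))
```

In both cases the declaration tells the caller how to pass the argument and how to read the result. The C compiler (or ctypes) cannot guess that on its own.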

Using the Numpy C API to access array data directly was tough to learn. This is my current way:
  • put this at the beginning of the script:
    cimport c_numpy
    for this to work you have to copy the 'c_numpy.pxd' file from 'Python\Lib\site-packages\numpy\doc\pyrex' into the directory with your script (there is a warning about future removal; I hope the same-named file in '\doc\cython' will work as well).

  • initialize numpy with:
    c_numpy.import_array()

  • declare a new numpy array:
    cdef c_numpy.ndarray x

  • create the numpy array (numpy itself must also be imported the normal way, with 'import numpy'):
    x = numpy.zeros(10)
    There is another (possibly faster) way of creating new arrays but this is what I use now (I also do not know the other way; I should have made a note...).

  • declare a pointer to store the address of the numpy array data:
    cdef double *xdata

  • copy the data address to your pointer:
    xdata = <double *>x.data
    The x.data is a char pointer to the first number in your array. I have no idea what this means (why char?).

  • you may now index xdata to get desired element value:
    xdata[6] = 12.54
    or
    tempvar = xdata[1]

  • you may declare the numpy array in the same way in a function signature:
    def myfunc(double step, c_numpy.ndarray x):
    ...
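
On the "why char?" question: in C, a char pointer is the usual way of saying "address of some raw bytes of unspecified type". The array can hold any element type, so the type has to be supplied by the cast. The sketch below imitates the pointer steps in plain Python, using only the standard array and ctypes modules as stand-ins for the numpy array and the Pyrex cast. The names mirror the ones above, but none of this is the Numpy C API itself.

```python
import array
import ctypes

# Ten C doubles in one contiguous block, standing in for numpy.zeros(10).
x = array.array("d", [0.0] * 10)

# buffer_info() returns (address, item count); the address plays the role
# of x.data, i.e. where the first number lives in memory.
address, count = x.buffer_info()

# The "char" view: the same memory seen as plain bytes.
raw_bytes = memoryview(x).cast("B")
print(len(raw_bytes), "bytes for", count, "doubles")

# The counterpart of xdata = <double *>x.data: reinterpret the address
# as a pointer to double so that indexing steps in whole numbers.
xdata = ctypes.cast(address, ctypes.POINTER(ctypes.c_double))

xdata[6] = 12.54     # writes straight into the array's memory
tempvar = xdata[1]   # reads element 1 without going through the array object

print(x[6], tempvar)
```

The cast is only safe when the memory really holds contiguous doubles. That is true for numpy.zeros(10), but an array passed into a function could have another element type or be a non-contiguous slice, so it is worth checking before indexing the pointer.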

Using the Numpy C API is more cumbersome than just indexing numpy arrays, but the code speedup is often significant.
Looking through my Pyrex modules, this should be essentially all that is needed to get started with Pyrex.