Tuesday, December 2, 2008

Creating NumPy arrays in Pyrex

In a previous post I have described basic usage of the Pyrex language which may be used to interface C code into Python or to speed up your Python code by adding C-like declarations. At the time I did not knew how to use numpy C-API to create new numpy arrays. I have mentioned it in the post and luckily for me a very helpful comment by diffusing thoughts showed me how to do this. It also got me started so that I could decipher how to use other C-API calls.
I currently do not have any production code in which the array creation is a bottleneck. Nevertheless I was curious if these C-API calls are actually faster than using the Python calls to NumPy. Here I present a howto describing what I learned together with some benchmarking results. I'm certain that I will forget how to use these C-API calls and hopefully this post will one day save me some time.

Prerequisites
  • numpy and pyrex have to be installed
  • c_numpy.pxd file from the numpy installation must be accessible (I just put it into my current working directory)
I use the following header of the pyrex (*.pyx) file:
import numpy
cimport c_numpy
cdef extern from "C:\Python25\Lib\site-packages\numpy\core\include\numpy\arrayobject.h":
cdef object PyArray_SimpleNewFromData(int nd,
c_numpy.npy_intp *dims,
int typenum,
void *data)
cdef object PyArray_ZEROS(int nd,
c_numpy.npy_intp *dims,
int typenum,
int fortran)
cdef object PyArray_SimpleNew(int nd,
c_numpy.npy_intp *dims,
int typenum)
cdef object PyArray_Arange(double start,
double stop,
double step,
int typenum)

c_numpy.import_array()

The documentation for C-API of numpy is available for download on the numpy homepage in the form of "Guide to NumPy" pdf file (numpybook).

numpy.zeros vs. PyArray_ZEROS

example:
cdef int length
cdef c_numpy.ndarray newarr
length = 10
newarr = PyArray_ZEROS(1, &length, c_numpy.NPY_DOUBLE, 0)
notes:
  • for multidimensional arrays the first two variables within PyArray_ZEROS have to be changed accordingly, see numpybook (I have not tested this)
  • the type may also be changed if desirable (I however only need doubles)

equivalent numpy code:
newarr = numpy.zeros(length)
Benchmarking these two ways shows that they have the same speed for creating arrays larger than ~ 100 000 values. The C-API is faster on arrays smaller than ~ 50 000 values and about 50% faster on arrays of length 1 000.

numpy.arange vs. PyArray_Arange

example:
cdef double start, stop, step
cdef c_numpy.ndarray newarr
start = 0
stop = 10
step = 1
newarr = PyArray_Arange(start, stop, step, c_numpy.NPY_DOUBLE)
equivalent numpy code:
newarr = numpy.arange(start, stop, step)
Here the C-API is only faster on small arrays (length less than 1 000).

numpy.empty vs. PyArray_SimpleNew

example:
cdef int length
cdef c_numpy.ndarray newarr
length = 10
newarr = PyArray_SimpleNew(1, &length, c_numpy.NPY_DOUBLE)
equivalent numpy code:
newarr = numpy.empty(length)
This is the only case where using C-API is always faster than the numpy way. PyArray_SimpleNew is about 65% faster on arrays of length less than 50 000. It is ~20% faster on arrays of length 500 000. It is still somewhat faster in creating arrays of length 50 000 000.

PyArray_SimpleNewFromData
This call creates a new numpy array from malloc-ed C array.

example:
cdef extern from "stdlib.h":
ctypedef int size_t
void *malloc(size_t)

cdef c_numpy.npy_intp size
cdef c_numpy.ndarray newarr
cdef double *arrsource

size = 10
arrsource = <double *>malloc(sizeof(double) * size)
newarr = PyArray_SimpleNewFromData(1, &size, c_numpy.NPY_DOUBLE, <void *>arrsource)
newarr.flags = newarr.flags|(c_numpy.NPY_OWNDATA) # sets the ownership bit

Notes
  • I have seen some discussion posts which discourage from doing this. See e.g. here.
Technical notes
The benchmarking was done by repeated allocation of a lot of arrays from within Pyrex code. The length of the arrays was also varied to assess whether there is some dependency on the amount of memory being allocated. The largest arrays tested were the largest which didn't caused MemoryError in Python. The number of allocations was selected to assure that the benchmark run at least several seconds (generally thousands of calls or more). I have also tested that it is safe to read and write values into the created arrays.
The benchmarking was done on a PC with Windows XP, Python 2.5.2, Pyrex 0.9.8.2, MinGW 5.1.4 and NumPy 1.2.0.