I currently do not have any production code in which array creation is a bottleneck. Nevertheless, I was curious whether NumPy's C-API array-creation calls are actually faster than the corresponding Python calls. Here I present a how-to describing what I learned, together with some benchmarking results. I'm certain I will forget how to use these C-API calls, and hopefully this post will one day save me some time.
Prerequisites
- NumPy and Pyrex must be installed
- the c_numpy.pxd file from the NumPy installation must be accessible (I just put it into my current working directory)
import numpy
cimport c_numpy
cdef extern from "C:\Python25\Lib\site-packages\numpy\core\include\numpy\arrayobject.h":
    cdef object PyArray_SimpleNewFromData(int nd,
                                          c_numpy.npy_intp *dims,
                                          int typenum,
                                          void *data)
    cdef object PyArray_ZEROS(int nd,
                              c_numpy.npy_intp *dims,
                              int typenum,
                              int fortran)
    cdef object PyArray_SimpleNew(int nd,
                                  c_numpy.npy_intp *dims,
                                  int typenum)
    cdef object PyArray_Arange(double start,
                               double stop,
                               double step,
                               int typenum)
c_numpy.import_array()
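A minimal setup.py along these lines should work for building the module (fastarrays.pyx is just a placeholder name; numpy.get_include() points the compiler at the directory that contains arrayobject.h, so the extern block could then reference just "numpy/arrayobject.h" instead of the full path):
# setup.py -- a minimal sketch
from distutils.core import setup
from distutils.extension import Extension
from Pyrex.Distutils import build_ext
import numpy

setup(
    name="fastarrays",
    ext_modules=[Extension("fastarrays", ["fastarrays.pyx"],
                           include_dirs=[numpy.get_include()])],
    cmdclass={"build_ext": build_ext},
)
With MinGW the build command is then something like python setup.py build_ext --compiler=mingw32 --inplace.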
The documentation for the NumPy C-API is available for download on the NumPy homepage as the "Guide to NumPy" PDF file (the numpybook).
numpy.zeros vs. PyArray_ZEROS
example:
cdef c_numpy.npy_intp length
cdef c_numpy.ndarray newarr
length = 10
newarr = PyArray_ZEROS(1, &length, c_numpy.NPY_DOUBLE, 0)
notes:
- for multidimensional arrays the first two arguments of PyArray_ZEROS have to be changed accordingly, see the numpybook (I have not tested this; a rough sketch follows at the end of this section)
- the type may also be changed if desired (I, however, only need doubles)
equivalent numpy code:
newarr = numpy.zeros(length)
Benchmarking these two ways shows that they have the same speed for creating arrays larger than ~100 000 values. The C-API is faster on arrays smaller than ~50 000 values and about 50% faster on arrays of length 1 000.
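I have not tested the multidimensional case myself, but based on the numpybook the call should look roughly like this (an untested sketch; a different typenum can be substituted in the same way):
cdef c_numpy.npy_intp dims[2]
cdef c_numpy.ndarray newarr2d
dims[0] = 10   # number of rows
dims[1] = 5    # number of columns
newarr2d = PyArray_ZEROS(2, dims, c_numpy.NPY_DOUBLE, 0)   # like numpy.zeros((10, 5))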
numpy.arange vs. PyArray_Arange
example:
cdef double start, stop, step
cdef c_numpy.ndarray newarr
start = 0
stop = 10
step = 1
newarr = PyArray_Arange(start, stop, step, c_numpy.NPY_DOUBLE)
equivalent numpy code:
newarr = numpy.arange(start, stop, step)
Here the C-API is only faster on small arrays (length less than 1 000).
numpy.empty vs. PyArray_SimpleNew
example:
cdef c_numpy.npy_intp length
cdef c_numpy.ndarray newarr
length = 10
newarr = PyArray_SimpleNew(1, &length, c_numpy.NPY_DOUBLE)
equivalent numpy code:
newarr = numpy.empty(length)
This is the only case where using the C-API is always faster than the numpy call. PyArray_SimpleNew is about 65% faster on arrays of length less than 50 000, ~20% faster on arrays of length 500 000, and still somewhat faster when creating arrays of length 50 000 000.
PyArray_SimpleNewFromData
This call creates a new numpy array from a malloc-ed C array.
example:
cdef extern from "stdlib.h":
    ctypedef int size_t
    void *malloc(size_t)
cdef c_numpy.npy_intp size
cdef c_numpy.ndarray newarr
cdef double *arrsource
size = 10
arrsource = <double *>malloc(sizeof(double) * size)
newarr = PyArray_SimpleNewFromData(1, &size, c_numpy.NPY_DOUBLE, <void *>arrsource)
newarr.flags = newarr.flags|(c_numpy.NPY_OWNDATA) # sets the ownership bit
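The numpy array and the C buffer share the same memory, so values written through one are visible through the other; a quick check (my own illustration) could look like this:
cdef int i
for i from 0 <= i < size:
    arrsource[i] = i * 0.5   # write through the C pointer
print newarr[3]              # read back through the numpy array, should print 1.5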
Notes
- I have seen some discussion posts which discourage doing this. See e.g. here.
- It may also be helpful to read this post by Travis Oliphant.
The benchmarking was done by repeatedly allocating a large number of arrays from within Pyrex code. The length of the arrays was also varied to assess whether there is some dependency on the amount of memory being allocated. The largest arrays tested were the largest that did not cause a MemoryError in Python. The number of allocations was chosen to ensure that each benchmark ran for at least several seconds (generally thousands of calls or more). I have also tested that it is safe to read and write values into the created arrays.
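The core of each benchmark was essentially an allocation loop like the following sketch (benchmark_zeros is just an illustrative name, not the actual benchmark code):
def benchmark_zeros(int repeats, int n):
    cdef int i
    cdef c_numpy.npy_intp length
    cdef c_numpy.ndarray newarr
    length = n
    for i from 0 <= i < repeats:
        newarr = PyArray_ZEROS(1, &length, c_numpy.NPY_DOUBLE, 0)
The loop can then be timed from Python (e.g. with time.clock()) and compared against an identical loop that calls numpy.zeros instead.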
The benchmarking was done on a PC with Windows XP, Python 2.5.2, Pyrex 0.9.8.2, MinGW 5.1.4 and NumPy 1.2.0.