Thursday, February 25, 2010

Transforming RGB data to wavelength information

Here, I'm presenting a reverse transformation to wavelength to RGB. Right now I have no means of testing its accuracy on some real data. As a first check, I've run it on the mercury lamp image shown in my older post. There are small deviations caused probably by low intensity of light in some places. If used on some real stuff, I would definitely attempt to first fit a line (or some other meaningful function) through the estimated wavelengths. Then of course plot the spectra against fitted values. The domain of this function is limited to 380 - 645 nm. So, here's the algorithm in Python:

def RGB2lambda(R, G, B):
"""Returns 0 if indeciferable"""
# selects range by maximum component
# if max is blue - range is 380 - 489
# if max is green - range is 490 - 579
# if max is red - range is 580 - 645

# which colour has highest intensity?
high = float(R)
highind = 1
if G > high:
high = float(G)
highind = 2
if B > high:
high = float(B)
highind = 3

# normalize highest to 1.0
RGBnorm = [R / high, G / high, B / high]

# start decoding
RGBlambda = 0
if highind == 1: # red is highest
if B >= G: # there is more blue than green
return 0 # max red and more blue than green shouldn't happen
# wavelength linearly changes from 645 to 580 as green increases
RGBlambda = 645 - RGBnorm[1] * (645. - 580.)

elif highind == 2: # green is max, range is 510 - 579
if R > B: # range is 510 - 579
RGBlambda = 510 + RGBnorm[0] * (580 - 510)
else: # blue is higher than red, range is 490 - 510
RGBlambda = 510 - RGBnorm[2] * (510 - 490)

elif highind == 3: # blue is max, range is 380 - 490
if G > R: # range is 440 - 490
RGBlambda = RGBnorm[1] * (490 - 440) + 440
else: # there is more red than green, range is 380 - 440
RGBlambda = 440 - RGBnorm[0] * (440 - 380)

return RGBlambda


And here is an accuracy check made on my older image of mercury lamp spectrum:
Accuracy check (synthetic test): x-axis is expected value, y-axis is algorithm output. Circles are the data, the ideal is presented by red line (y = x).

Please note that this was NOT produced by using a real photo of a spectrum. As someone who has some spectroscopic background I strongly discourage you from using this algorithm to get wavelength information. This was derived from a simplified version of simple wavelength-to-RGB algorithm. Nevertheless, I intend to make some photos of monochromatic light and test it ;-)

Sunday, February 21, 2010

How to make a .wav file with Python, revisited

I've noticed that my two-year old post about making a .wav file with Python is being read a lot so I forced myself to modify the code a little. Now it should be nicer to read and modify further according to your needs/wishes. I've also incorporated the comments of helpful readers Mike Axiak and Fingon. Thanks guys ;-)
The link to the source of this idea, code by Andrea Valle, does not work as I'm typing this so I can only thank him by saying his name. My contribution to all this is minor.

import numpy as N
import wave

def get_signal_data(frequency=440, duration=1, volume=32768, samplerate=44100):
"""Outputs a numpy array of intensities"""
samples = duration * samplerate
period = samplerate / float(frequency)
omega = N.pi * 2 / period
t = N.arange(samples, dtype=N.float)
y = volume * N.sin(t * omega)
return y

def numpy2string(y):
"""Expects a numpy vector of numbers, outputs a string"""
signal = "".join((wave.struct.pack('h', item) for item in y))
# this formats data for wave library, 'h' means data are formatted
# as short ints
return signal

class SoundFile:
def __init__(self, signal, filename, duration=1, samplerate=44100):
self.file = wave.open(filename, 'wb')
self.signal = signal
self.sr = samplerate
self.duration = duration

def write(self):
self.file.setparams((1, 2, self.sr, self.sr*self.duration, 'NONE', 'noncompressed'))
# setparams takes a tuple of:
# nchannels, sampwidth, framerate, nframes, comptype, compname
self.file.writeframes(self.signal)
self.file.close()

if __name__ == '__main__':
duration = 2
myfilename = 'test.wav'
data = get_signal_data(440, duration)
signal = numpy2string(data)
f = SoundFile(signal, myfilename, duration)
f.write()
print 'file written'

Friday, May 22, 2009

Conversion of wavelength in nanometers to RGB in Python

I've been recently doing some calibration/benchmark/demonstration measurements of spectra on an instrument I'm building and thought that it would be cool to visualize the spectra so that you could actually see them in colour. I've also never been able to memorize what colour which wavelength was, so this could even be a self-education project. In other words, the task was to transform wavelength data in nanometers into RGB data.

A short Google search turned up some very informative sites and among them was the algorithm for nanometer to RGB conversion. What seems to be the oldest search result is a conversion algorithm written by Dan Bruton in FORTRAN. You may also be interested in the Color Science site from the same author. As I was a bit confused by the FORTRAN code, I also used what appears to be a translation of this code into C#. I know C# about as much as FORTRAN but the syntax was more understandable to me. My only contribution was a literal translation of the algorithm into Python.

The function takes a value in nanometers and returns a list of [R, G, B] values. Although a PIL putpixel function requires a tuple, I found a list more flexible in case you want to change the values e.g. according to measured intensity. So, here is the code:

def wav2RGB(wavelength):
w = int(wavelength)

# colour
if w >= 380 and w < 440:
R = -(w - 440.) / (440. - 350.)
G = 0.0
B = 1.0
elif w >= 440 and w < 490:
R = 0.0
G = (w - 440.) / (490. - 440.)
B = 1.0
elif w >= 490 and w < 510:
R = 0.0
G = 1.0
B = -(w - 510.) / (510. - 490.)
elif w >= 510 and w < 580:
R = (w - 510.) / (580. - 510.)
G = 1.0
B = 0.0
elif w >= 580 and w < 645:
R = 1.0
G = -(w - 645.) / (645. - 580.)
B = 0.0
elif w >= 645 and w <= 780:
R = 1.0
G = 0.0
B = 0.0
else:
R = 0.0
G = 0.0
B = 0.0

# intensity correction
if w >= 380 and w < 420:
SSS = 0.3 + 0.7*(w - 350) / (420 - 350)
elif w >= 420 and w <= 700:
SSS = 1.0
elif w > 700 and w <= 780:
SSS = 0.3 + 0.7*(780 - w) / (780 - 700)
else:
SSS = 0.0
SSS *= 255

return [int(SSS*R), int(SSS*G), int(SSS*B)]

The output value's range is 0 -- 255. The code could use some streamlining, but even in this form it is fast enough for an occasional image.

Here is whole visible spectrum as made by this function:

... and a line spectrum of our decades-old mercury vapour lamp:

Finally, in case you want to read more about computer colour science:
Rendering spectra
Colour Rendering of Spectra

Tuesday, December 2, 2008

Creating NumPy arrays in Pyrex

In a previous post I have described basic usage of the Pyrex language which may be used to interface C code into Python or to speed up your Python code by adding C-like declarations. At the time I did not knew how to use numpy C-API to create new numpy arrays. I have mentioned it in the post and luckily for me a very helpful comment by diffusing thoughts showed me how to do this. It also got me started so that I could decipher how to use other C-API calls.
I currently do not have any production code in which the array creation is a bottleneck. Nevertheless I was curious if these C-API calls are actually faster than using the Python calls to NumPy. Here I present a howto describing what I learned together with some benchmarking results. I'm certain that I will forget how to use these C-API calls and hopefully this post will one day save me some time.

Prerequisites
  • numpy and pyrex have to be installed
  • c_numpy.pxd file from the numpy installation must be accessible (I just put it into my current working directory)
I use the following header of the pyrex (*.pyx) file:
import numpy
cimport c_numpy
cdef extern from "C:\Python25\Lib\site-packages\numpy\core\include\numpy\arrayobject.h":
cdef object PyArray_SimpleNewFromData(int nd,
c_numpy.npy_intp *dims,
int typenum,
void *data)
cdef object PyArray_ZEROS(int nd,
c_numpy.npy_intp *dims,
int typenum,
int fortran)
cdef object PyArray_SimpleNew(int nd,
c_numpy.npy_intp *dims,
int typenum)
cdef object PyArray_Arange(double start,
double stop,
double step,
int typenum)

c_numpy.import_array()

The documentation for C-API of numpy is available for download on the numpy homepage in the form of "Guide to NumPy" pdf file (numpybook).

numpy.zeros vs. PyArray_ZEROS

example:
cdef int length
cdef c_numpy.ndarray newarr
length = 10
newarr = PyArray_ZEROS(1, &length, c_numpy.NPY_DOUBLE, 0)
notes:
  • for multidimensional arrays the first two variables within PyArray_ZEROS have to be changed accordingly, see numpybook (I have not tested this)
  • the type may also be changed if desirable (I however only need doubles)

equivalent numpy code:
newarr = numpy.zeros(length)
Benchmarking these two ways shows that they have the same speed for creating arrays larger than ~ 100 000 values. The C-API is faster on arrays smaller than ~ 50 000 values and about 50% faster on arrays of length 1 000.

numpy.arange vs. PyArray_Arange

example:
cdef double start, stop, step
cdef c_numpy.ndarray newarr
start = 0
stop = 10
step = 1
newarr = PyArray_Arange(start, stop, step, c_numpy.NPY_DOUBLE)
equivalent numpy code:
newarr = numpy.arange(start, stop, step)
Here the C-API is only faster on small arrays (length less than 1 000).

numpy.empty vs. PyArray_SimpleNew

example:
cdef int length
cdef c_numpy.ndarray newarr
length = 10
newarr = PyArray_SimpleNew(1, &length, c_numpy.NPY_DOUBLE)
equivalent numpy code:
newarr = numpy.empty(length)
This is the only case where using C-API is always faster than the numpy way. PyArray_SimpleNew is about 65% faster on arrays of length less than 50 000. It is ~20% faster on arrays of length 500 000. It is still somewhat faster in creating arrays of length 50 000 000.

PyArray_SimpleNewFromData
This call creates a new numpy array from malloc-ed C array.

example:
cdef extern from "stdlib.h":
ctypedef int size_t
void *malloc(size_t)

cdef c_numpy.npy_intp size
cdef c_numpy.ndarray newarr
cdef double *arrsource

size = 10
arrsource = <double *>malloc(sizeof(double) * size)
newarr = PyArray_SimpleNewFromData(1, &size, c_numpy.NPY_DOUBLE, <void *>arrsource)
newarr.flags = newarr.flags|(c_numpy.NPY_OWNDATA) # sets the ownership bit

Notes
  • I have seen some discussion posts which discourage from doing this. See e.g. here.
Technical notes
The benchmarking was done by repeated allocation of a lot of arrays from within Pyrex code. The length of the arrays was also varied to assess whether there is some dependency on the amount of memory being allocated. The largest arrays tested were the largest which didn't caused MemoryError in Python. The number of allocations was selected to assure that the benchmark run at least several seconds (generally thousands of calls or more). I have also tested that it is safe to read and write values into the created arrays.
The benchmarking was done on a PC with Windows XP, Python 2.5.2, Pyrex 0.9.8.2, MinGW 5.1.4 and NumPy 1.2.0.

Monday, November 24, 2008

Pyrex for dummies

Pyrex is a very powerful tool if you need to speed up some Python code or connect it with some C code. My main motivation to use it was the ability to distribute only the compiled files and not also a C compiler etc.

Unfortunately, the learning curve is a bit steep if you want to do anything non-trivial (if you are not a skilled programmer, that is). I find it very difficult to read and understand just about anything about Python C API and related topics. So any attempt to learn new tricks means spending hours with Google and trying to find an example of usage which is simple enough for me to understand.

What follows is a few steps which I generally follow when rewriting Python into Pyrex code.
  • Profile your code. Start with rewriting the slowest parts of your code (unless you want to know how fast you can make everything).

  • Try to compile unchanged Python code with Pyrex. This way you will start with a working code and if something goes wrong you probably broke it with later changes.

  • Declare loop control variables as C types:
    cdef int i
    and change Python for loops into Pyrex for loops
    for i from 0 <= i < 10:

  • Numpy array data should be accessed directly. It's fairly easy to do now, that I finally know how to do this (see further).

  • Additional speedup may be achieved by eliminating or minimising Python function calls and/or replacing them with C function calls.

  • To help understanding your functions, add at least one line to the docstring such as:
    """myfunc( N, array x) -> array y"""
    Pyrex (or C) function parameters are not accessible as in Python functions by using help(myfunc) therefore you must explicitly write it to the docstring.

These steps have so far done the trick for my purposes. I essentially only need fast maths code so I have no idea about other areas. What I may have to learn later is some string stuff but so far I had neither guts nor reason to try it. This means I'm using pure Python strings in my Pyrex modules.

Now some selected details as promised:
To use some standard C function you have to declare it before use. So, if I e.g. need asin() from the math.h library I put this at the beginning of the Pyrex module:
cdef extern from "math.h":
double asin(double)

Using Numpy C API to access array data directly was tough to learn, this is my current way:
  • put to the beginning of the script:
    cimport c_numpy
    for this to work you have to copy 'c_numpy.pxd' file from 'Python\Lib\site-packages\numpy\doc\pyrex' into the directory with your script (there is a warning about future removal, I hope the same-named file in '\doc\cython' will work as well).

  • initialize numpy by :
    c_numpy.import_array()

  • declare new numpy array:
    cdef c_numpy.ndarray x

  • create numpy array:
    x = numpy.zeros(10)
    There is another (possibly faster) way of creating new arrays but this is what i use now (I also do not know the other way, should have made a note...).

  • declare pointer to store the adress of the numpy array data:
    cdef double *xdata

  • copy data adress to your pointer:
    xdata = <double *>x.data
    The x.data is a char poiter to the first number in your array. I have no idea what this means (why char?).

  • you may now index xdata to get desired element value:
    xdata[6] = 12.54
    or
    tempvar = xdata[1]

  • you may declare the numpy array in the same way during function declaration:
    def myfunc(double step, c_numpy.ndarray x):
    ...

Using the Numpy C API is more cumbersome than just indexing numpy arrays but the code speedup is often significant.
Looking through my Pyrex modules this should in essence be all that is needed to get started with Pyrex.

Friday, November 21, 2008

Pyrex - mixing C and Python code

I have had a little optimization mania recently. After realizing that scipy.weave is ill suited for deployment on the computers of my non-programming colleagues I was looking for other options and tried the Pyrex extension. I recommend reading the Pyrex home page for details of what exactly Pyrex is. A title sentence of its home page explains it nicely:
Pyrex lets you write code that mixes Python and C data types any way you want, and compiles it into a C extension for Python.
This has the (for me) significant advantage that there is no need to install scipy and gcc on any computer that you want your code to run on (as compared to the scipy.weave model). Distributing the compiled extension file is enough.
To make Pyrex work on Windows follow these instructions (here seems to be a copy). As usual you have to change something a bit to make it work with gcc. The key step of these instructions is creation (or edit) of distutils.cfg and adding these lines into it:
[build]
compiler = mingw32
Aside from this I also use a modified python setup command:
python setup.py build-ext --inplace
It suits me better to have the compiled extension in the working directory and not in Python dir as happens with the 'install' command.
I am aware of the Cython language which has developed from Pyrex but I have not tried it. I guess I don't need to fix what ain't broken ;-) Pyrex works great for me right now.

Tuesday, August 26, 2008

Speeding up Python code with Psyco

Possibly the easiest way of speeding up Python code is to use Psyco. Psyco is extremely easy to use and it is at least worth a try before delving into C/C++ mess either directly or with the help of scipy.weave.
The usage is simple, just download Psyco, install it and add following to your code:
import psyco
psyco.full()
The full() directive tells Psyco to optimize everything. I usually put these two lines to the beginning of the code, just after other imports. Psyco website states that possible speedups can be between 2x-100x. I have seen every number between 10% faster to 50x faster code execution.
The following is an example of very high acceleration of very simple code:
def test(N, M):
res = []
for i in range(N):
for j in range(M):
if j == 0:
x = i*i*i
res.append(x)
return res
Running this function with N = 10000 and M = 100000 takes ~60 seconds on Intel E8200 processor. After importing and running Psyco the very same code takes ~1.2 seconds. The speedup is then ~50x.
Psyco works best on functions which run at least couple of seconds per each call and are similar to the presented test function. Psyco only optimizes code within functions and scripts classes but this isn't really a problem.
To only optimize one function use
psyco.bind(yourfunctionname)