a Numeric.where re-coded for weave.inline: very fast...

I tried weave for executing a C routine over a Numeric array. The Numeric data array is c_int16 with 14 significant bits (bit 13 is the sign) and I wanted it as "normal" Int16. So:

    t1 = clock()
    m = where(greater(dataArray[:1000], 8191), dataArray[:1000]-16383, dataArray[:1000])
    t2 = clock()
    print 'where', round((t2-t1)*1000000, 1), 'us'

Usually gives:

    where 634.3 us

PyInline has about the same calling overhead as ctypes (I had tried a DLL and ctypes: http://sourceforge.net/mailarchive/forum.php?thread_id=7630224&forum_id=24606 ), apparently about 240us for a simple C statement. It also requires some extra hoops to digest the pointer to short I need ( void b2int(short *f, int N) ), so I didn't fully try it.

weave.inline testing was very good, with different issues. I found I had to patch msvccompiler.py for MS C 7.1 .NET ( http://www.vrplumber.com/programming/mstoolkit/ ). After the free but massive download/install...

    import weave
    inline = weave.inline  # for speed
    N = 1000
    code="int i; for(i = 0; i < N; i++){if (dataArray[i]>8191){dataArray[i]-=16383;}}"
    inline(code, ['dataArray', 'N'])  # just pass Python object's names

This created sc_019a1cf36209cb2dfc688820080541ef0.pyd in
C:\Documents and Settings\rays\Local Settings\Temp\rays\python24_compiled\

Using the above code is slow; ~32000 us, as the compiler checks or runs each time. However, after much fiddling and dir()s, I copied the long-name.pyd to C:\Python24\DLLs and just did

    import sc_019a1cf36209cb2dfc688820080541ef0
    b2iw = sc_019a1cf36209cb2dfc688820080541ef0.compiled_func  # note the exposed function!
    N = 1
    t1 = clock()
    b2iw({'dataArray':dataArray}, {'N':N})  # note the dicts!
    t2 = clock()
    print 'weave', round((t2-t1)*1000000, 1), 'us'

25us! (~300us for N=10000)

I think that's as close to C speed as I can expect, although I'm looking at the compiler options in msvccompiler.py for P4 optimization... I still get "Missing compiler_cxx fix for MSVCCompiler" on the initial compile, but apparently to no harm.

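
For anyone who wants to check the conversion itself without Numeric or a compiler, here is a minimal pure-Python sketch of the same loop as the C snippet above. It uses only the standard library array module, and the function name unpack_14bit and its threshold/offset parameters are made up for illustration. The defaults are the constants from this post (8191 and 16383); if the hardware delivers plain 14-bit two's complement, the offset would be 16384 instead, so it is left as a parameter.

```python
from array import array

def unpack_14bit(samples, threshold=8191, offset=16383):
    """In-place equivalent of the C loop in the post (hypothetical helper).

    Any value above `threshold` has `offset` subtracted; everything else
    is left untouched. Defaults are the constants used in the post.
    """
    for i in range(len(samples)):
        if samples[i] > threshold:
            samples[i] -= offset
    return samples

# 'h' is a signed 16-bit type code, matching the short* the C code sees.
raw = array('h', [0, 100, 8191, 8192, 16383])

# Work on a copy so `raw` keeps the unconverted samples.
converted = unpack_14bit(array('h', raw))
print(list(converted))  # [0, 100, 8191, -8191, 0]

# Same data with the two's-complement offset, for comparison.
twos = unpack_14bit(array('h', raw), offset=16384)
print(list(twos))       # [0, 100, 8191, -8192, -1]
```

This is only meant as a reference for checking results on a few samples; it will be far slower than either the where() version or the compiled loop.
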
As a final note, I also found that psyco _slows down_ both ctypes and weave calls. I did psyco.full() at the app's start.

Without psyco:
    b2i 210.8 us
    weave 53.7 us
with:
    b2i 250.0 us
    weave 234.7 us

Comments/criticism?

Ray
Ray Schumacher