Hi Stéfan, Johannes,
Thanks for having a read through this...
I've tried to illustrate in this diagram why I shouldn't perform the prefix/cumulative sum on the CPU with NumPy etc...
In method 1 I have to ship the entire flag array to the CPU, build the compacted array there, and ship it back. I've also timed cumsum with timeit, and I'm fairly sure the CPU prefix-sum algorithms are simply too slow:
timeit.timeit('np.cumsum(x)', setup='import numpy as np; x = np.random.random_integers(0, 10, 1850)')
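For reference, the CPU-side compaction that method 1 has to do amounts to something like the following (a minimal NumPy sketch; the array names and contents are illustrative, not from my actual code):

```python
import numpy as np

# flags[i] == 1 marks a tile that survives compaction.
flags = np.array([0, 1, 1, 0, 1, 0, 0, 1])
values = np.arange(len(flags))  # stand-in for the tile data

# Exclusive prefix sum gives each surviving element its output slot.
slots = np.cumsum(flags) - flags      # [0, 0, 1, 2, 2, 3, 3, 3]

# Scatter the survivors into a dense output array.
compacted = np.empty(flags.sum(), dtype=values.dtype)
compacted[slots[flags == 1]] = values[flags == 1]
# compacted is now [1, 2, 4, 7]
```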
Method 2 just requires that the length of the queue be shipped to the CPU, so it knows how many threads to execute the GPU method gpu_process_tiles() with.
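In other words, the only host-side work left in method 2 is reading back a single integer and sizing the kernel launch. A rough sketch of that bookkeeping (the names and block size here are assumptions, not from the actual code):

```python
# Hypothetical launch bookkeeping for method 2: the GPU has already
# compacted the work queue, so only its length crosses the bus.
queue_length = 1850          # single int copied device -> host
threads_per_block = 256      # assumed block size

# Ceil-divide to get enough blocks to cover the whole queue.
blocks = (queue_length + threads_per_block - 1) // threads_per_block
# gpu_process_tiles() would then be launched with
# (blocks, threads_per_block); here blocks == 8.
```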
On Monday, May 27, 2013 11:54:50 PM UTC+2, Stefan van der Walt wrote:
This operation has to happen a lot… so I really need it to be fast. The problem I'm having is that when I isolate and measure the execution time of the GPU code, it's much faster than that of the C++ or Cython wrapper, which I can't really do without.
Another option is to call into the NumPy C API to invoke essentially the equivalent of

np.nonzero(np.diff(np.cumsum(x)))[0] + 1
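As a sanity check of what that expression computes (note the [0] is needed because np.nonzero returns a tuple of index arrays):

```python
import numpy as np

x = np.array([3, 0, 5, 0, 0, 2])

# diff(cumsum(x)) reconstructs x[1:], so the full expression yields the
# indices of the nonzero elements of x, except for a nonzero x[0].
idx = np.nonzero(np.diff(np.cumsum(x)))[0] + 1
# idx -> array([2, 5])
```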