[Numpy-discussion] efficient 3d histogram creation

Chris Colbert sccolbert at gmail.com
Wed May 6 20:34:18 EDT 2009


nice!

This was really my first attempt at doing anything constructive with Cython.
It was actually unbelievably easy to work with. I think i spent less time
working on this, than I did trying to find an optimized solution using pure
numpy and python.

Chris

On Wed, May 6, 2009 at 8:21 PM, <josef.pktd at gmail.com> wrote:

> On Wed, May 6, 2009 at 7:39 PM, Chris Colbert <sccolbert at gmail.com> wrote:
> > i just realized I don't need the line:
> >
> > cdef int z = img.shape(2)
> >
> > it's left over from tinkering. sorry. And i should probably convert the
> out
> > array to type float to handle large data sets.
> >
> > Chris
> >
> > On Wed, May 6, 2009 at 7:30 PM, <josef.pktd at gmail.com> wrote:
> >>
> >> On Wed, May 6, 2009 at 6:06 PM, Chris Colbert <sccolbert at gmail.com>
> wrote:
> >> > I decided to hold myself over until being able to take a hard look at
> >> > the
> >> > numpy histogramdd code:
> >> >
> >> > Here is a quick thing a put together in cython. It's a 40x speedup
> over
> >> > histogramdd on Vista 32 using the minGW32 compiler. For a (480, 630,
> 3)
> >> > array, this executed in 0.005 seconds on my machine.
> >> >
> >> > This only works for arrays with uint8 data types having dimensions (x,
> >> > y, 3)
> >> > (common image format). The return array is a (16, 16, 16) equal width
> >> > bin
> >> > histogram of the input.
> >> >
> >> > If anyone wants the cython C-output, let me know and I will email it
> to
> >> > you.
> >> >
> >> > If there is interest, I will extend this for different size bins and
> >> > aliases
> >> > for different data types.
> >> >
> >> > Chris
> >> >
> >> > import numpy as np
> >> >
> >> > cimport numpy as np
> >> >
> >> > DTYPE = np.uint8
> >> > DTYPE32 = np.int
> >> >
> >> > ctypedef np.uint8_t DTYPE_t
> >> > ctypedef np.int_t DTYPE_t32
> >> >
> >> > def hist3d(np.ndarray[DTYPE_t, ndim=3] img):
> >> >     cdef int x = img.shape[0]
> >> >     cdef int y = img.shape[1]
> >> >     cdef int z = img.shape[2]
> >> >     cdef int addx
> >> >     cdef int addy
> >> >     cdef int addz
> >> >     cdef np.ndarray[DTYPE_t32, ndim=3] out = np.zeros([16, 16, 16],
> >> > dtype=DTYPE32)
> >> >     cdef int i, j, v0, v1, v2
> >> >
> >> >
> >> >     for i in range(x):
> >> >         for j in range(y):
> >> >             v0 = img[i, j, 0]
> >> >             v1 = img[i, j, 1]
> >> >             v2 = img[i, j, 2]
> >> >             addx = (v0 - (v0 % 16)) / 16
> >> >             addy = (v1 - (v1 % 16)) / 16
> >> >             addz = (v2 - (v2 % 16)) / 16
> >> >             out[addx, addy, addz] += 1
> >> >
> >> >     return out
> >> >
> >>
> >> Thanks for the example for using cython. Once I figure out what the
> >> types are, cython will look very convenient for loops, and pyximport
> >> takes care of the compiler.
> >>
> >> Josef
> >>
> >> import pyximport; pyximport.install()
> >> import hist_rgb    #name of .pyx files
> >>
> >> import numpy as np
> >> factors = np.random.randint(256,size=(480, 630, 3))
> >> h = hist_rgb.hist3d(factors.astype(np.uint8))
> >> print h[:,:,0]
>
>
> playing some more with cython: here is a baby on the fly code generator
> input type int, output type float64
>
> a dispatch function by type is missing
>
> no segfaults, even though most of the time a call the function with
> the wrong type.
>
> Josef
>
> code = '''
> import numpy as np
> cimport numpy as np
>
> __all__ = ["hist3d"]
> DTYPE = ${imgtype}
> DTYPE32 = ${outtype}
>
> ctypedef ${imgtype}_t DTYPE_t
> ctypedef ${outtype}_t DTYPE_t32
>
> def hist3d(np.ndarray[DTYPE_t, ndim=3] img):
>    cdef int x = img.shape[0]
>    cdef int y = img.shape[1]
>    #cdef int z = img.shape[2]
>    cdef int addx
>    cdef int addy
>    cdef int addz
>    cdef np.ndarray[DTYPE_t32, ndim=3] out = np.zeros([16, 16, 16],
> dtype=DTYPE32)
>    cdef int i, j, v0, v1, v2
>
>
>    for i in range(x):
>        for j in range(y):
>            v0 = img[i, j, 0]
>            v1 = img[i, j, 1]
>            v2 = img[i, j, 2]
>            addx = (v0 - (v0 % 16)) / 16
>            addy = (v1 - (v1 % 16)) / 16
>            addz = (v2 - (v2 % 16)) / 16
>            out[addx, addy, addz] += 1
>
>    return out
> '''
>
> from string import Template
> s = Template(code)
> src = s.substitute({'imgtype': 'np.int', 'outtype': 'np.float64'})
> open('histrgbintfl2.pyx','w').write(src)
>
> import pyximport; pyximport.install()
> import histrgbintfl2
>
> import numpy as np
> factors = np.random.randint(256,size=(480, 630, 3))
> h = histrgbintfl2.hist3d(factors)#.astype(np.uint8))
> print h[:,:,0]
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20090506/07013074/attachment.html>


More information about the NumPy-Discussion mailing list