[Numpy-discussion] nditer: possible to manually handle dimensions with different lengths?

Thu Oct 6 13:24:04 EDT 2011

I ended up fixing my problem by removing the 'buffering' flag and adding the
'copy' flag to each of the input arrays.

I think that nested_iters might be improved by an operand axes specification
for each layer of nesting like nditer uses, though I suppose that 3 layers
of nesting might be confusing for users.

I get an "array too big error" on "values, groups, outs = it1.itviews" when
the shape of the iterator is larger than ~(4728, 125285) even if each of the
arrays should only have actual size along one dimension.

Code:

@cython.boundscheck(False)
def groupwise_accumulate(vals, group, axis ):
cdef np.ndarray[double, ndim = 2] values
cdef np.ndarray[long, ndim = 2] groups
cdef np.ndarray[double, ndim = 2] outs
cdef int size
 cdef long g
 cdef int i, j
 #copy so that swaping the axis doesn't mess up the original arrays
vals = vals.copy()
group = group.copy()
 #add a dimension to match up with the new dimension and swap the given axis
to the end
vals.shape = vals.shape + (1,)
vaxes = range(vals.ndim)
vaxes.append(axis)
vaxes.remove(axis)
vals = np.transpose(vals, vaxes)
vals = vals.copy()

#the output should have the same shape as the values except along the
#last two axes (which are the given axis and the new axis)
oshape = list(vals.shape)
bins = len(np.unique(group))
oshape[-1] = 1
oshape[-2] = bins
out = np.empty(tuple(oshape))
 #line up grouping with the given axis
gshape = [1] * (len(oshape) - 1) + [vals.shape[-1]]
group.shape = gshape
 #nested iterator should go along the last two axes
axes = range(vals.ndim)
axes0 = axes[:-2]
axes1 = axes[-2:]
 it0, it1 = np.nested_iters([vals,group, out],
                     [axes0, axes1],
                     op_dtypes=['float64', 'int32', 'float64'],
                     op_flags = [['readonly', 'copy'], ['readonly','copy'],
['readwrite']],
                     flags = ['multi_index', 'reduce_ok' ])
size = vals.shape[-1]
 for x in it0:
values, groups, outs = it1.itviews

i = 0
j = 0
while i < size:
 g = groups[0,i]
#accumulation initialization
 while i < size and groups[0,i] == g:
#groupwise accumulation
i += 1
 outs[j,0] = calculation()
j += 1
 #swap back the new axis to the original location of the given axis
out.shape = out.shape[:-1]
oaxes = range(vals.ndim -1)
oaxes.insert(axis, out.ndim-1)
oaxes = oaxes[:-1] #remove the now reduced original given axis
out = np.transpose(out, oaxes)
 return out

On Mon, Oct 3, 2011 at 2:03 PM, John Salvatier <jsalvati at u.washington.edu>wrote:

> Some observations and questions about nested_iters. Nested_iters seems to
> require that all input arrays have the same number of dimensions (so you
> will have to pad some input shapes with 1s). Is there a way to specify how
> the axes line are matched together like for nditer?
>
>
> When I try to run the following program,
>
> @cython.boundscheck(False)
> def vars(vals, group, axis ):
> cdef np.ndarray[double, ndim = 2] values
> cdef np.ndarray[long long, ndim = 2] groups
>  cdef np.ndarray[double, ndim = 2] outs
> cdef int size
> cdef double value
>  cdef int i, j
>  cdef long long cgroup
> cdef double min
> cdef double max
>  cdef double open
>  oshape = list(vals.shape)
>  bins = len(np.unique(group))
> oshape = oshape+[bins]
> oshape[axis] = 1
>  out = np.empty(tuple(oshape))
>  axes = range(vals.ndim)
> axes.remove(axis)
>  gshape = [1] * len(oshape)
> gshape[axis] = len(group)
> group.shape = gshape
>  vals = vals[...,np.newaxis]
>  it0, it1 = np.nested_iters([vals,group, out],
>  [axes, [axis,len(oshape) -1]],
>  op_dtypes=['float64', 'int64', 'float64'],
>  flags = ['multi_index', 'buffered'])
> size = vals.shape[axis]
>  for x in it0:
> values, groups, outs = it1.itviews
>
> j = -1
>  for i in range(size):
> if cgroup != groups[i,0]:
> if j != -1:
>  outs[0,j] = garmanklass(open, values[i,0], min, max)
>  cgroup = groups[i,0]
>  min = inf
> max = -inf
> open = values[i,0]
>  j += 1
>
>  min = fmin(min, values[i,0])
> max = fmax(max, values[i,0])
>
>  outs[0,j+1] = garmanklass(open, values[size -1], min, max)
> return out
>
>
> I get an error
>
>   File "comp.pyx", line 58, in varscale.comp.vars (varscale\comp.c:1565)
>     values, groups, outs = it1.itviews
> ValueError: cannot provide an iterator view when buffering is enabled
>
>
> Which I am not sure how to deal with. Any advice?
>
> What I am trying to do here is to do a "grouped" calculation (the group
> specified by the group argument) on the values along the given axis. I try
> to use nested_iter to iterate over the specified axis and a new axis (the
> length of the number of groups) separately so I can do my calculation.
>
> On Mon, Oct 3, 2011 at 9:03 AM, John Salvatier <jsalvati at u.washington.edu>wrote:
>
>> Thanks mark! I think that's exactly what I'm looking for. We even had a
>> previous discussion about this (oops!) (
>> http://mail.scipy.org/pipermail/numpy-discussion/2011-January/054421.html
>> ).
>>
>> I didn't find any documentation, I will try to add some once I understand
>> how it works better.
>>
>> John
>>
>>
>> On Sat, Oct 1, 2011 at 2:53 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:
>>
>>> On Sat, Oct 1, 2011 at 1:45 PM, John Salvatier <
>>> jsalvati at u.washington.edu> wrote:
>>>
>>>> I apologize, I picked a poor example of what I want to do. Your
>>>> suggestion would work for the example I provided, but not for a more complex
>>>> example. My actual task is something like a "group by" operation along a
>>>> particular axis (with a known number of groups).
>>>>
>>>> Let me try again: What I would like to be able to do is to specify some
>>>> of the iterator dimensions to be handled manually by me. For example lets
>>>> say I have some kind of a 2d smoothing algorithm. If I start with an array
>>>> of shape [a,b,c,d] and I'd like to do the 2d smoothing over the 2nd and 3rd
>>>> dimensions, I'd like to be able to tell nditer to do normal broadcasting and
>>>> iteration over the 1st and 4th dimensions but leave iteration over the 2nd
>>>> and 3rd dimensions to me and my algorithm. Each iteration of nditer would
>>>> give me a 2d array to which I apply my algorithm. This way I could write
>>>> more arbitrary functions that operate on arrays and support broadcasting.
>>>>
>>>> Is clearer?
>>>>
>>>
>>> Maybe this will work for you:
>>>
>>> In [15]: a = np.arange(2*3*4*5).reshape(2,3,4,5)
>>>
>>> In [16]: it0, it1 = np.nested_iters(a, [[0,3], [1,2]],
>>> flags=['multi_index'])
>>>
>>> In [17]: for x in it0:
>>>    ....:     print it1.itviews[0]
>>>    ....:
>>> [[ 0  5 10 15]
>>>  [20 25 30 35]
>>>  [40 45 50 55]]
>>> [[ 1  6 11 16]
>>>  [21 26 31 36]
>>>  [41 46 51 56]]
>>> [[ 2  7 12 17]
>>>  [22 27 32 37]
>>>  [42 47 52 57]]
>>> [[ 3  8 13 18]
>>>  [23 28 33 38]
>>>  [43 48 53 58]]
>>> [[ 4  9 14 19]
>>>  [24 29 34 39]
>>>  [44 49 54 59]]
>>> [[ 60  65  70  75]
>>>  [ 80  85  90  95]
>>>  [100 105 110 115]]
>>> [[ 61  66  71  76]
>>>  [ 81  86  91  96]
>>>  [101 106 111 116]]
>>> [[ 62  67  72  77]
>>>  [ 82  87  92  97]
>>>  [102 107 112 117]]
>>> [[ 63  68  73  78]
>>>  [ 83  88  93  98]
>>>  [103 108 113 118]]
>>> [[ 64  69  74  79]
>>>  [ 84  89  94  99]
>>>  [104 109 114 119]]
>>>
>>> Cheers,
>>> Mark
>>>
>>>
>>>
>>>
>>>>
>>>>
>>>> On Fri, Sep 30, 2011 at 5:04 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:
>>>>
>>>>> On Fri, Sep 30, 2011 at 8:03 AM, John Salvatier <
>>>>> jsalvati at u.washington.edu> wrote:
>>>>>
>>>>>> Using nditer, is it possible to manually handle dimensions  with
>>>>>> different lengths?
>>>>>>
>>>>>> For example, lets say I had an array A[5, 100] and I wanted to sample
>>>>>> every 10 along the second axis so I would end up with an array B[5,10]. Is
>>>>>> it possible to do this with nditer, handling the iteration over the second
>>>>>> axis manually of course (probably in cython)?
>>>>>>
>>>>>> I want something like this (modified from
>>>>>> http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#putting-the-inner-loop-in-cython
>>>>>> )
>>>>>>
>>>>>> @cython.boundscheck(False)
>>>>>> def sum_squares_cy(arr):
>>>>>>     cdef np.ndarray[double] x
>>>>>>     cdef np.ndarray[double] y
>>>>>>     cdef int size
>>>>>>     cdef double value
>>>>>>     cdef int j
>>>>>>
>>>>>>     axeslist = list(arr.shape)
>>>>>>     axeslist[1] = -1
>>>>>>
>>>>>>     out = zeros((arr.shape[0], 10))
>>>>>>     it = np.nditer([arr, out], flags=['reduce_ok', 'external_loop',
>>>>>>                                       'buffered', 'delay_bufalloc'],
>>>>>>                 op_flags=[['readonly'], ['readwrite',
>>>>>> 'no_broadcast']],
>>>>>>                 op_axes=[None, axeslist],
>>>>>>                 op_dtypes=['float64', 'float64'])
>>>>>>     it.operands[1][...] = 0
>>>>>>     it.reset()
>>>>>>     for xarr, yarr in it:
>>>>>>         x = xarr
>>>>>>         y = yarr
>>>>>>         size = x.shape[0]
>>>>>>         j = 0
>>>>>>         for i in range(size):
>>>>>>            #some magic here involving indexing into x[i] and y[j]
>>>>>>     return it.operands[1]
>>>>>>
>>>>>> Does this make sense? Is it possible to do?
>>>>>>
>>>>>
>>>>>  I'm not sure I understand precisely what you're asking. Maybe you
>>>>> could reshape A to have shape [5, 10, 10], so that one of those 10's can
>>>>> match up with the 10 in B, perhaps with the op_axes?
>>>>>
>>>>> -Mark
>>>>>
>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> NumPy-Discussion mailing list
>>>>>> NumPy-Discussion at scipy.org
>>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> NumPy-Discussion mailing list
>>>>> NumPy-Discussion at scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion at scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>
>>>>
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20111006/810cced0/attachment.html>