Mailman 3 Obscure code in concatenate code path? - NumPy-Discussion

Obscure code in concatenate code path?

Matthew Brett

Sept. 13, 2012

6:12 a.m.

Hi, While writing some tests for np.concatenate, I ran foul of this code: if (axis >= NPY_MAXDIMS) { ret = PyArray_ConcatenateFlattenedArrays(narrays, arrays, NPY_CORDER); } else { ret = PyArray_ConcatenateArrays(narrays, arrays, axis); } in multiarraymodule.c So, if the user passes an axis >= (by default) 32 the arrays to concatenate get flattened, and must all have the same number of elements (it turns out). This seems obscure. Obviously it is not likely that someone would pass in an axis no >= 32 by accident, but if they did, they would not get the result they expect. Is there some code-path that needs this? Is there another way of doing it? Best, Matthew

Show replies by date

Nathaniel Smith

September 2012

9:40 a.m.

On Thu, Sep 13, 2012 at 11:12 AM, Matthew Brett <matthew.brett@gmail.com> wrote:

...

Hi,

While writing some tests for np.concatenate, I ran foul of this code:

if (axis >= NPY_MAXDIMS) { ret = PyArray_ConcatenateFlattenedArrays(narrays, arrays, NPY_CORDER); } else { ret = PyArray_ConcatenateArrays(narrays, arrays, axis); }

in multiarraymodule.c

How deeply weird.

...

So, if the user passes an axis >= (by default) 32 the arrays to concatenate get flattened, and must all have the same number of elements (it turns out). This seems obscure. Obviously it is not likely that someone would pass in an axis no >= 32 by accident, but if they did, they would not get the result they expect. Is there some code-path that needs this? Is there another way of doing it?

This behaviour seems to be older -- I can reproduce it empirically with 1.6.2. But the actual code you encountered was introduced along with PyArray_ConcatenateFlattenedArrays itself by Mark Wiebe in 9194b3af. So @Mark, you were the last one to look at this closely, any thoughts? :-) Though, in 1.6.2, there doesn't seem to be any requirement that the arrays have the same length: In [11]: np.concatenate(([[1, 2]], [[3]]), axis=100) Out[11]: array([1, 2, 3]) My first guess is that this was some ill-advised "defensive programming" thing where someone wanted to do *something* with these weird axis arguments, and picked *something* at more-or-less random. I like that theory better than the one where someone introduced this on purpose and then used it... It might even be that rare case where the best solution is to just rip it out and see if anyone notices. -n

Travis Oliphant

10:01 a.m.

On Sep 13, 2012, at 8:40 AM, Nathaniel Smith wrote:

...

On Thu, Sep 13, 2012 at 11:12 AM, Matthew Brett <matthew.brett@gmail.com> wrote:

...
Hi,

While writing some tests for np.concatenate, I ran foul of this code:

if (axis >= NPY_MAXDIMS) { ret = PyArray_ConcatenateFlattenedArrays(narrays, arrays, NPY_CORDER); } else { ret = PyArray_ConcatenateArrays(narrays, arrays, axis); }

in multiarraymodule.c

How deeply weird

This is expected behavior. It's how the concatenate Python function manages to handle axis=None to flatten the arrays before concatenation. This has been in NumPy since 1.0 and should not be changed without deprecation warnings which I am -0 on. Now, it is true that the C-API could have been written differently (I think this is what Mark was trying to encourage) so that there are two C-API functions and they are dispatched separately from the array_concatenate method depending on whether or not a None is passed in. But, the behavior is documented and has been for a long time. Reference PyArray_AxisConverter (which turns a "None" Python argument into an axis=MAX_DIMS). This is consistent behavior throughout the C-API. -Travis

Warren Weckesser

12:39 p.m.

On Thu, Sep 13, 2012 at 9:01 AM, Travis Oliphant <travis@continuum.io>wrote:

...

On Sep 13, 2012, at 8:40 AM, Nathaniel Smith wrote:

...
On Thu, Sep 13, 2012 at 11:12 AM, Matthew Brett <matthew.brett@gmail.com> wrote:

...
Hi,

While writing some tests for np.concatenate, I ran foul of this code:

if (axis >= NPY_MAXDIMS) { ret = PyArray_ConcatenateFlattenedArrays(narrays, arrays, NPY_CORDER); } else { ret = PyArray_ConcatenateArrays(narrays, arrays, axis); }

in multiarraymodule.c

How deeply weird

This is expected behavior.

Heh, I guess "expected" is subjective: In [23]: np.__version__ Out[23]: '1.6.1' In [24]: a = zeros((2,2)) In [25]: b = ones((2,3)) In [26]: concatenate((a, b), axis=0) # Expected error. --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /Users/warren/gitwork/class-material/demo/pytables/<ipython-input-26-7cefb735e507> in <module>() ----> 1 concatenate((a, b), axis=0) # Expected error. ValueError: array dimensions must agree except for d_0 In [27]: concatenate((a, b), axis=1) # Normal behavior. Out[27]: array([[ 0., 0., 1., 1., 1.], [ 0., 0., 1., 1., 1.]]) In [28]: concatenate((a, b), axis=2) # Cryptic error message. --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /Users/warren/gitwork/class-material/demo/pytables/<ipython-input-28-0bce84c34ef1> in <module>() ----> 1 concatenate((a, b), axis=2) # Cryptic error message. ValueError: bad axis1 argument to swapaxes In [29]: concatenate((a, b), axis=32) # What the... ? Out[29]: array([ 0., 0., 0., 0., 1., 1., 1., 1., 1., 1.]) I would expect an error, consistent with the behavior when 1 < axis < 32. Warren

...

It's how the concatenate Python function manages to handle axis=None to flatten the arrays before concatenation. This has been in NumPy since 1.0 and should not be changed without deprecation warnings which I am -0 on.

Now, it is true that the C-API could have been written differently (I think this is what Mark was trying to encourage) so that there are two C-API functions and they are dispatched separately from the array_concatenate method depending on whether or not a None is passed in. But, the behavior is documented and has been for a long time.

Reference PyArray_AxisConverter (which turns a "None" Python argument into an axis=MAX_DIMS). This is consistent behavior throughout the C-API.

-Travis

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

Travis Oliphant

12:51 p.m.

On Sep 13, 2012, at 11:39 AM, Warren Weckesser wrote:

...

On Thu, Sep 13, 2012 at 9:01 AM, Travis Oliphant <travis@continuum.io> wrote:

On Sep 13, 2012, at 8:40 AM, Nathaniel Smith wrote:

...
On Thu, Sep 13, 2012 at 11:12 AM, Matthew Brett <matthew.brett@gmail.com> wrote:

...
Hi,

While writing some tests for np.concatenate, I ran foul of this code:

if (axis >= NPY_MAXDIMS) { ret = PyArray_ConcatenateFlattenedArrays(narrays, arrays, NPY_CORDER); } else { ret = PyArray_ConcatenateArrays(narrays, arrays, axis); }

in multiarraymodule.c

How deeply weird

This is expected behavior.

Heh, I guess "expected" is subjective:

"Expected" only in the sense that the current C-API has been intentional for 6 years. The side-effect of the Python-side being confusing can be changed --- it just hasn't been yet --- the documented approach is to use None. A patch to PyArray_AxisConverter might be the answer. -Travis

...

In [23]: np.__version__ Out[23]: '1.6.1'

In [24]: a = zeros((2,2))

In [25]: b = ones((2,3))

In [26]: concatenate((a, b), axis=0) # Expected error. --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /Users/warren/gitwork/class-material/demo/pytables/<ipython-input-26-7cefb735e507> in <module>() ----> 1 concatenate((a, b), axis=0) # Expected error.

ValueError: array dimensions must agree except for d_0

In [27]: concatenate((a, b), axis=1) # Normal behavior. Out[27]: array([[ 0., 0., 1., 1., 1.], [ 0., 0., 1., 1., 1.]])

In [28]: concatenate((a, b), axis=2) # Cryptic error message. --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /Users/warren/gitwork/class-material/demo/pytables/<ipython-input-28-0bce84c34ef1> in <module>() ----> 1 concatenate((a, b), axis=2) # Cryptic error message.

ValueError: bad axis1 argument to swapaxes

In [29]: concatenate((a, b), axis=32) # What the... ? Out[29]: array([ 0., 0., 0., 0., 1., 1., 1., 1., 1., 1.])

I would expect an error, consistent with the behavior when 1 < axis < 32.

Warren

It's how the concatenate Python function manages to handle axis=None to flatten the arrays before concatenation. This has been in NumPy since 1.0 and should not be changed without deprecation warnings which I am -0 on.

Now, it is true that the C-API could have been written differently (I think this is what Mark was trying to encourage) so that there are two C-API functions and they are dispatched separately from the array_concatenate method depending on whether or not a None is passed in. But, the behavior is documented and has been for a long time.

Reference PyArray_AxisConverter (which turns a "None" Python argument into an axis=MAX_DIMS). This is consistent behavior throughout the C-API.

-Travis

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

Daπid

12:57 p.m.

On Thu, Sep 13, 2012 at 6:39 PM, Warren Weckesser <warren.weckesser@enthought.com> wrote:

...

I would expect an error, consistent with the behavior when 1 < axis < 32.

In that case, you are hitting the dimension limit. np.concatenate((a,b), axis=31) ValueError: bad axis1 argument to swapaxes Where axis=32, axis=3500, axis=None all return the flattened array. I have been trying with other functions and got something interesting. With the same a, b as before: np.sum((a,b), axis=0) ValueError: operands could not be broadcast together with shapes (2) (3) np.sum((a,b), axis=1) array([[ 0. 0.], [ 2. 2. 2.]], dtype=object) np.sum((a,b), axis=2) ValueError: axis(=2) out of bounds This is to be expected, but now this is not consistent: np.sum((a,b), axis=32) ValueError: operands could not be broadcast together with shapes (2) (3) np.sum((a,b), axis=500) ValueError: axis(=500) out of bounds

Matthew Brett

1:34 p.m.

Hi, On Thu, Sep 13, 2012 at 3:01 PM, Travis Oliphant <travis@continuum.io> wrote:

...

On Sep 13, 2012, at 8:40 AM, Nathaniel Smith wrote:

...
On Thu, Sep 13, 2012 at 11:12 AM, Matthew Brett <matthew.brett@gmail.com> wrote:

...
Hi,

While writing some tests for np.concatenate, I ran foul of this code:

if (axis >= NPY_MAXDIMS) { ret = PyArray_ConcatenateFlattenedArrays(narrays, arrays, NPY_CORDER); } else { ret = PyArray_ConcatenateArrays(narrays, arrays, axis); }

in multiarraymodule.c

How deeply weird

This is expected behavior. It's how the concatenate Python function manages to handle axis=None to flatten the arrays before concatenation. This has been in NumPy since 1.0 and should not be changed without deprecation warnings which I am -0 on.

Now, it is true that the C-API could have been written differently (I think this is what Mark was trying to encourage) so that there are two C-API functions and they are dispatched separately from the array_concatenate method depending on whether or not a None is passed in. But, the behavior is documented and has been for a long time.

Reference PyArray_AxisConverter (which turns a "None" Python argument into an axis=MAX_DIMS). This is consistent behavior throughout the C-API.

How about something like: #define NPY_NONE_AXIS NPY_MAXDIMS to make it clearer what is intended? Best, Matthew

Travis Oliphant

1:35 p.m.

...

...
This is expected behavior. It's how the concatenate Python function manages to handle axis=None to flatten the arrays before concatenation. This has been in NumPy since 1.0 and should not be changed without deprecation warnings which I am -0 on.

Now, it is true that the C-API could have been written differently (I think this is what Mark was trying to encourage) so that there are two C-API functions and they are dispatched separately from the array_concatenate method depending on whether or not a None is passed in. But, the behavior is documented and has been for a long time.

Reference PyArray_AxisConverter (which turns a "None" Python argument into an axis=MAX_DIMS). This is consistent behavior throughout the C-API.

How about something like:

#define NPY_NONE_AXIS NPY_MAXDIMS

to make it clearer what is intended?

+1 -Travis

4564

Age (days ago)

4564

Last active (days ago)

List overview

Download

7 comments

5 participants

participants (5)

Daπid
Matthew Brett
Nathaniel Smith
Travis Oliphant
Warren Weckesser

Obscure code in concatenate code path?

tags

participants (5)