Why are empty arrays False?
Greetings, all. I am troubled.

The TL;DR is that `bool(array([])) is False` is misleading, dangerous, and unnecessary. Let's begin with some examples:

>>> bool(np.array(1))
True
>>> bool(np.array(0))
False
>>> bool(np.array([0, 1]))
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> bool(np.array([1]))
True
>>> bool(np.array([0]))
False
>>> bool(np.array([]))
False

One of these things is not like the other.

The first three results embody a design that is consistent with some of the most fundamental design choices in numpy, such as the choice to have comparison operators like `==` work elementwise. And it is the only such design I can think of that is consistent in all edge cases. (see footnote 1)

The next two examples (involving arrays of shape (1,)) are a straightforward extension of the design to arrays that are isomorphic to scalars. I can't say I recall ever finding a use for this feature... but it seems fairly harmless.

So how about that last example, with array([])? Well... it's /kind of/ like how other python containers work, right? Falseness is emptiness (see footnote 2)... Except that this is actually *a complete lie*, due to /all of the other examples above/!

Here's what I would like to see:

>>> bool(np.array([]))
ValueError: The truth value of a non-scalar array is ambiguous. Use a.any() or a.all()

Why do I care? Well, I myself wasted an hour barking up the wrong tree while debugging some code when it turned out that I was mistakenly using truthiness to identify empty arrays. It just so happened that the arrays always contained 1 or 0 elements, so it /appeared/ to work except in the rare case of array([0]), where things suddenly exploded.

I posit that there is no usage of the fact that `bool(array([])) is False` in any real-world code which is not accompanied by a horrible bug writhing in hiding just beneath the surface. For this reason, I wish to see this behavior *abolished*.

Thank you.
-Michael

Footnotes:

1: Every now and then, I wish that `ndarray.__{bool,nonzero}__` would just implicitly do `all()`, which would make `if a == b:` work like it does for virtually every other reasonably-designed type in existence. But then I recall that, if this were done, then the behavior of `if a != b:` would stand out like a sore thumb instead. Truly, punting on 'any/all' was the right choice.

2: np.array([[[[]]]]) is also False, which makes this an interesting sort of n-dimensional emptiness test; but if that's really what you're looking for, you can achieve this much more safely with `np.all(x.shape)` or `bool(x.flat)`
I agree, this behavior seems actively harmful. Let's fix it.

On Fri, Aug 18, 2017 at 2:45 PM, Michael Lamparski <diagonaldevice@gmail.com> wrote:
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
Maybe I'm missing something. This seems fine to me:
>>> bool(np.array([]))
False
But I would have expected these to raise ValueErrors recommending any() and all():
>>> bool(np.array([1]))
True
>>> bool(np.array([0]))
False
> But I would have expected these to raise ValueErrors recommending any() and all():
>
> >>> bool(np.array([1]))
> True
> >>> bool(np.array([0]))
> False
While I can't confess to know the *actual* reason why single-element arrays evaluate the way they do, this is how I understand it: one thing that single-element arrays have going for them is that, for arrays like this, `x.any() == x.all()`. Hence, in these cases, there is no ambiguity.

In this same light, we can see yet another argument against bool(np.array([])), because guess what: this one IS ambiguous!
>>> np.array([]).any()
False
>>> np.array([]).all()
True
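The disagreement above follows directly from how any() and all() are defined over zero elements; a quick check of current behavior:

```python
import numpy as np

empty = np.array([])

# any() over no elements is False ("does any element satisfy?" -- none do),
# while all() over no elements is vacuously True ("is there a counterexample?"
# -- there is none). The two disagree exactly when size == 0.
print(empty.any())  # False
print(empty.all())  # True

# For every size-1 array, the two coincide, so truthiness is unambiguous:
for a in (np.array([0]), np.array([1])):
    assert a.any() == a.all()
```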
I'm also in favor of fixing this, although we might need a deprecation cycle with a warning advising to use arr.size in future to detect emptiness - just in case anyone is using it.
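One way such a deprecation cycle could look, sketched on a toy container rather than NumPy's actual `ndarray.__bool__` implementation (`ArrayLike` is purely illustrative): keep the old result for empty input but emit a DeprecationWarning pointing at `.size`.

```python
import warnings

class ArrayLike:
    # Toy container illustrating one possible deprecation cycle.
    # This is a sketch, NOT NumPy's real implementation.
    def __init__(self, data):
        self.data = list(data)

    @property
    def size(self):
        return len(self.data)

    def __bool__(self):
        if self.size == 0:
            warnings.warn(
                "truth testing an empty array is deprecated; "
                "use arr.size to check for emptiness",
                DeprecationWarning, stacklevel=2)
            return False  # preserve the old behavior during the cycle
        if self.size == 1:
            return bool(self.data[0])
        raise ValueError("The truth value of an array with more than one "
                         "element is ambiguous. Use a.any() or a.all()")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = bool(ArrayLike([]))

print(result)  # False -- old scripts keep running
print(any(issubclass(w.category, DeprecationWarning) for w in caught))  # True
```

After a release or two of warnings, the empty-array branch could be switched to raise, giving downstream code time to migrate.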
On Fri, Aug 18, 2017 at 2:45 PM, Michael Lamparski <diagonaldevice@gmail.com> wrote:

> So how about that last example, with array([])? Well... it's /kind of/ like how other python containers work, right? Falseness is emptiness (see footnote 2)... Except that this is actually *a complete lie*, due to /all of the other examples above/!
Yeah, numpy tries to follow Python conventions, except sometimes you run into these cases where it's trying to simultaneously follow two incompatible extensions and things get... problematic.
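The two incompatible extensions can be seen side by side in current behavior: scalar mimicry says `bool(x)` should follow the value, while container emptiness says it should follow "does x have elements?". A sketch:

```python
import numpy as np

x = np.array(0)        # 0-d array holding the value 0
print(bool(x))         # False -- follows the scalar convention
print(x.size)          # 1     -- but as a container it is non-empty,
                       # so the emptiness convention would say True

y = np.array([0])      # shape (1,): one element, value 0
print(bool(y))         # False -- scalar convention again
print(len(y))          # 1     -- emptiness convention would say True
```

Any single rule for `__bool__` must break at least one of these two conventions on such arrays, which is exactly the bind being described.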
> Here's what I would like to see:
>
> >>> bool(np.array([]))
> ValueError: The truth value of a non-scalar array is ambiguous. Use a.any() or a.all()
Yeah, we should probably deprecate and remove this (though it will take some time).
> 2: np.array([[[[]]]]) is also False, which makes this an interesting sort of n-dimensional emptiness test; but if that's really what you're looking for, you can achieve this much more safely with `np.all(x.shape)` or `bool(x.flat)`
x.size is also useful for emptiness checking.

-n

--
Nathaniel J. Smith -- https://vorpus.org
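A quick comparison of the emptiness idioms mentioned in this thread: `size` counts elements across all dimensions, while `len` only looks at the first axis and is unavailable on 0-d arrays.

```python
import numpy as np

x = np.array([[[[]]]])   # shape (1, 1, 1, 0): zero elements across 4 dims

# x.size detects this n-dimensional emptiness directly:
print(x.size == 0)       # True

# len() only looks at the first axis, so it misses it...
print(len(x))            # 1

# ...and len() fails entirely on 0-d arrays:
try:
    len(np.array(0))
except TypeError as e:
    print("TypeError:", e)
```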
On 2017/08/18 11:45 AM, Michael Lamparski wrote:

> So how about that last example, with array([])? Well... it's /kind of/ like how other python containers work, right? Falseness is emptiness (see footnote 2)... Except that this is actually *a complete lie*, due to /all of the other examples above/!
I don't agree. I think the consistency between bool([]) and bool(array([])) is worth preserving. Nothing you have shown is inconsistent with "Falseness is emptiness", which is quite fundamental in Python. The inconsistency is in distinguishing between 1 element and more than one element. To be consistent, bool(array([0])) and bool(array([0, 1])) should both be True. Contrary to the ValueError message, there need be no ambiguity, any more than there is an ambiguity in bool([1, 2]).

Eric
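The inconsistency being pointed at here is easy to exhibit against current behavior (a sketch):

```python
import numpy as np

# Python containers: truthiness is emptiness; contents are irrelevant.
print(bool([0]))             # True  -- non-empty list of a falsy value
print(bool([0, 0]))          # True

# NumPy size-1 arrays: truthiness follows the element's value instead.
print(bool(np.array([0])))   # False -- same contents, opposite answer
```

Under a strict "falseness is emptiness" rule, bool(np.array([0])) and bool(np.array([0, 1])) would both be True, matching the list behavior above.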
Defining falseness as emptiness in numpy is problematic, as then bool(array(0)) and bool(0) would have different results. 0d arrays are supposed to behave as much like their scalar values as possible, so this is not acceptable.

More importantly though, allowing your proposed semantics would cause a lot of silent bugs in code like `if arr == value`, which would be silently true for array inputs. We already diverge from python on what == means, so I see no reason to match the normal semantics of bool.

I'd be tentatively in favor of deprecating bool(array([1])) with a warning asking for `.squeeze()` to be used, since this also hides a (smaller) class of bugs.

On Sat, Aug 19, 2017, 10:34 Eric Firing <efiring@hawaii.edu> wrote:
> I don't agree. I think the consistency between bool([]) and bool(array([])) is worth preserving.
> More importantly though, allowing your proposed semantics would cause a lot of silent bugs in code like `if arr == value`, which would be silently true of array inputs. We already diverge from python on what == means, so I see no reason to match the normal semantics of bool.
Eric hits the nail right on the head here. (er, ahh, you're both Eric!) And this gets worse; not only would `a == b` be true, but so would `a != b`! For the vast majority of arrays, `bool(x != x)` would be True!

I can resonate with Eric F's feelings, because to be honest, I've never been a big fan of the fact that comparison operators return arrays in the first place. That said... it's a difficult design question, and I can respect the decision that was made; there certainly are a large variety of circumstances where broadcasting these operations is useful. On the other hand, it is a decision that comes with implications that cannot be ignored in many other parts of the library, and truthiness of arrays is one of them.
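The `a == b` / `a != b` hazard can be made concrete with current NumPy: both comparisons return non-empty boolean arrays, so an emptiness-based `bool` would call both of them true regardless of the values. A sketch:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

eq = (a == b)    # elementwise: [ True  True  True]
neq = (a != b)   # elementwise: [False False False]

# Both results are non-empty arrays, so under "truthiness is emptiness"
# `if a == b:` and `if a != b:` would BOTH be taken here:
print(eq.size > 0, neq.size > 0)   # True True

# What the comparisons actually tell us requires any()/all():
print(eq.all())    # True  -- every element equal
print(neq.any())   # False -- no element differs
```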
> I'd be tentatively in favor of deprecating bool(array([1])) with a warning asking for `.squeeze()` to be used, since this also hides a (smaller) class of bugs.
I can get behind this as well, though I just keep wondering in the back of my mind whether there's some tricky but legitimate use case that I'm not thinking about, where arrays of size 1 just happen to have a natural tendency to arise.
Michael Lamparski wrote on 19.08.2017 at 07:04:
> > I'd be tentatively in favor of deprecating bool(array([1])) with a warning asking for `.squeeze()` to be used, since this also hides a (smaller) class of bugs.
>
> I can get behind this as well, though I just keep wondering in the back of my mind whether there's some tricky but legitimate use case that I'm not thinking about, where arrays of size 1 just happen to have a natural tendency to arise.
Changing this sort of fundamental semantics (i.e. size-1 arrays behave like scalars in bool, int, etc. casting contexts) this late in the game should, in my opinion, be discussed with more care.

While the intention of making it harder to write code with bugs is good, it should not come at the cost of having everyone fix their old scripts, which worked correctly previously, but then suddenly stop working.

Note also that I expect polling on this mailing list will not reach the majority of the user base, so I would suggest being very conservative when deprecating features that are not wrong but only have suboptimal semantics. Backward-incompatible changes of this sort accumulate, and will lead to rotting of third-party code.

--
Pauli Virtanen
On Sat, Aug 19, 2017 at 9:22 AM, Pauli Virtanen <pav@iki.fi> wrote:
> While the intention of making it harder to write code with bugs is good, it should not come at the cost of having everyone fix their old scripts, which worked correctly previously, but then suddenly stop working.
This is a good point. Deprecating anything in such a widely used library has a very big cost that must be weighed against the benefits, and I agree that truth-testing on size=1 arrays is neither broken nor dangerous. IMO, it is a small refactoring hazard at worst.
> Note also that I expect polling on this mailing list will not reach the majority of the user base, [...]
Yep. This thread was really just to test the waters. While there's no way to really reach out to the silent majority, I am going to at least make a github issue and summarize the points from this discussion there. I'm glad to see that the general response so far has been that this seems actionable (specifically, deprecating __nonzero__ on size=0 arrays).
On 2017/08/19 7:18 AM, Michael Lamparski wrote:
> While there's no way to really reach out to the silent majority, I am going to at least make a github issue and summarize the points from this discussion there. I'm glad to see that the general response so far has been that this seems actionable (specifically, deprecating __nonzero__ on size=0 arrays).
No, that is the response you agree with; I don't think is fair to characterize it as the "general response".
On Sat, Aug 19, 2017 at 2:00 PM, Eric Firing <efiring@hawaii.edu> wrote:
On 2017/08/19 7:18 AM, Michael Lamparski wrote:
While there's no way to really reach out to the silent majority, I am going to at least make a github issue and summarize the points from this discussion there. I'm glad to see that the general response so far has been that this seems actionable (specifically, deprecating __nonzero__ on size=0 arrays).
No, that is the response you agree with; I don't think is fair to characterize it as the "general response".
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
With regards to gauging "general response," all I'm really trying to do is gauge the likelihood of my issue getting closed right away without action, if I were to file one (e.g. has this issue already been discussed, with a decision to leave things as they are?), because I don't want to waste my time and others' by creating an issue for something that is never going to happen. I've gotten the impression from this conversation that this change (specifically for size=0) *is* possible, especially since two people with a decent history of contribution to the numpy repository have voiced approval for the change. As I see it, opening an issue will at least invite some more discussion, and at best motivate a change. To me, that is a "generally positive response."

---

...but there's also more to it beyond the "general response." From your words, I get the impression that you believe I am simply ignoring your comments, or do not value them, because they go against mine. Please understand: I *don't* enjoy the fact that truthiness of numpy arrays works differently from lists! And there's plenty else that I don't enjoy about numpy, too; I *don't* enjoy the fact that I need to change a whole bunch of `assert a == b` statements to `assert (a == b).all()` after changing the type of some tuple to an array. I *don't* enjoy how numpy's auto-magical shape-finding makes it nearly impossible to have an array of heterogeneous tuples.

But over the years, I've also put a considerable amount of time and thought into understanding *why* these design choices were made. Library design is a difficult beast. Every design decision you make can interact in unexpected ways with all of your other decisions, and eventually you have to accept that you can't always have your cake and eat it too. And designing a library like numpy, the library to end all libraries for working with numerical data? That is h-a-r-d HARD. That borders on programming-language-design hard.
The fact of the matter is that *I agree with you.* Truthiness SHOULD denote emptiness for python types... but I have already considered this, and weighed it against every other design consideration that came to mind. In the end, those other considerations won out, and "scalar evaluation/any()/all()" is the lesser of two evils. To convince me personally, you need to start by presenting something novel that I haven't thought about. There will be opportunity for others to do the same on GitHub. Please; I live for discussions about pitfalls in language and library design!

-Michael
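The `assert a == b` refactoring hazard mentioned earlier in the thread can be sketched minimally (the variable names are illustrative):

```python
import numpy as np

# With tuples, equality gives a single bool:
assert (1, 2) == (1, 2)

# After a refactor to arrays, == becomes elementwise, so the same
# assert would raise rather than compare:
a, b = np.array([1, 2]), np.array([1, 2])
try:
    bool(a == b)                # ValueError: truth value is ambiguous
    raise AssertionError("expected ValueError")
except ValueError:
    pass

# The rewrite the post describes:
assert (a == b).all()
```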
Agreed with Eric Wieser here: having an empty array test as `False` is less than useless, since a non-empty array either returns something based on its contents or raises an error. This means that one cannot write statements like `if array:`. Does this leave any use case? It seems to me it just shows there is no point in defining the truthiness of an empty array.

-- Marten
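A minimal sketch of the point above: `if array:` cannot mean "non-empty", so emptiness has to be spelled out explicitly (`first_or_none` is a hypothetical helper, not from the thread):

```python
import numpy as np

def first_or_none(a):
    # Explicit emptiness test; `if a:` here would wrongly treat
    # array([0]) as empty and would raise on arrays with size > 1.
    return a[0] if a.size else None

assert first_or_none(np.array([])) is None
assert first_or_none(np.array([0])) == 0     # truthiness would have said "empty"
assert first_or_none(np.array([7, 8])) == 7  # truthiness would have raised
```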
I've long ago stopped doing any "emptiness is false"-type tests on any python containers when iterators and generators became common, because they always return True.

Ben

On Sat, Aug 19, 2017 at 6:05 PM, Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
Agreed with Eric Wieser here: having an empty array test as `False` is less than useless, since a non-empty array either returns something based on its contents or raises an error. This means that one cannot write statements like `if array:`. Does this leave any use case? It seems to me it just shows there is no point in defining the truthiness of an empty array.

-- Marten
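Ben's observation is easy to demonstrate: since iterators define neither `__len__` nor `__bool__`, they are always truthy, so "emptiness is falseness" does not extend to them (a minimal sketch):

```python
# Iterators and generators are always truthy, even when they will yield nothing.
assert bool(iter([])) is True
assert bool(x for x in []) is True

# An explicit "has at least one element" test consumes the first item:
sentinel = object()
assert next(iter([]), sentinel) is sentinel   # empty
assert next(iter([42]), sentinel) == 42       # non-empty
```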
On Mon, Aug 21, 2017 at 7:34 AM, Benjamin Root <ben.v.root@gmail.com> wrote:
I've long ago stopped doing any "emptiness is false"-type tests on any python containers when iterators and generators became common, because they always return True.
good point.

Personally, I've thought for years that Python's "truthiness" concept is a wart. Sure, empty sequences and zero values are often "False" in nature, but truthiness really is application-dependent -- in particular, sometimes a value of zero is meaningful, and sometimes not. Is it really so hard to write:

    if len(seq) == 0:

or

    if x == 0:

or

    if arr.size == 0:

or

    if arr.shape == (0, 0):

And then you are being far more explicit about what the test really is.

And thanks Ben, for pointing out the issue with iterables. One more example of how Python has really changed its focus: Python 2 (or maybe Python 1.5) was all about sequences. Python 3 is all about iterables -- and the "empty is False" concept does not map well to iterables...

As to the topic at hand, if we had it to do again, I would NOT make an array that happens to hold a single value act like a scalar for bool() -- a 1-D array that happens to be length-1 really is a different beast than a scalar. But we don't have it to do again -- so we probably need to keep it as it is for backward compatibility.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
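The explicit spellings suggested above, as runnable checks (the sample values are assumptions for illustration):

```python
import numpy as np

seq = []
x = 0
arr = np.zeros((0, 3))

# Each test says exactly what it means, with no reliance on truthiness:
assert len(seq) == 0          # empty sequence
assert x == 0                 # zero value
assert arr.size == 0          # empty array, any shape
assert arr.shape == (0, 3)    # empty along a specific axis
```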
On Tue, Aug 22, 2017 at 12:31 PM, Chris Barker <chris.barker@noaa.gov> wrote:
Personally, I've thought for years that Python's "Truthiness" concept is a wart. Sure, empty sequences, and zero values are often "False" in nature, but truthiness really is application-dependent -- in particular, sometimes a value of zero is meaningful, and sometimes not.
I think truthiness is easily a wart in any dynamically-typed language (and yet ironically, every language I can think of that has truthiness is dynamically typed except for C++). And yet for some reason it seems to be pressed forward as idiomatic in python, and for that reason alone, I use it.

These are questions I ask myself on a daily basis, just to support this strange idiom:

- How close to the public API is this argument?
- Is '' a reasonable value for this string?
- How about an empty tuple? Empty set?
- Should this sentinel value be None or a new object()?
- Is this list local to this function?
- Is the type of this optional argument always True?
- How liable are these answers to change with future refactoring?

which seems like a pretty big laundry list to keep in check for what's supposed to be syntactic sugar. In the end, I will admit that I think my code "looks nice," but I think that's only because I've gotten used to seeing it! After answering all of these questions I tend to find that truthiness is seldom usable in any sort of generic code.

These are the kinds of places where I usually find myself using truthiness instead, and all involve working with objects of known type:

    # 1. A list used as a stack
    while stack:
        top = stack.pop()
        ...

    def read_config(d):
        # 2. Empty default value for a mutable argument that I don't mutate
        d = dict(d or {})
        a = d.pop('a')
        b = d.pop('b')
        ...
        # 3. Validating configuration
        if d:
            warn('unrecognized config keys: {!r}'.format(list(d)))

    # 4. Oddball cases, e.g. the "linked list"
    #    (a, (b, (c, (d, (e, None)))))
    def iter_linked_list(node):
        while node:
            value, node = node
            yield value

    # 5. ...more oddball stuff...
    def format_call(f, *args, **kw):
        arg_s = ', '.join(repr(x) for x in args)
        kw_s = ', '.join(f'{k!s}={v!r}' for k, v in kw.items())
        sep = ', ' if args and kw else ''
        return f'{f.__name__}({arg_s}{sep}{kw_s})'

Meanwhile, for an arbitrary iterator taken as an argument, if you want it to have at least one element for some reason, then good luck; truthiness will not help you.
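For concreteness, the "linked list" traversal from item 4 above runs like this (the sample chain is an assumption for illustration):

```python
# None terminates the chain, so `while node:` is a genuine use of
# truthiness on an object of known type.
def iter_linked_list(node):
    while node:
        value, node = node
        yield value

chain = ('a', ('b', ('c', None)))
assert list(iter_linked_list(chain)) == ['a', 'b', 'c']
```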
On Tue, Aug 22, 2017 at 11:04 AM, Michael Lamparski <diagonaldevice@gmail.com> wrote:
I think truthiness is easily a wart in any dynamically-typed language (and yet ironically, every language I can think of that has truthiness is dynamically typed except for C++). And yet for some reason it seems to be pressed forward as idiomatic in python, and for that reason alone, I use it.
me too :-)
Meanwhile, for an arbitrary iterator taken as an argument, if you want it to have at least one element for some reason, then good luck; truthiness will not help you.
of course, nor will len()

And this is mostly OK, as if you are taking an arbitrary iterable, then you are probably going to, well, iterate over it, and:

    for this in an_empty_iterable:
        ...

works fine.

But bringing it back OT -- it's all a bit messy, but there is logic for the existing conventions in numpy -- and I think backward compatibility is more important than a slightly cleaner API.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
On Fri, Aug 18, 2017 at 7:34 PM, Eric Firing <efiring@hawaii.edu> wrote:
I don't agree. I think the consistency between bool([]) and bool(array([])) is worth preserving. Nothing you have shown is inconsistent with "Falseness is emptiness", which is quite fundamental in Python. The inconsistency is in distinguishing between 1 element and more than one element. To be consistent, bool(array([0])) and bool(array([0, 1])) should both be True. Contrary to the ValueError message, there need be no ambiguity, any more than there is an ambiguity in bool([1, 2]).
Yeah, this is a mess. But we're definitely not going to make bool(array([0])) be True. That would break tons of code that relies on the current behavior. And the current behavior does make sense, in every case except empty arrays: bool broadcasts over the array, and then, oh shoot, Python requires that bool's return value be a scalar, so if this results in anything besides an array of size 1, raise an error.

OTOH you can't really write code that depends on the current bool(array([])) semantics for emptiness checking, unless the only two cases you care about are "empty" and "non-empty with exactly one element, and that element is truthy". So it's much less likely that changing that will break existing code, plus any code that does break was already likely broken in subtle ways.

The consistency-with-Python argument cuts two ways: if an array is a container, then for consistency bool should do emptiness checking. If an array is a bunch of scalars with broadcasting, then for consistency bool should do truthiness checking on the individual elements and raise an error on any array with size != 1. So we can't just rely on consistency-with-Python to resolve the argument -- we need to pick one :-). Though internal consistency within numpy would argue for the latter option, because numpy almost always prefers the bag-of-scalars semantics over the container semantics, e.g. for + and *, like Eric Wieser mentioned. Though there are exceptions, like iteration.

...Though actually, iteration and indexing by scalars try to be consistent with Python in yet a third way. They pretend that an array is a unidimensional container holding a bunch of arrays:

    In [3]: np.array([[1]])[0]
    Out[3]: array([1])

    In [4]: next(iter(np.array([[1]])))
    Out[4]: array([1])

So according to this model, bool(np.array([])) should be False, but bool(np.array([[]])) should be True (note that with lists, bool([[]]) is True).
But alas:

    In [5]: bool(np.array([])), bool(np.array([[]]))
    Out[5]: (False, False)

-n

--
Nathaniel J. Smith -- https://vorpus.org
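The competing "consistency" models from this post line up concretely as follows (a minimal sketch):

```python
import numpy as np

# Container model: a non-empty outer list is truthy...
assert bool([[]]) is True
# ...but a size-0 array is falsy regardless of its dimensionality:
assert bool(np.array([])) is False
assert bool(np.array([[]])) is False   # the list analogue is True

# Broadcast model: size != 1 should raise, yet size 0 does not:
try:
    bool(np.array([1, 2]))
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```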
participants (10)

- Benjamin Root
- Chris Barker
- Eric Firing
- Eric Wieser
- Marten van Kerkwijk
- Michael Lamparski
- Nathaniel Smith
- Paul Hobson
- Pauli Virtanen
- Stephan Hoyer