Indexing empty dimensions with empty arrays
I have been instructed to bring this issue to the mailing list: http://projects.scipy.org/numpy/ticket/1994 TIA, - Jordi G. H.
2011/12/25 Jordi Gutiérrez Hermoso <jordigh@octave.org>
I have been instructed to bring this issue to the mailing list:
http://projects.scipy.org/numpy/ticket/1994
The issue is this corner case:
    >>> idx = []
    >>> x = np.array([])
    >>> x[idx]  # works
    array([], dtype=float64)
    >>> x[:, idx]  # works
    array([], dtype=float64)

    >>> x = np.ones((5,0))
    >>> x[idx]  # works
    array([], shape=(0, 0), dtype=float64)
    >>> x[:, idx]  # doesn't work
    Traceback (most recent call last):
      File "<ipython-input-27-7038691cb565>", line 1, in <module>
        x[:, idx] #doesn't work
    IndexError: invalid index
This is obviously inconsistent, but I think just fixing this one case is not enough; unexpected behavior with empty inputs/indexes keeps coming up. Do we need a clear set of rules that all functions follow and tests to ensure these rules are actually followed, or not? Ralf
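For readers who want to try the corner case themselves, here is a minimal sketch. It assumes a NumPy release in which an empty integer index is accepted next to a slice. The explicit `np.intp` cast is an addition made for this sketch and is not part of the ticket; it only ensures the empty index has an integer dtype.

```python
import numpy as np

# Empty integer index. The explicit dtype is an assumption of this sketch:
# a bare np.array([]) would default to float64.
idx = np.array([], dtype=np.intp)

x = np.ones((5, 0))

rows = x[idx]       # select no rows
cols = x[:, idx]    # select no columns: the case reported as failing

print(rows.shape)
print(cols.shape)
```

If the shapes printed are not `(0, 0)` and `(5, 0)`, or an `IndexError` appears, the installed version still has the inconsistency described above.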
On Mon, Dec 26, 2011 at 1:51 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
This is obviously inconsistent, but I think just fixing this one case is not enough; unexpected behavior with empty inputs/indexes keeps coming up. Do we need a clear set of rules that all functions follow and tests to ensure these rules are actually followed, or not?
this works
    >>> xx = np.arange(12).reshape(3,4)
    >>> xx
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])
    >>> x = xx[:,xx[:,-1]<3]
    >>> x
    array([], shape=(3, 0), dtype=int32)
    >>> x<0
    array([], shape=(3, 0), dtype=bool)
    >>> x[x<0]
    array([], dtype=int32)
    >>> x[:,x<0]
    array([], dtype=int32)
    >>> x.ndim
    2
I have a hard time thinking through empty 2-dim arrays, and don't know what rules should apply. However, in my code I might want to catch these cases rather early than late and then having to work my way backwards to find out where the content disappeared. my 2c Josef
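The "catch these cases early" approach could be sketched as a small guard placed where the empty array is created. The helper name `require_nonempty` is hypothetical and not a NumPy function. The mask below is taken along the columns (the last row) so that its length matches the indexed axis.

```python
import numpy as np


def require_nonempty(a, name="array"):
    """Return `a` as an ndarray, raising ValueError if it has no elements."""
    arr = np.asarray(a)
    if arr.size == 0:
        raise ValueError(f"{name} is empty (shape {arr.shape})")
    return arr


xx = np.arange(12).reshape(3, 4)
mask = xx[-1, :] < 3            # the last row holds 8..11, so nothing matches
try:
    x = require_nonempty(xx[:, mask], name="x")
except ValueError as exc:
    print(exc)
```

The trade-off is the one debated in this thread: the guard reports the problem where the content disappears, but it also rejects inputs for which an empty result would have been perfectly valid.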
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Mon, Dec 26, 2011 at 8:50 PM, <josef.pktd@gmail.com> wrote:
I have a hard time thinking through empty 2-dim arrays, and don't know what rules should apply. However, in my code I might want to catch these cases rather early than late and then having to work my way backwards to find out where the content disappeared.
Same here. Almost always, my empty arrays are either due to bugs or they signal that I do need to special-case something. Silent passing through of empty arrays to all numpy functions is not what I would want. Ralf
On 26 December 2011 14:56, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
Same here. Almost always, my empty arrays are either due to bugs or they signal that I do need to special-case something. Silent passing through of empty arrays to all numpy functions is not what I would want.
I find it quite annoying to treat the empty set with special deference. "All of my great-grandkids live in Antarctica" should be true for me (I'm only 30 years old). If you decide that is not true for me, it leads to a bunch of other logical annoyances up there.

The rule that shouldn't be special cased is what I described: x[idx1, idx2] should be a valid construction if it's true that all elements of idx1 and idx2 are integers in the correct range. The sizes of the empty matrices are also somewhat obvious.

Special-casing vacuous truth makes me write annoying special cases. Octave doesn't error out for those special cases, and I think it's a good thing it doesn't. It's logically consistent.

- Jordi G. H.
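The vacuous-truth rule described in this message can be illustrated in plain Python. The function `indices_valid` is a hypothetical name used only for this sketch; it states the rule "every element is an integer in the correct range" with `all()`, which is true for an empty sequence.

```python
def indices_valid(idx, size):
    """True if every element of idx is an int in [-size, size)."""
    return all(isinstance(i, int) and -size <= i < size for i in idx)


print(indices_valid([0, 2], 3))   # True
print(indices_valid([0, 3], 3))   # False: 3 is out of range
print(indices_valid([], 0))       # True: no element violates the rule
```

Under this reading an empty index needs no special case: it passes the same check as any other index, even for an axis of length 0.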
2011/12/26 Jordi Gutiérrez Hermoso <jordigh@octave.org>:
Special-casing vacuous truth makes me write annoying special cases. Octave doesn't error out for those special cases, and I think it's a good thing it doesn't. It's logically consistent.
I don't think I ever ran into an empty matrix in matlab, and wouldn't know how it behaves. But it looks like the [:, empty] is a special case that doesn't work
    >>> np.ones((3,0))
    array([], shape=(3, 0), dtype=float64)
    >>> np.ones((3,0))[1,[]]
    array([], dtype=float64)
    >>> np.ones((3,0))[:,[]]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: invalid index

    >>> np.ones((3,0))[np.arange(3),[]]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: shape mismatch: objects cannot be broadcast to a single shape
oops, my mistake
    >>> np.broadcast_arrays(np.arange(3)[:,None],[])
    [array([], shape=(3, 0), dtype=int32), array([], shape=(3, 0), dtype=float64)]
    >>> np.ones((3,0))[np.arange(3)[:,None],[]]
    array([], shape=(3, 0), dtype=float64)
    >>> np.broadcast_arrays(np.arange(3)[:,None],[[]])
    [array([], shape=(3, 0), dtype=int32), array([], shape=(3, 0), dtype=float64)]

    >>> np.ones((3,0))[np.arange(3)[:,None],[]]
    array([], shape=(3, 0), dtype=float64)
    >>> np.ones((3,0))[np.arange(3)[:,None],[[]]]
    array([], shape=(3, 0), dtype=float64)
    >>> np.ones((3,0))[np.arange(3)[:,None],np.array([],int)]
    array([], shape=(3, 0), dtype=float64)

    >>> np.take(np.ones((3,0)),[], axis=1)
    array([], shape=(3, 0), dtype=float64)
    >>> np.take(np.ones((3,0)),[], axis=0)
    array([], shape=(0, 0), dtype=float64)
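The difference between the failing and the working index pairs above comes down to broadcasting, which the following sketch tries to make explicit. The variable names are illustrative, and the exception type for the mismatch is caught loosely because it may differ between NumPy versions.

```python
import numpy as np

rows = np.arange(3)[:, None]          # shape (3, 1)
cols = np.array([], dtype=np.intp)    # shape (0,)

# Shapes (3, 1) and (0,) broadcast to (3, 0): the length-1 axis stretches
# to match the length-0 axis.
print(np.broadcast(rows, cols).shape)

a = np.ones((3, 0))
print(a[rows, cols].shape)

# A (3,) index cannot be broadcast against a (0,) index, which is the
# "shape mismatch" error in the session above.
mismatch_raised = False
try:
    a[np.arange(3), cols]
except (IndexError, ValueError) as exc:
    mismatch_raised = True
    print(type(exc).__name__, exc)
```

So the pair of fancy indices already follows ordinary broadcasting rules for zero-length axes; only the slice-plus-empty-list combination behaved differently.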
I would prefer consistent indexing, independent of whether I find it useful to have pages of code working with nothing.

Josef

I don't think a paper where the referee or editor catches the authors using assumptions that describe an empty set will ever get published (maybe with a qualifier, outside of philosophy). It might happen, though, that the empty set slips through the refereeing process.
On 26 December 2011 19:56, <josef.pktd@gmail.com> wrote:
I don't think I ever ran into an empty matrix in matlab, and wouldn't know how it behaves.
I think they behave like Octave matrices. I'm not sure about all cases because I don't have access to Matlab, but I think Matlab handles it about as sanely as Octave: not a special case that errors out. - Jordi G. H.
2011/12/27 Jordi Gutiérrez Hermoso <jordigh@octave.org>
I find it quite annoying to treat the empty set with special deference. "All of my great-grandkids live in Antarctica" should be true for me (I'm only 30 years old). If you decide that is not true for me, it leads to a bunch of other logical annoyances up there
Guess you don't mean true/false, because it's neither. But I understand you want an empty array back instead of an error.

Currently the problem is that when you do get that empty array back, you'll then use that for something else and it will probably still crash. Many numpy functions do not check for empty input and will still give exceptions. My impression is that you're better off handling these where you create the empty array, rather than in some random place later on. The alternative is to have consistent rules for empty arrays, and handle them explicitly in all functions. Can be done, but is of course a lot of work and has some overhead.

Finally, I note that your exception only occurs for empty arrays with shape (N, 0). It's not obvious to me if the same rules should apply to shape (0,) and other shapes, or why those shapes are even useful.

Ralf
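As an illustration of how differently functions treat empty input, the sketch below compares two reductions. It is only an example chosen for this note, not a survey of NumPy: a reduction with an identity element returns a value, while one without raises.

```python
import numpy as np

empty = np.ones((3, 0))

# sum has an identity (0), so reducing over no elements is well defined.
print(empty.sum())

# max has no identity, so NumPy raises rather than inventing a value.
try:
    empty.max()
except ValueError as exc:
    print("ValueError:", exc)
```

Code that receives an empty array from indexing therefore still has to know which downstream calls tolerate it, which is the concern raised in this message.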
On 12/28/2011 09:33 AM, Ralf Gommers wrote:
Currently the problem is that when you do get that empty array back, you'll then use that for something else and it will probably still crash. Many numpy functions do not check for empty input and will still give exceptions. My impression is that you're better off handling these where you create the empty array, rather than in some random place later on. The alternative is to have consistent rules for empty arrays, and handle them explicitly in all functions. Can be done, but is of course a lot of work and has some overhead.
Are you saying that the existence of other bugs means that this bug shouldn't be fixed? I just fail to see the relevance of these other bugs to this discussion.

For the record, I've encountered this bug many times myself and it's rather irritating, since it leads to more verbose code.

It is useful whenever you want to return data that is a subset of the input data (since the selected subset can sometimes be zero-sized -- remember, in computer science the only numbers are 0, 1, and "any number").

Here's one of the examples I've had. The Interpolative Decomposition decomposes an m-by-n matrix A of rank k as

    A = B C

where B is an m-by-k matrix consisting of a subset of the columns of A, and C is a k-by-n matrix.

Now, if A is all zeros (which is often the case for me), then k is 0. I would still like to create the m-by-0 matrix B by doing

    B = A[:, selected_columns]

But now I have to do this instead:

    if len(selected_columns) == 0:
        B = np.zeros((A.shape[0], 0), dtype=A.dtype)
    else:
        B = A[:, selected_columns]

In this case, zero-sized B and C are of course perfectly valid and useful results:

    In [2]: np.dot(np.ones((3,0)), np.ones((0, 5)))
    Out[2]:
    array([[ 0.,  0.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  0.,  0.]])

Dag Sverre
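The rank-0 case from this message can be written without the `if`/`else` branch. The following is only a sketch: `column_subset` is a hypothetical helper, and it relies on `np.take` with an explicitly integer-typed index, which the session earlier in the thread shows accepting an empty index along `axis=1`.

```python
import numpy as np


def column_subset(A, selected_columns):
    """Return the columns of A listed in selected_columns (possibly none)."""
    # Casting to an integer dtype keeps an empty list from becoming float64.
    cols = np.asarray(selected_columns, dtype=np.intp)
    return np.take(A, cols, axis=1)


A = np.zeros((3, 5))             # all-zero matrix, so the rank k is 0
B = column_subset(A, [])         # m-by-0
C = np.zeros((0, A.shape[1]))    # 0-by-n

print(B.shape, C.shape)
print(np.allclose(np.dot(B, C), A))
```

The product of the m-by-0 and 0-by-n factors is the m-by-n zero matrix, so the decomposition A = B C still holds with no selected columns.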
On 12/28/2011 01:52 PM, Dag Sverre Seljebotn wrote:
And to answer the obvious question: Yes, this is a real use case. It is used for something similar to image compression, where sub-sections of the images may well be all-zero and have zero rank (full story at [1]).

Reading the above thread I understand Ralf's reasoning better, but really, relying on NumPy's buggy behaviour to discover bugs in user code seems like the wrong approach. Tools should be dumb unless there are good reasons to make them smart. I'd be rather irritated about my hammer if it refused to drive in nails that it decided were in the wrong spot.

Dag Sverre

[1] http://arxiv.org/abs/1110.4874
On Wed, Dec 28, 2011 at 1:57 PM, Dag Sverre Seljebotn < d.s.seljebotn@astro.uio.no> wrote:
Are you saying that the existence of other bugs means that this bug shouldn't be fixed? I just fail to see the relevance of these other bugs to this discussion.
See below.
And to answer the obvious question: Yes, this is a real usecase. It is used for something similar to image compression, where sub-sections of the images may well be all-zero and have zero rank (full story at [1]).
Thanks for the example. I was a little surprised that dot works. Then I read what wikipedia had to say about empty arrays. It mentions dot like you do, and that the determinant of the 0-by-0 matrix is 1. So I try:
    In [1]: a = np.zeros((0,0))

    In [2]: a
    Out[2]: array([], shape=(0, 0), dtype=float64)

    In [3]: np.linalg.det(a)
    Parameter 4 to routine DGETRF was incorrect
    <segfault>

Reading the above thread I understand Ralf's reasoning better, but really, relying on NumPy's buggy behaviour to discover bugs in user code seems like the wrong approach. Tools should be dumb unless there are good reasons to make them smart. I'd be rather irritated about my hammer if it refused to drive in nails that it decided were in the wrong spot.
The point is not that we shouldn't fix it, but that it's a waste of time to fix it in only one place. I remember fixing several functions to explicitly check for empty arrays and then returning an empty array or giving a sensible error.

So can you answer my question: do you think it's worth the time and computational overhead to handle empty arrays in all functions?

Ralf
On 12/28/2011 02:21 PM, Ralf Gommers wrote:
Thanks for the example. I was a little surprised that dot works. Then I read what wikipedia had to say about empty arrays. It mentions dot like you do, and that the determinant of the 0-by-0 matrix is 1. So I try:
    In [1]: a = np.zeros((0,0))

    In [2]: a
    Out[2]: array([], shape=(0, 0), dtype=float64)

    In [3]: np.linalg.det(a)
    Parameter 4 to routine DGETRF was incorrect
    <segfault>
:-) Well, a segfault is most certainly a bug, so this must be fixed one way or the other way anyway, and returning 1 seems at least as good a solution as raising an exception. Both solutions require an extra if-test.
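The "extra if-test" mentioned here might look roughly like the wrapper below. This is a sketch, not NumPy's actual fix: `det_allow_empty` is a hypothetical name, and the choice of returning 1.0 follows the empty-product convention cited from Wikipedia in the quoted message.

```python
import numpy as np


def det_allow_empty(a):
    """Determinant that returns 1.0 for a 0-by-0 matrix (the empty product)."""
    a = np.asarray(a)
    if a.shape == (0, 0):
        # Short-circuit before any LAPACK routine is called.
        return 1.0
    return np.linalg.det(a)


print(det_allow_empty(np.zeros((0, 0))))
print(det_allow_empty(np.eye(2)))
```

Raising a `ValueError` in the same branch would be the alternative; either way the check sits in front of the LAPACK call that crashed in the session above.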
The point is not that we shouldn't fix it, but that it's a waste of time to fix it in only one place. I remember fixing several functions to explicitly check for empty arrays and then returning an empty array or giving a sensible error.
So can you answer my question: do you think it's worth the time and computational overhead to handle empty arrays in all functions?
I'd hope the computational overhead is negligible?

I do believe that handling this correctly everywhere is the right thing to do and would improve overall code quality (as witnessed by the segfault found above).

Of course, likely nobody is ready to actually perform all that work. So the right thing to do seems to be to state that places where NumPy does not handle zero-size arrays are bugs, but not to do anything about them until somebody actually submits a patch. That means ending this email discussion by verifying that this is indeed a bug on Trac, and then waiting to see if anybody bothers to submit a patch.

Dag Sverre
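One low-effort way to find the places that do not handle zero-size arrays, in the spirit of the tests asked for earlier in the thread, is to probe functions with a few empty shapes and record the outcome. The helper `probe_empty_inputs` and the choice of shapes are assumptions of this sketch; it only reports exceptions and cannot catch a hard crash such as the segfault shown above.

```python
import numpy as np


def probe_empty_inputs(funcs, shapes=((0,), (3, 0), (0, 3), (0, 0))):
    """Call each function on zero-size arrays and record what happens."""
    report = {}
    for func in funcs:
        for shape in shapes:
            try:
                func(np.ones(shape))
                outcome = "ok"
            except Exception as exc:   # record the failure, do not hide it
                outcome = type(exc).__name__
            report[(func.__name__, shape)] = outcome
    return report


for key, outcome in probe_empty_inputs([np.sum, np.max, np.transpose]).items():
    print(key, outcome)
```

Each entry that is not "ok" would then need a decision: either the exception is the intended behaviour for empty input, or it is a candidate for a ticket and a patch.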
On Wed, Dec 28, 2011 at 2:45 PM, Dag Sverre Seljebotn < d.s.seljebotn@astro.uio.no> wrote:
On 12/28/2011 02:21 PM, Ralf Gommers wrote:
On Wed, Dec 28, 2011 at 1:57 PM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no <mailto:d.s.seljebotn@astro.uio.no>> wrote:
On 12/28/2011 01:52 PM, Dag Sverre Seljebotn wrote: > On 12/28/2011 09:33 AM, Ralf Gommers wrote: >> >> >> 2011/12/27 Jordi Gutiérrez Hermoso<jordigh@octave.org <mailto:jordigh@octave.org> >> <mailto:jordigh@octave.org <mailto:jordigh@octave.org>>> >> >> On 26 December 2011 14:56, Ralf Gommers<ralf.gommers@googlemail.com <mailto:
ralf.gommers@googlemail.com>
>> <mailto:ralf.gommers@googlemail.com <mailto:ralf.gommers@googlemail.com>>> wrote: >> > >> > >> > On Mon, Dec 26, 2011 at 8:50 PM,<josef.pktd@gmail.com <mailto:josef.pktd@gmail.com> >> <mailto:josef.pktd@gmail.com <mailto:josef.pktd@gmail.com>>>
wrote:
>> >> I have a hard time thinking through empty 2-dim arrays, and >> don't know >> >> what rules should apply. >> >> However, in my code I might want to catch these cases rather early >> >> than late and then having to work my way backwards to find out where >> >> the content disappeared. >> > >> > >> > Same here. Almost always, my empty arrays are either due to
bugs
>> or they >> > signal that I do need to special-case something. Silent
passing
>> through of >> > empty arrays to all numpy functions is not what I would want. >> >> I find it quite annoying to treat the empty set with special >> deference. "All of my great-grandkids live in Antarctica" should be >> true for me (I'm only 30 years old). If you decide that is not true >> for me, it leads to a bunch of other logical annoyances up there >> >> >> Guess you don't mean true/false, because it's neither. But I understand >> you want an empty array back instead of an error. >> >> Currently the problem is that when you do get that empty array
back,
>> you'll then use that for something else and it will probably
still
>> crash. Many numpy functions do not check for empty input and will still >> give exceptions. My impression is that you're better off handling these >> where you create the empty array, rather than in some random place later >> on. The alternative is to have consistent rules for empty arrays, and >> handle them explicitly in all functions. Can be done, but is of course a >> lot of work and has some overhead. > > Are you saying that the existence of other bugs means that this
bug
> shouldn't be fixed? I just fail to see the relevance of these other bugs > to this discussion.
See below.
> For the record, I've encountered this bug many times myself and
it's
> rather irritating, since it leads to more verbose code. > > It is useful whenever you want to return data that is a subset of
the
> input data (since the selected subset can usually be zero-sized > sometimes -- remember, in computer science the only numbers are
0, 1,
> and "any number"). > > Here's one of the examples I've had. The Interpolative
Decomposition
> decomposes a m-by-n matrix A of rank k as > > A = B C > > where B is an m-by-k matrix consisting of a subset of the columns of A, > and C is a k-by-n matrix. > > Now, if A is all zeros (which is often the case for me), then k is 0. I > would still like to create the m-by-0 matrix B by doing > > B = A[:, selected_columns] > > But now I have to do this instead: > > if len(selected_columns) == 0: > B = np.zeros((A.shape[0], 0), dtype=A.dtype) > else: > B = A[:, selected_columns] > > In this case, zero-sized B and C are of course perfectly valid and > useful results: > > In [2]: np.dot(np.ones((3,0)), np.ones((0, 5))) > Out[2]: > array([[ 0., 0., 0., 0., 0.], > [ 0., 0., 0., 0., 0.], > [ 0., 0., 0., 0., 0.]]) >
And to answer the obvious question: Yes, this is a real usecase. It is used for something similar to image compression, where sub-sections of the images may well be all-zero and have zero rank (full story at [1]).
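The column-selection case described above can be sketched as follows. This is only an illustration: it assumes a NumPy version in which the indexing bug from the ticket is fixed, and it passes an explicitly integer-typed empty index rather than relying on how an empty list gets converted. `A` and `selected_columns` follow the names in the message; `C` is built by hand here and is not part of the original example.

```python
import numpy as np

# A is all zeros, so its rank k is 0 and no columns are selected.
A = np.zeros((3, 5))
selected_columns = np.array([], dtype=np.intp)  # explicit integer dtype (assumption)

B = A[:, selected_columns]       # m-by-0 matrix
C = np.zeros((0, A.shape[1]))    # 0-by-n matrix, built by hand for illustration

# The product of an m-by-0 and a 0-by-n matrix is an m-by-n matrix of zeros.
product = np.dot(B, C)
print(B.shape, C.shape, product.shape)
```

With the empty selection going through, the `if len(selected_columns) == 0` special case is no longer needed in this sketch.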
Thanks for the example. I was a little surprised that dot works. Then I read what wikipedia had to say about empty arrays. It mentions dot like you do, and that the determinant of the 0-by-0 matrix is 1. So I try:
In [1]: a = np.zeros((0,0))

In [2]: a
Out[2]: array([], shape=(0, 0), dtype=float64)

In [3]: np.linalg.det(a)
Parameter 4 to routine DGETRF was incorrect
<segfault>
:-)
Well, a segfault is most certainly a bug, so this must be fixed one way or the other way anyway, and returning 1 seems at least as good a solution as raising an exception. Both solutions require an extra if-test.
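The "extra if-test" for the return-1 option could look roughly like the sketch below. `det_allow_empty` is a hypothetical wrapper written for this illustration, not a NumPy function, and it only covers the single 2-D case discussed here.

```python
import numpy as np

def det_allow_empty(a):
    """Determinant that returns 1.0 for a 0-by-0 matrix (hypothetical helper)."""
    a = np.asarray(a)
    if a.shape == (0, 0):
        # Empty-product convention: the determinant of the 0-by-0 matrix is 1.
        return 1.0
    # Every other input is handed to NumPy unchanged.
    return float(np.linalg.det(a))

print(det_allow_empty(np.zeros((0, 0))))
print(det_allow_empty(2.0 * np.eye(2)))
```

The guard runs before the LAPACK-backed routine is reached, so the zero-size case never gets to the call that crashed in the session above.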
Reading the above thread I understand Ralf's reasoning better, but really, relying on NumPy's buggy behaviour to discover bugs in user code seems like the wrong approach. Tools should be dumb unless there are good reasons to make them smart. I'd be rather irritated about my hammer if it refused to drive in nails that it decided were in the wrong spot.
The point is not that we shouldn't fix it, but that it's a waste of time to fix it in only one place. I remember fixing several functions to explicitly check for empty arrays and then returning an empty array or giving a sensible error.
So can you answer my question: do you think it's worth the time and computational overhead to handle empty arrays in all functions?
I'd hope the computational overhead is negligible?
If you have to check all array_like inputs in all functions, I wouldn't think so.
I do believe that handling this correctly everywhere is the right thing to do and would improve overall code quality (as witnessed by the segfault found above).
Of course, likely nobody is ready to actually perform all that work. So the right thing to do seems to be to state that any place where NumPy does not handle zero-size arrays is a bug, but not do anything about it until somebody actually submits a patch. That means ending this email discussion by verifying that this is indeed a bug on Trac, and then waiting to see if anybody bothers to submit a patch.
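One way to find the places that would count as bugs under this rule is to probe functions with zero-size input and record what happens. The sketch below is only an illustration: `probe_empty` is a hypothetical helper, the three functions are an arbitrary sample, and the outcome for any given function may depend on the NumPy version. A hard crash such as the segfault above cannot be caught this way.

```python
import numpy as np

def probe_empty(func, shape=(0,)):
    """Call func on a zero-size array and report what happened (hypothetical helper)."""
    empty = np.zeros(shape)
    try:
        func(empty)
    except Exception as exc:
        # Report the exception type instead of propagating it.
        return type(exc).__name__
    return "ok"

# Arbitrary sample of functions; a real survey would cover many more.
for func in (np.sum, np.max, np.sort):
    print(func.__name__, probe_empty(func))
```

Whether a reported exception is a sensible error or a bug still has to be decided function by function. A maximum over no elements has no natural value, for example, while a sum does.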
Agreed. I've created http://projects.scipy.org/numpy/ticket/2007 Ralf
I agree with Dag, NumPy should provide consistent handling of empty arrays. It does require some work, but it should be at least declared a bug when it doesn't.

Travis

--
Travis Oliphant (on a mobile) 512-826-7480

On Dec 28, 2011, at 7:45 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:

> I do believe that handling this correctly everywhere is the right thing to do and would improve overall code quality (as witnessed by the segfault found above).
>
> Of course, likely nobody is ready to actually perform all that work. So the right thing to do seems to be to state that any place where NumPy does not handle zero-size arrays is a bug, but not do anything about it until somebody actually submits a patch. That means ending this email discussion by verifying that this is indeed a bug on Trac, and then waiting to see if anybody bothers to submit a patch.
>
> Dag Sverre
On 28 December 2011 03:33, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
2011/12/27 Jordi Gutiérrez Hermoso <jordigh@octave.org>
On 26 December 2011 14:56, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
On Mon, Dec 26, 2011 at 8:50 PM, <josef.pktd@gmail.com> wrote:
I have a hard time thinking through empty 2-dim arrays, and don't know what rules should apply. However, in my code I might want to catch these cases rather early than late and then having to work my way backwards to find out where the content disappeared.
Same here. Almost always, my empty arrays are either due to bugs or they signal that I do need to special-case something. Silent passing through of empty arrays to all numpy functions is not what I would want.
I find it quite annoying to treat the empty set with special deference. "All of my great-grandkids live in Antarctica" should be true for me (I'm only 30 years old). If you decide that is not true for me, it leads to a bunch of other logical annoyances up there
Guess you don't mean true/false, because it's neither. But I understand you want an empty array back instead of an error.
It should be true. This is a case of vacuous truth: http://en.wikipedia.org/wiki/Vacuous_truth - Jordi G. H.
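NumPy's own reductions already follow the vacuous-truth convention, which a short sketch can show. The variable name is made up for the illustration.

```python
import numpy as np

# One boolean per great-grandkid; there are none yet, so the array is empty.
great_grandkids_in_antarctica = np.array([], dtype=bool)

print(np.all(great_grandkids_in_antarctica))  # "all of them" over an empty set
print(np.any(great_grandkids_in_antarctica))  # "at least one" over an empty set
```

`np.all` returns True and `np.any` returns False for the empty array, the same as Python's built-in `all([])` and `any([])`.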
participants (5)
- Dag Sverre Seljebotn
- Jordi Gutiérrez Hermoso
- josef.pktd@gmail.com
- Ralf Gommers
- Travis Oliphant