np.bincount raises MemoryError when given an empty array

Hello,

Consider the following code:

    for j in range(5):
        f = np.bincount(x[y == j])

It fails with MemoryError whenever y == j is all False element-wise.

    In [96]: np.bincount([])
    ---------------------------------------------------------------------------
    MemoryError                               Traceback (most recent call last)
    /home/ernest/<ipython console> in <module>()
    MemoryError:

    In [97]: np.__version__
    Out[97]: '1.3.0'

Is this a bug?

Bye.
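A minimal sketch of one way to work around the failing loop, assuming the goal is simply to get an explicit empty result for groups with no members. The arrays `x` and `y` and the helper name `counts_per_group` are made up for illustration; they are not from the original post.

```python
import numpy as np

def counts_per_group(x, y, labels):
    """Return {label: bincount of x where y == label}, guarding empty groups."""
    result = {}
    for j in labels:
        group = x[y == j]
        if group.size == 0:
            # Do not call bincount on an empty selection; record an
            # explicit empty integer array instead.
            result[j] = np.zeros(0, dtype=np.intp)
        else:
            result[j] = np.bincount(group)
    return result

# Hypothetical data: labels 2 and 3 never occur in y.
x = np.array([0, 1, 1, 2, 2, 2])
y = np.array([0, 0, 1, 1, 4, 4])
counts = counts_per_group(x, y, range(5))
```

Whether an empty group should map to an empty array, to array([0]), or to an error is a design choice; this sketch picks the empty array only as an example.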

2010/2/1 Ernest Adrogué <eadrogue@gmx.net>:
> Hello,
>
> Consider the following code:
>
>     for j in range(5):
>         f = np.bincount(x[y == j])
>
> It fails with MemoryError whenever y == j is all False element-wise.
>
> <snip>
>
> Is this a bug?

I get it to work sometimes:

    $ ipython
    >>> import numpy as np
    >>> np.bincount([])
    MemoryError:
    >>> np.bincount(())
    array([0])
    >>> np.bincount([])
    array([0])
    >>> np.bincount([])
    MemoryError:
    >>> np.__version__
    '1.4.0rc2'

On Mon, Feb 1, 2010 at 12:09 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> I get it to work sometimes:
>
>     $ ipython
>     >>> import numpy as np
>     >>> np.bincount([])
>     MemoryError:
>     >>> np.bincount(())
>     array([0])
>     >>> np.bincount([])
>     array([0])
>     >>> np.bincount([])
>     MemoryError:
>     >>> np.__version__
>     '1.4.0rc2'
>
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

I don't get a memory error, but the results are strange for empty input:

    >>> x=np.arange(5);np.bincount(x[x == 7]).shape
    (39672457,)
    >>> (np.bincount(x[x == 7])==0).all()
    True
    >>> x=np.arange(5);np.bincount(x[x == 2]).shape
    (3,)

Josef

josef.pktd@gmail.com wrote:
> I don't get a memory error but the results are strange for empty

That may just be because you have enough memory for the (bogus) result: the value is a random memory value interpreted as an intp value, hence most likely very big on a 64-bit system.

It should be easy to fix, but I am not sure what the expected result is. An empty array?

David
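To see why an empty input is a special case, here is a rough pure-Python sketch of what bincount computes. It is an illustration only, not NumPy's C implementation: the output length is max(values) + 1, and an empty input has no maximum, so the length has to be chosen explicitly rather than read from an uninitialised value. The function name `bincount_sketch` is hypothetical, and raising on empty input is an arbitrary choice here; returning an empty list would be the other option.

```python
def bincount_sketch(values):
    """Count occurrences of each non-negative integer in `values`."""
    values = list(values)
    if not values:
        # No maximum exists for an empty input, so the output length has
        # to be decided explicitly; this sketch chooses to raise.
        raise ValueError("bincount_sketch: input must not be empty")
    if min(values) < 0:
        raise ValueError("bincount_sketch: input must be non-negative")
    counts = [0] * (max(values) + 1)  # output length is max + 1
    for v in values:
        counts[v] += 1
    return counts
```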

On Mon, Feb 1, 2010 at 8:37 PM, David Cournapeau <david@silveregg.co.jp> wrote:
>> I don't get a memory error but the results are strange for empty
>
> That may just be because you have enough memory for the (bogus) result: the value is a random memory value interpreted as an intp value, hence most likely very big on 64 bits system.
>
> It should be easy to fix, but I am not sure what is the expected result. An empty array ?

    >>> np.bincount([])
    array([0, 0, 0, ..., 0, 0, 0])
    >>> np.bincount(np.array([]).astype(int))
    array([0, 0, 0, ..., 0, 0, 0])
    >>> np.bincount(())
    array([0, 0, 0, ..., 0, 0, 0])
    >>> np.bincount(()).shape
    (41570297,)

I think this could be considered a correct answer: the count of any integer is zero.

Returning an array with one zero, or the empty array, or raising an exception? I don't see much of a pattern:

    >>> x=np.arange(5);np.unique(x[x == 7])
    array([], dtype=int32)
    >>> np.unique(x[x == 7], return_index=1)
    (array([], dtype=int32), array([], dtype=bool))
    >>> np.unique(x[x == 7], return_inverse=1)
    (array([], dtype=int32), array([], dtype=bool))

    >>> x=np.arange(5);np.histogram(x[x == 7])
    Traceback (most recent call last):
      File "<pyshell#136>", line 1, in <module>
        x=np.arange(5);np.histogram(x[x == 7])
      File "C:\Programs\Python25\Lib\site-packages\numpy\lib\function_base.py", line 202, in histogram
        range = (a.min(), a.max())
    ValueError: zero-size array to ufunc.reduce without identity

    >>> x=np.arange(5);np.digitize(x[x == 7],np.arange(6))
    Traceback (most recent call last):
      File "<pyshell#140>", line 1, in <module>
        x=np.arange(5);np.digitize(x[x == 7],np.arange(6))
    ValueError: Both x and bins must have non-zero length

The only meaningful test cases I can think of work with both array([0]) and an empty array:

    >>> np.sum(x[x == 7]) == np.bincount(x[x == 7]).sum()
    True
    >>> 1.*np.array([0]).astype(int) / np.sum(x[x == 7])
    array([ NaN])
    >>> 1.*np.array([]).astype(int) / np.sum(x[x == 7])
    array([], dtype=float64)
    >>> count = np.bincount(x[x == 7])
    >>> count[count > 0]
    array([], dtype=int32)

I'm slightly in favor of returning an empty array rather than array([0]) as Keith got it.

Josef
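A small sketch of the comparison made above, assuming the two candidate return values are written out by hand rather than obtained from bincount: it shows how array([0]) and an empty array behave when the counts are turned into fractions. The helper name `normalize` is hypothetical.

```python
import numpy as np

def normalize(counts):
    """Convert counts to fractions; a zero total yields NaN entries."""
    counts = np.asarray(counts, dtype=float)
    # Suppress the warnings NumPy would emit for 0/0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return counts / counts.sum()

from_single_zero = normalize(np.array([0]))       # one bin, total 0
from_empty = normalize(np.array([], dtype=int))   # no bins at all
```

With array([0]) the caller gets a single NaN to deal with; with the empty array there is nothing to divide, so the result is simply empty.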

On Tue, Feb 2, 2010 at 1:05 PM, <josef.pktd@gmail.com> wrote:
> I think this could be considered as a correct answer, the count of any integer is zero.

Maybe, but this shape is random - it would be different in different conditions, as the length of the returned array is just some random memory location.

> Returning an array with one zero, or the empty array or raising an exception? I don't see much of a pattern

Since there is no obvious solution, the only rationale for not raising an exception I could see is to accommodate often-encountered special cases. I find returning [0] more confusing than returning empty arrays, though - maybe there is a use case I don't know about.

cheers,

David

On Mon, Feb 1, 2010 at 9:36 PM, David Cournapeau <cournape@gmail.com> wrote:
> Since there is no obvious solution, the only rationale for not raising an exception I could see is to accommodate often-encountered special cases. I find returning [0] more confusing than returning empty arrays, though - maybe there is a usecase I don't know about.

In this case I would expect an empty input to be a programming error and raising an error to be the right thing.

Chuck

On Mon, Feb 1, 2010 at 11:45 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> In this case I would expect an empty input to be a programming error and raising an error to be the right thing.

Not necessarily, if you run bincount over groups in a dataset and you're not sure whether every group is actually observed. The main question is whether the user needs or wants to check for empty groups before or after the loop over bincount.

    >>> np.sum([])
    0.0
    >>> sum([])
    0

Likewise, the empty array or array([0]) can be considered the default result. In this case it is not really a programming error.

Since bincount usually returns redundant zero counts unless np.unique(data) == np.arange(data.max()+1), array([0]) would also make sense as a minimum answer:

    >>> np.bincount([7,8,9])
    array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])

I use bincount quite a lot, but only with fixed-size arrays, so I never actually used it in this way (yet).

Josef

On Mon, Feb 1, 2010 at 10:02 PM, <josef.pktd@gmail.com> wrote:
> Not necessarily, if you run the bincount over groups in a dataset and you're not sure if every group is actually observed. The main question is whether the user needs or wants to check for empty groups before or after the loop over bincount.

How would they know which bin to check? This seems like an unlikely way to check for an empty input.

>     >>> np.sum([])
>     0.0
>     >>> sum([])
>     0
>
> Like the empty array or the array([0]) can be considered as the default argument. In this case it is not really a programming error.

I like that better than an empty array.

> Since bincount usually returns redundant zero count unless np.unique(data) = np.arange(data.max()+1), array([0]) would also make sense as a minimum answer
>
>     >>> np.bincount([7,8,9])
>     array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
>
> I use bincount quite a lot but only with fixed sized arrays, so I never actually used it in this way (yet).

Chuck

On Tue, Feb 2, 2010 at 12:31 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> How would they know which bin to check? This seems like an unlikely way to check for an empty input.

    # grade (e.g. SAT) distribution by school and race
    for s in schools:
        for r in race:
            print s, r, np.bincount(allstudentgrades[(sch==s)*(ra==r)])

All-white schools and all-black schools raise an exception.

I just made up the story; my first attempt was: all sectors, all firm-size groups, bincount something. There will be empty cells for some size groups, e.g. nuclear power in family business.

Josef
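A runnable Python 3 sketch of this kind of grouped loop, using made-up data. It assumes a NumPy version whose `np.bincount` accepts the `minlength` argument, which was added in releases later than the ones discussed in this thread, so that every cell, including an empty one, yields a count array of the same length. All array contents and the names `grades`, `groups` and `n_grades` are hypothetical.

```python
import numpy as np

# Hypothetical data: one entry per student.
grades = np.array([0, 1, 1, 2, 3, 3, 3])             # integer grade codes
sch = np.array(["A", "A", "A", "B", "B", "B", "B"])   # school of each student
ra = np.array(["x", "x", "y", "y", "y", "y", "y"])    # group of each student
schools = ["A", "B"]
groups = ["x", "y"]
n_grades = 4                                          # grade codes are 0..3

table = {}
for s in schools:
    for r in groups:
        mask = (sch == s) & (ra == r)
        # minlength pads the result so every cell has n_grades bins;
        # an empty selection gives all zeros.
        table[(s, r)] = np.bincount(grades[mask], minlength=n_grades)

for key, counts in table.items():
    print(key, counts)
```

A fixed output length sidesteps the question of what an empty input should return, at the cost of having to know the number of bins in advance.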

On Tue, Feb 2, 2010 at 12:57 AM, <josef.pktd@gmail.com> wrote:
>     # grade (e.g. SAT) distribution by school and race
>     for s in schools:
>         for r in race:
>             print s, r, np.bincount(allstudentgrades[(sch==s)*(ra==r)])

    a = np.bincount(allstudentgrades[(sch==s)*(ra==r)])
    print s, r, 100.*a /a.sum()

to get the distribution, with empty or nan for empty groups.

> allwhite schools and allblack schools raise an exception.
>
> I just made up the story, my first attempt was: all sectors, all firmsize groups, bincount something, will have empty cells for some size groups, e.g. nuclear power in family business.

Josef

On Mon, Feb 1, 2010 at 10:57 PM, <josef.pktd@gmail.com> wrote:
>     # grade (e.g. SAT) distribution by school and race
>     for s in schools:
>         for r in race:
>             print s, r, np.bincount(allstudentgrades[(sch==s)*(ra==r)])
>
> allwhite schools and allblack schools raise an exception.
>
> I just made up the story, my first attempt was: all sectors, all firmsize groups, bincount something, will have empty cells for some size groups, e.g. nuclear power in family business.

OK, point taken. What do you think would be the best thing to do?

<snip>

Chuck

2/02/10 @ 00:01 (-0700), thus spake Charles R Harris:
> OK, point taken. What do you think would be the best thing to do?

In my opinion, returning an empty array makes more sense than array([0]). An empty array means "there are no bins", whereas an array of length 1 implies that there is one.

Cheers.

Ernest

2010/2/2 Ernest Adrogué <eadrogue@gmx.net>:
> In my opinion, returning an empty array makes more sense than array([0]). An empty arrays means "there are no bins", whereas an array of length 1 implies that there is one.

Since bincount sometimes returns zero-count bins, the implication is not necessarily true.

But now I'm also in favor of the empty array, as the least-surprise solution, and the user can decide whether, when or how to handle empty arrays.

Just one more example: before discovering bincount, I used histogram to count integers:

    >>> npx=np.arange(5);np.histogram(x[x == 7], bins=np.arange(7+1))
    (array([0, 0, 0, 0, 0, 0, 0]), array([0, 1, 2, 3, 4, 5, 6, 7]))
    >>> npx=np.arange(5);np.histogram(x[x == 7], bins=[])
    (array([], dtype=int32), array([], dtype=float64))

Josef

On Tue, Feb 2, 2010 at 8:53 AM, <josef.pktd@gmail.com> wrote:
> Since bincount returns sometimes zero count bins, the implication is not necessarily true.
>
> But now I'm also in favor of the empty array, as a least surprise solution, and the user can decide whether, when or how to handle empty arrays.
>
> just one more example, before discovering bincount, I used histogram to count integers

without typo:

    >>> x=np.arange(5); np.histogram(x[x == 7], bins=np.arange(7+1))
    (array([0, 0, 0, 0, 0, 0, 0]), array([0, 1, 2, 3, 4, 5, 6, 7]))
    >>> x=np.arange(5); np.histogram(x[x == 7], bins=[])
    (array([], dtype=int32), array([], dtype=float64))

Josef
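The histogram approach can be wrapped up as a small sketch, assuming unit-width bins with explicit edges so that an empty selection still produces a well-defined count vector. The helper name `count_integers` and the `upper` parameter are hypothetical.

```python
import numpy as np

def count_integers(values, upper):
    """Count integers 0..upper-1 using explicit unit-width bin edges.

    Note: np.histogram treats the last bin as closed on the right, so a
    value equal to `upper` would also be counted in the final bin.
    """
    edges = np.arange(upper + 1)  # 0, 1, ..., upper
    counts, _ = np.histogram(values, bins=edges)
    return counts

x = np.arange(5)
empty_counts = count_integers(x[x == 7], upper=7)  # empty selection
some_counts = count_integers(x[x >= 3], upper=7)   # values 3 and 4
```

Because the bin edges are supplied by the caller, the output length never depends on the data, which is why the empty case is unproblematic here.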

Charles R Harris wrote:
> In this case I would expect an empty input to be a programming error and raising an error to be the right thing.

OK, I fixed the code in the trunk to raise a ValueError in that case. Changing it to return an empty array would be easy.

cheers,

David
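For code that has to behave the same way regardless of the installed NumPy version, both options debated in this thread can be expressed in a thin wrapper. This is a sketch under assumptions, not the change made in the NumPy trunk: the name `bincount_checked` and the `on_empty` parameter are hypothetical.

```python
import numpy as np

def bincount_checked(values, on_empty="raise"):
    """Call np.bincount, handling empty input according to `on_empty`.

    on_empty="raise" raises ValueError for empty input;
    on_empty="empty" returns a zero-length integer array instead.
    """
    values = np.asarray(values)
    if values.size == 0:
        if on_empty == "raise":
            raise ValueError("bincount_checked: empty input")
        if on_empty == "empty":
            return np.zeros(0, dtype=np.intp)
        raise ValueError("on_empty must be 'raise' or 'empty'")
    return np.bincount(values)
```

Making the policy an explicit argument leaves the decision to the caller, which is the point several participants argued for.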

participants (6)
- Charles R Harris
- David Cournapeau
- David Cournapeau
- Ernest Adrogué
- josef.pktd@gmail.com
- Keith Goodman