[Numpy-discussion] np.bincount raises MemoryError when given an empty array

josef.pktd at gmail.com
Tue Feb 2 08:53:48 EST 2010


2010/2/2 Ernest Adrogué <eadrogue at gmx.net>:
>  2/02/10 @ 00:01 (-0700), thus spake Charles R Harris:
>> On Mon, Feb 1, 2010 at 10:57 PM, <josef.pktd at gmail.com> wrote:
>>
>> > On Tue, Feb 2, 2010 at 12:31 AM, Charles R Harris
>> > <charlesr.harris at gmail.com> wrote:
>> > >
>> > >
>> > > On Mon, Feb 1, 2010 at 10:02 PM, <josef.pktd at gmail.com> wrote:
>> > >>
>> > >> On Mon, Feb 1, 2010 at 11:45 PM, Charles R Harris
>> > >> <charlesr.harris at gmail.com> wrote:
>> > >> >
>> > >> >
>> > >> > On Mon, Feb 1, 2010 at 9:36 PM, David Cournapeau <cournape at gmail.com>
>> > >> > wrote:
>> > >> >>
>> > >> >> On Tue, Feb 2, 2010 at 1:05 PM,  <josef.pktd at gmail.com> wrote:
>> > >> >>
>> > >> >> > I think this could be considered a correct answer: the count of
>> > >> >> > any integer is zero.
>> > >> >>
>> > >> >> Maybe, but this shape is random; it would be different under different
>> > >> >> conditions, since the length of the returned array comes from some
>> > >> >> random memory location.
>> > >> >>
>> > >> >> >
>> > >> >> > Returning an array with one zero, returning the empty array, or raising
>> > >> >> > an exception? I don't see much of a pattern.
>> > >> >>
>> > >> >> Since there is no obvious solution, the only rationale I can see for
>> > >> >> not raising an exception is to accommodate often-encountered special
>> > >> >> cases. I find returning [0] more confusing than returning an empty
>> > >> >> array, though; maybe there is a use case I don't know about.
>> > >> >>
>> > >> >
>> > >> > In this case I would expect an empty input to be a programming error,
>> > >> > and raising an error to be the right thing.
>> > >>
>> > >> Not necessarily: you might run bincount over groups in a dataset
>> > >> without being sure that every group is actually observed. The main
>> > >> question is whether the user needs or wants to check for empty groups
>> > >> before or after the loop over bincount.
>> > >>
>> > >
>> > > How would they know which bin to check? This seems like an unlikely way
>> > > to check for an empty input.
>> >
>> > # grade (e.g. SAT) distribution by school and race
>> > for s in schools:
>> >     for r in race:
>> >         print(s, r, np.bincount(allstudentgrades[(sch == s) & (ra == r)]))
>> >
>> > All-white schools and all-black schools raise an exception.
>> >
>> > I just made that story up. My first attempt was: for all sectors and all
>> > firm-size groups, bincount something; some size groups will have empty
>> > cells, e.g. nuclear power in family businesses.
>> >
>> >
>> OK, point taken. What do you think would be the best thing to do?
>
> In my opinion, returning an empty array makes more sense than
> array([0]). An empty array means "there are no bins", whereas
> an array of length 1 implies that there is one.

Since bincount sometimes returns zero-count bins, the implication is
not necessarily true.
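To illustrate the point about zero-count bins: the length of bincount's output is one more than the largest value in the input, not the number of distinct values observed, so many bins can be zero. A minimal sketch with made-up input values:

```python
import numpy as np

# Only the value 3 is observed, yet bincount returns one bin per
# integer from 0 up to the maximum, so bins 0, 1 and 2 hold zero counts.
counts = np.bincount(np.array([3, 3]))
print(counts)       # [0 0 0 2]
print(len(counts))  # 4
```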

But now I'm also in favor of the empty array as the least-surprise
solution; the user can then decide whether, when, or how to handle empty
arrays.
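A sketch of how the grouped-counting case could be handled, assuming a NumPy release in which `np.bincount` on an empty integer array returns an empty array instead of raising (the behavior argued for here; releases from the time of this thread may differ). It also uses the `minlength` argument, which was added in a later NumPy release than the one discussed here; it pads the result with zero-count bins so every group yields the same number of bins. The scores and group labels are made up for illustration.

```python
import numpy as np

# Hypothetical data: integer scores and the group each observation belongs to.
scores = np.array([0, 2, 2, 1, 3, 0])
groups = np.array(["a", "a", "b", "b", "a", "b"])
n_bins = int(scores.max()) + 1  # one bin per possible score value

# An empty integer input gives an empty result (length 0), not an error.
empty = np.bincount(np.array([], dtype=int))
print(empty.shape)  # (0,)

# With minlength, an empty group becomes a fixed-length row of zeros,
# so the per-group counts line up and can be stacked into one table.
table = {}
for g in ["a", "b", "c"]:  # group "c" has no observations
    table[g] = np.bincount(scores[groups == g], minlength=n_bins)
    print(g, table[g])
```

With this approach the check for unobserved groups can happen after the loop (e.g. rows that sum to zero) rather than before each call.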


Just one more example: before discovering bincount, I used histogram
to count integers:

>>> x = np.arange(5); np.histogram(x[x == 7], bins=np.arange(7 + 1))
(array([0, 0, 0, 0, 0, 0, 0]), array([0, 1, 2, 3, 4, 5, 6, 7]))
>>> x = np.arange(5); np.histogram(x[x == 7], bins=[])
(array([], dtype=int32), array([], dtype=float64))
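A sketch of the histogram-based counting above, assuming integer data with all values in the range 0..k-1: `np.histogram` with edges `np.arange(k + 1)` then gives the same counts as `np.bincount` with `minlength=k` (the `minlength` argument is from a later NumPy release than the one discussed here). Histogram's last bin is closed on the right, so a value equal to k would also land in the last bin; that is why the data are restricted to values below k. `k` and the data are hypothetical.

```python
import numpy as np

k = 7
x = np.array([0, 1, 1, 4, 6])  # integer data, all values in [0, k - 1]

# Edges 0..k give bins [0,1), [1,2), ..., [6,7]: one bin per integer 0..k-1.
hist_counts, edges = np.histogram(x, bins=np.arange(k + 1))
bin_counts = np.bincount(x, minlength=k)

print(hist_counts)                              # [1 2 0 0 1 0 1]
print(np.array_equal(hist_counts, bin_counts))  # True

# With no matching values, both approaches return k zero-count bins.
none_found = x[x == k]
print(np.histogram(none_found, bins=np.arange(k + 1))[0])  # [0 0 0 0 0 0 0]
print(np.bincount(none_found, minlength=k))                # [0 0 0 0 0 0 0]
```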

Josef


>
> Cheers.
>
> Ernest
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>


