[Numpy-discussion] ticket #605

David Huard david.huard at gmail.com
Wed Apr 9 10:01:27 EDT 2008


Hello Jarrod and co.,

here is my personal version of the histogram saga.

The current version of histogram puts all values larger than range in the
rightmost bin, but does not put values smaller than range in the leftmost
bin, e.g.:

In [6]: histogram([1,2,3,4,5,6], bins=3, range=[2,5])
Out[6]: (array([1, 1, 3]), array([ 2.,  3.,  4.]))

It discards 1, but puts 2 in the first bin, 3 in the second bin, and 4, 5, 6
in the third bin. Also, the docstring says that outliers are put in the
closest bin, which is false. Another point to consider is normalization.
Currently, the normalization factor is db=bin[1]-bin[0]. Of course, if the
bins are not equally spaced, this will yield a spurious density. Also, I'd
argue that since the rightmost bin covers the space from bin[-1] to
infinity, its density should always be zero.
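
To make the behavior above concrete, here is a small pure-Python sketch. It
is not the actual numpy code: left_edge_histogram is a hypothetical helper
that only mimics the left-edge semantics described here, and the
normalization form counts / (total * db) is my assumption about how db is
applied.

```python
from bisect import bisect_right


def left_edge_histogram(values, left_edges):
    """Count values, treating left_edges as the left edge of each bin.

    Hypothetical helper: values below the first edge are dropped, and
    values at or beyond the last edge all land in the last, open-ended bin.
    """
    counts = [0] * len(left_edges)
    for x in values:
        i = bisect_right(left_edges, x) - 1  # index of the edge at or below x
        if i >= 0:                           # i == -1 means x is below all edges
            counts[i] += 1
    return counts


# Same data and edges as the In [6] example: 1 is dropped, 6 is kept.
counts = left_edge_histogram([1, 2, 3, 4, 5, 6], [2.0, 3.0, 4.0])
print(counts)  # [1, 1, 3]

# Unequal bins: left edges 0, 1, 3 give widths 1, 2, and an open-ended bin.
uneven = [0.0, 1.0, 3.0]
data = [0.5, 1.5, 2.5, 2.9]
c = left_edge_histogram(data, uneven)        # [1, 3, 0]
db = uneven[1] - uneven[0]                   # single width, as in the text
density = [n / (len(data) * db) for n in c]  # assumed normalization form
finite_widths = [1.0, 2.0]
integral = sum(d * w for d, w in zip(density, finite_widths))
print(integral)  # 1.75, not 1.0
```

With equally spaced bins the same computation integrates to 1, which is why
the problem only shows up for user-supplied, unevenly spaced bins.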

Now if someone wants to explain all that in the docstring, that's fine by
me. I fully understand the need to avoid breaking people's code. I simply
hope that in the next big release, this behavior can be changed to something
that is simpler: bins are the bin edges (instead of the left edges), and
everything outside the edges is ignored. This would be a nice occasion to
add an axis keyword and possibly weights, and would make histogram
consistent with histogramdd. I'm willing to implement those changes, but I
don't know how to do so without breaking histogram's behavior.
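
For comparison, a rough sketch of the simpler semantics I have in mind,
again in pure Python rather than as a patch. edge_histogram is a
hypothetical name, and I am assuming the last bin includes its right edge,
which is a detail that would still need to be agreed on.

```python
from bisect import bisect_right


def edge_histogram(values, edges, density=False):
    """Count values between consecutive edges; ignore everything outside.

    Hypothetical helper. Assumption: bins are half-open [left, right),
    except the last one, which also includes its right edge.
    """
    nbins = len(edges) - 1
    counts = [0] * nbins
    for x in values:
        if x < edges[0] or x > edges[-1]:
            continue                                    # outliers are ignored
        i = min(bisect_right(edges, x) - 1, nbins - 1)  # clamp x == edges[-1]
        counts[i] += 1
    if not density:
        return counts
    total = sum(counts)
    if total == 0:
        return [0.0] * nbins
    # Each bin is divided by its own width, so unequal bins are handled.
    return [n / (total * (edges[i + 1] - edges[i])) for i, n in enumerate(counts)]


# Three bins over [2, 5]: both 1 and 6 are ignored.
counts = edge_histogram([1, 2, 3, 4, 5, 6], [2.0, 3.0, 4.0, 5.0])
print(counts)  # [1, 1, 2]

# Unequal bins now integrate to 1.
edges = [0.0, 1.0, 3.0]
dens = edge_histogram([0.5, 1.5, 2.5, 2.9], edges, density=True)
integral = sum(d * (edges[i + 1] - edges[i]) for i, d in enumerate(dens))
print(integral)  # 1.0
```

Note that the number of returned counts is one less than the number of
edges, which is exactly the kind of change that would break existing code.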

I just got Bruce's reply, so sorry for the overlap.

David

2008/4/9, Jarrod Millman <millman at berkeley.edu>:
>
> Hello,
>
> I just turned this one into a blocker for now.  There has been a very
> long and good discussion about this ticket:
> http://projects.scipy.org/scipy/numpy/ticket/605
>
> Could someone (David?, Bruce?) briefly summarize the problem and the
> current proposed solution for us again?  Let's agree on the problem
> and the solution.  I want to have something similar to what is
> written about median for this release:
> http://projects.scipy.org/scipy/numpy/milestone/1.0.5
>
> I agree with David's sentiment:  "This issue has been raised a number
> of times since I follow this ML. It's not the first time I've proposed
> patches, and I've already documented the weird behavior only to see
> the comments disappear after a while. I hope this time some kind of
> agreement will be reached."
>
> If you give me the short summary I will make sure Travis or Eric
> respond (and I will put it in the release notes).
>
> Thanks,
>
>
> --
> Jarrod Millman
> Computational Infrastructure for Research Labs
> 10 Giannini Hall, UC Berkeley
> phone: 510.643.4014
> http://cirl.berkeley.edu/
>