[Numpy-discussion] Automatic number of bins for numpy histograms
chris.barker at noaa.gov
Tue Apr 14 17:08:34 EDT 2015
On Mon, Apr 13, 2015 at 5:02 AM, Neil Girdhar <mistersheik at gmail.com> wrote:
> Can I suggest that we instead add the P-square algorithm for the dynamic
> calculation of histograms? (
This looks like a great thing to have in numpy. However, I suspect that a
lot of the downstream code that uses histogram expects equally-spaced bins.
So this should probably be an "in addition to", rather than an "instead of".
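
To make the equal-spacing concern concrete, here is a minimal sketch. It assumes a reasonably recent NumPy (for np.random.default_rng) and uses quantile-based bin edges purely as a simple stand-in for an adaptive scheme; it is not the P-square algorithm, and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=10_000)

# Equal-width bins: the layout most downstream code assumes.
counts_eq, edges_eq = np.histogram(data, bins=10)

# Variable-width bins taken from the data's deciles. This is only a
# stand-in for an adaptive method, NOT the P-square algorithm.
edges_q = np.percentile(data, np.linspace(0, 100, 11))
counts_q, _ = np.histogram(data, bins=edges_q)

widths_eq = np.diff(edges_eq)
widths_q = np.diff(edges_q)

# Code that computes something like `edges[1] - edges[0]` and reuses it
# as "the" bin width is only valid when every width is the same.
print("equal-width bins uniform:", np.allclose(widths_eq, widths_eq[0]))
print("quantile bins uniform:   ", np.allclose(widths_q, widths_q[0]))
```

Both calls return a (counts, edges) pair, so an adaptive method could in principle sit next to the existing behaviour, but any caller that derives a single bin width from the first two edges would silently get wrong results with the second set of edges.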
> This is already implemented in C++'s boost library (
> I implemented it in Boost Python as a module, which I'm happy to share.
> This is much better than fixed-width histograms in practice. Rather than
> adjusting the number of bins, it adjusts what you really want, which is the
> resolution of the bins throughout the domain.
> On Sun, Apr 12, 2015 at 4:02 AM, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>> On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fernández del Río <
>> jaime.frio at gmail.com> wrote:
>>> On Sun, Apr 12, 2015 at 12:19 AM, Varun <nayyarv at gmail.com> wrote:
>>>> Long story short, histogram visualisations that depend on numpy (such as
>>>> matplotlib, or nearly all of them) have poor default behaviour as I have to
>>>> constantly play around with the number of bins to get a good idea of what
>>>> I'm looking at. The bins=10 default works ok for up to 1000 points or very
>>>> normal data, but has poor performance for anything else, and doesn't
>>>> account for variability either. I don't have a method easily available to
>>>> scale the number of bins given the data.
>>>> R doesn't suffer from these problems and provides methods for use with its
>>>> hist method. I would like to provide similar functionality for matplotlib,
>>>> to at least provide some kind of good starting point, as histograms are
>>>> useful for initial data discovery.
>>>> The notebook above provides an explanation of the problem as well as
>>>> proposed alternatives. Use different datasets (type and size) to see
>>>> performance of the suggestions. All of the methods proposed exist in R.
>>>> I've put together an implementation to add this new functionality, but am
>>>> hesitant to make a pull request as I would like some feedback from a
>>>> maintainer before doing so.
>>> +1 on the PR.
>> +1 as well.
>> Unfortunately we can't change the default of 10, but a number of string
>> methods, with a "bins=auto" or some such name prominently recommended in
>> the docstring, would be very good to have.
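
As a rough illustration of what a data-driven bin count could look like, the sketch below implements two widely used rules, Sturges and Freedman-Diaconis, both of which R provides. Whether these match the rules in the notebook is an assumption, and auto_bin_count is a hypothetical helper name, not an existing numpy function.

```python
import numpy as np

def auto_bin_count(x, method="fd"):
    """Hypothetical helper: suggest a number of equal-width bins for x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2:
        return 1
    span = x.max() - x.min()
    if span == 0:
        return 1  # all values identical, a single bin is enough
    if method == "sturges":
        # Sturges: depends only on the sample size.
        return int(np.ceil(np.log2(n) + 1))
    if method == "fd":
        # Freedman-Diaconis: bin width = 2 * IQR / n^(1/3).
        q75, q25 = np.percentile(x, [75, 25])
        iqr = q75 - q25
        if iqr == 0:
            return auto_bin_count(x, "sturges")  # fall back when IQR vanishes
        width = 2.0 * iqr / n ** (1.0 / 3.0)
        return int(np.ceil(span / width))
    raise ValueError("unknown method: %r" % (method,))

rng = np.random.default_rng(0)
data = rng.normal(size=10_000)

# The result is a plain integer, so it can be passed straight to the
# existing `bins` argument and the bins stay equally spaced.
counts, edges = np.histogram(data, bins=auto_bin_count(data, "fd"))
```

Because the helper only picks the number of bins, the default of 10 is left untouched and callers opt in explicitly.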
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
Christopher Barker, Ph.D.
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov