[Numpy-discussion] Automatic number of bins for numpy histograms
antony.lee at berkeley.edu
Tue Apr 14 17:02:05 EDT 2015
Another improvement would be to make sure, for integer-valued datasets,
that all bins cover the same number of integers, as otherwise it is easy to
end up with bins that are "effectively" wider than others. For example, with
evenly spread integers from 0 to 10 and the default 10 bins, the histogram
shows a peak in the last bin, as that bin covers both 9 and 10.
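A minimal sketch of that idea, assuming integer-valued input: put the bin edges on half-integers so that no value ever falls on an edge and every bin spans the same number of integers. `integer_bin_edges` and its `width` parameter are hypothetical names used only for illustration.

```python
import numpy as np

def integer_bin_edges(data, width=1):
    """Hypothetical helper: bin edges for integer-valued data in which
    every bin spans exactly `width` consecutive integers.

    Edges sit on half-integers, so no integer lands on an edge. If `width`
    does not divide the number of values, the last bin extends past the
    maximum instead of being narrower than the others.
    """
    data = np.asarray(data)
    lo, hi = int(data.min()), int(data.max())
    n_values = hi - lo + 1              # integers in [lo, hi]
    n_bins = -(-n_values // width)      # ceiling division
    return lo - 0.5 + width * np.arange(n_bins + 1)

values = np.arange(11)  # the integers 0..10, one of each

# Default: 10 equal-width bins over [0, 10]. The last bin is closed on the
# right, so it collects both 9 and 10 and shows a spurious peak.
default_counts, _ = np.histogram(values, bins=10)

# Half-integer edges: one bin per integer, so all counts are equal.
aligned_counts, _ = np.histogram(values, bins=integer_bin_edges(values))
```

With the default bins the counts are nine 1s followed by a 2; with the aligned edges every bin holds exactly one value.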
2015-04-13 5:02 GMT-07:00 Neil Girdhar <mistersheik at gmail.com>:
> Can I suggest that we instead add the P-square algorithm for the dynamic
> calculation of histograms? This is already implemented in C++'s Boost
> library. I implemented it in Boost.Python as a module, which I'm happy to
> share.
> This is much better than fixed-width histograms in practice. Rather than
> adjusting the number of bins, it adjusts what you really want, which is the
> resolution of the bins throughout the domain.
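P-square itself is an online (single-pass, fixed-memory) quantile estimator and takes more code than fits here. As a stand-in that is explicitly not P-square, the sketch below computes the batch analogue of the adaptive-resolution idea: bin edges placed at empirical quantiles, so each bin holds about the same number of points and bins are narrow where the data are dense and wide where they are sparse. The helper name `equal_count_bin_edges` is hypothetical.

```python
import numpy as np

def equal_count_bin_edges(data, n_bins=10):
    """Hypothetical helper: bin edges at the empirical quantiles
    0, 1/n_bins, ..., 1, so each bin holds roughly len(data) / n_bins points.

    Unlike P-square, this needs all of the data in memory at once.
    """
    data = np.asarray(data, dtype=float)
    return np.quantile(data, np.linspace(0.0, 1.0, n_bins + 1))

rng = np.random.default_rng(0)
sample = rng.standard_normal(10_000)
edges = equal_count_bin_edges(sample, n_bins=10)
counts, _ = np.histogram(sample, bins=edges)

# Bins come out narrow near the mode and wide in the tails.
widths = np.diff(edges)

# With unequal widths, plot the density (count / (total * width)) rather
# than raw counts, otherwise every bar would look the same height.
density, _ = np.histogram(sample, bins=edges, density=True)
```

The trade-off is the one Neil points out: the number of bins stays fixed while their resolution follows the data.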
> On Sun, Apr 12, 2015 at 4:02 AM, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>> On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fernández del Río <
>> jaime.frio at gmail.com> wrote:
>>> On Sun, Apr 12, 2015 at 12:19 AM, Varun <nayyarv at gmail.com> wrote:
>>>> Long story short, histogram visualisations that depend on numpy (such as
>>>> matplotlib, or nearly all of them) have poor default behaviour, as I have
>>>> to constantly play around with the number of bins to get a good idea of
>>>> what I'm looking at. The default bins=10 works OK for up to 1000 points
>>>> or for very normal data, but performs poorly for anything else, and it
>>>> doesn't account for variability either. I don't have an easily available
>>>> method to scale the number of bins to the data.
>>>> R doesn't suffer from these problems and provides methods for use with
>>>> its hist function. I would like to provide similar functionality for
>>>> matplotlib, to at least give a good starting point, as histograms are
>>>> useful for initial data discovery.
>>>> The notebook above provides an explanation of the problem as well as the
>>>> proposed alternatives. Use different datasets (type and size) to see the
>>>> performance of the suggestions. All of the methods proposed exist in R.
>>>> I've put together an implementation to add this new functionality, but am
>>>> hesitant to make a pull request, as I would like some feedback from a
>>>> maintainer before doing so.
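The notebook with the proposed methods is not preserved here. As an assumption, the sketch below uses three classic bin-count rules that R ships as `nclass.Sturges`, `nclass.scott` and `nclass.FD`, written from their textbook formulas; details such as quantile interpolation may differ from R. The function names are hypothetical.

```python
import math
import numpy as np

def _bins_from_width(x, h):
    # Number of equal-width bins of width h needed to span the data;
    # falls back to one bin when h or the data range is zero.
    span = x.max() - x.min()
    if not h > 0 or span == 0:
        return 1
    return max(1, math.ceil(span / h))

def sturges_bins(x):
    # Sturges: k = ceil(log2(n) + 1); implicitly assumes near-normal data.
    n = np.asarray(x).size
    return math.ceil(math.log2(n) + 1)

def scott_bins(x):
    # Scott: bin width h = 3.5 * sigma * n^(-1/3).
    x = np.asarray(x, dtype=float)
    h = 3.5 * x.std(ddof=1) * x.size ** (-1.0 / 3.0)
    return _bins_from_width(x, h)

def fd_bins(x):
    # Freedman-Diaconis: bin width h = 2 * IQR * n^(-1/3).
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    h = 2.0 * (q75 - q25) * x.size ** (-1.0 / 3.0)
    return _bins_from_width(x, h)

rng = np.random.default_rng(0)
sample = rng.standard_normal(100_000)
n_bins = fd_bins(sample)
counts, edges = np.histogram(sample, bins=n_bins)
```

Sturges' count grows only with log n, so for large samples it picks far fewer bins than the two width-based rules. Freedman-Diaconis uses the interquartile range, which makes it less sensitive to outliers than Scott's rule, which uses the standard deviation.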
>>> +1 on the PR.
>> +1 as well.
>> Unfortunately we can't change the default of 10, but a number of string
>> methods, with a "bins=auto" or some such name prominently recommended in
>> the docstring, would be very good to have.
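For reference, this is the shape the option took in later NumPy releases: from NumPy 1.11, `np.histogram` accepts estimator names such as `"auto"`, `"sturges"` and `"fd"` for `bins`, and NumPy 1.15 added `np.histogram_bin_edges`, while the default stays at 10 bins. A short sketch, assuming a NumPy version that has both:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.standard_normal(1000)

# Default: still 10 equal-width bins, so existing code is unaffected.
default_counts, default_edges = np.histogram(data)

# Opt-in estimators, selected by name through the same `bins` argument.
auto_counts, auto_edges = np.histogram(data, bins="auto")
sturges_edges = np.histogram_bin_edges(data, bins="sturges")
fd_edges = np.histogram_bin_edges(data, bins="fd")

# Per the NumPy docs, "auto" uses the smaller of the Sturges and
# Freedman-Diaconis bin widths, i.e. the larger of the two bin counts.
print(len(default_edges) - 1, len(sturges_edges) - 1,
      len(fd_edges) - 1, len(auto_edges) - 1)
```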
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org