[Numpy-discussion] Automatic number of bins for numpy histograms

Ralf Gommers ralf.gommers at gmail.com
Sun Apr 12 04:02:36 EDT 2015

On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fernández del Río <jaime.frio at gmail.com> wrote:

> On Sun, Apr 12, 2015 at 12:19 AM, Varun <nayyarv at gmail.com> wrote:
>> http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb
>> Long story short, histogram visualisations that depend on numpy
>> (matplotlib and nearly all the others) have poor default behaviour: I
>> have to constantly play around with the number of bins to get a good
>> idea of what I'm looking at. The default bins=10 works OK for up to 1000
>> points or for very normal data, but performs poorly for anything else,
>> and doesn't account for variability either. I don't have a method easily
>> available to scale the number of bins given the data.
>>
>> R doesn't suffer from these problems and provides methods for use with
>> its hist method. I would like to provide similar functionality for
>> matplotlib, to at least provide some kind of good starting point, as
>> histograms are very useful for initial data discovery.
>>
>> The notebook above provides an explanation of the problem as well as
>> some proposed alternatives. Use different datasets (type and size) to
>> see the performance of the suggestions. All of the methods proposed
>> exist in R and in the literature.
>>
>> I've put together an implementation to add this new functionality, but
>> am hesitant to make a pull request as I would like some feedback from a
>> maintainer before doing so.
> +1 on the PR.

+1 as well.

Unfortunately we can't change the default of 10, but a number of string
methods, with a "bins=auto" or some such name prominently recommended in
the docstring, would be very good to have.
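
To make the idea concrete, here is a minimal sketch of how string methods could resolve to a bin count while leaving the integer default of 10 untouched. The helper names (choose_bins, BIN_RULES) and the rule names "sturges" and "fd" are hypothetical, chosen only for illustration; they are not an existing numpy interface in this thread. The two rules shown are Sturges' rule and the Freedman-Diaconis rule, both of which are available in R.

```python
import numpy as np


def sturges_bins(data):
    """Sturges' rule: ceil(log2(n)) + 1 bins."""
    return int(np.ceil(np.log2(data.size))) + 1


def fd_bins(data):
    """Freedman-Diaconis rule: bin width = 2 * IQR * n**(-1/3)."""
    q75, q25 = np.percentile(data, [75, 25])
    width = 2.0 * (q75 - q25) / data.size ** (1.0 / 3.0)
    if width == 0:
        return 1  # zero IQR (degenerate data): fall back to a single bin
    return int(np.ceil((data.max() - data.min()) / width))


# Hypothetical name-to-rule table; the names are assumptions for this sketch.
BIN_RULES = {"sturges": sturges_bins, "fd": fd_bins}


def choose_bins(data, bins=10):
    """Return an integer bin count from an int or a rule name.

    Passing an int (including the default of 10) is returned unchanged,
    so existing behaviour is preserved.
    """
    data = np.asarray(data, dtype=float).ravel()
    if isinstance(bins, str):
        if bins not in BIN_RULES:
            raise ValueError("unknown bin rule: %r" % (bins,))
        return max(1, BIN_RULES[bins](data))
    return bins


rng = np.random.default_rng(0)
sample = rng.normal(size=5000)

# The resolved count is passed to np.histogram as an ordinary integer.
counts, edges = np.histogram(sample, bins=choose_bins(sample, "fd"))
```

Resolving the name to an integer before calling np.histogram keeps the sketch independent of how the option would actually be wired into numpy; a real implementation would more likely do this inside np.histogram itself.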
