[Numpy-discussion] Automatic number of bins for numpy histograms
ralf.gommers at gmail.com
Sun Apr 12 04:02:36 EDT 2015
On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fernández del Río <jaime.frio at gmail.com> wrote:
> On Sun, Apr 12, 2015 at 12:19 AM, Varun <nayyarv at gmail.com> wrote:
>> Long story short, histogram visualisations that depend on numpy (such as
>> matplotlib, or nearly all of them) have poor default behaviour, as I have to
>> constantly play around with the number of bins to get a good idea of what
>> I'm looking at. The default bins=10 works OK for up to 1000 points or very
>> normal data, but performs poorly for anything else, and it doesn't account
>> for variability either. I don't have a method easily available to scale the
>> number of bins given the data.
>> R doesn't suffer from these problems and provides methods for use with its
>> hist method. I would like to provide similar functionality for matplotlib,
>> to at least provide some kind of good starting point, as histograms are very
>> useful for initial data discovery.
>> The notebook above provides an explanation of the problem as well as some
>> proposed alternatives. Use different datasets (type and size) to see the
>> performance of the suggestions. All of the methods proposed exist in R.
>> I've put together an implementation to add this new functionality, but am
>> hesitant to make a pull request, as I would like some feedback from a
>> maintainer before doing so.
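The kind of data-dependent estimators discussed here can be sketched in pure Python. This is a minimal illustration, not the notebook's code or R's: the function names (`sturges_bins`, `scott_bins`, `fd_bins`) are hypothetical, the constants follow the commonly cited forms of Sturges', Scott's and the Freedman-Diaconis rules, and edge-case handling may differ from R's `nclass.*` functions. Unlike a fixed bins=10, Sturges' count grows with sample size, while Scott and Freedman-Diaconis also react to the spread of the data.

```python
import math
import random
import statistics


def sturges_bins(data):
    """Sturges' rule: ceil(log2(n) + 1) bins; depends on sample size only."""
    return math.ceil(math.log2(len(data)) + 1)


def _count_from_width(data, width):
    """Turn a bin width into the number of bins needed to span the data."""
    span = max(data) - min(data)
    if width <= 0 or span == 0:
        return 1  # degenerate data (e.g. all values equal): one bin
    return max(1, math.ceil(span / width))


def scott_bins(data):
    """Scott's rule: width = 3.5 * sd * n^(-1/3); suited to near-normal data."""
    width = 3.5 * statistics.stdev(data) * len(data) ** (-1 / 3)
    return _count_from_width(data, width)


def fd_bins(data):
    """Freedman-Diaconis rule: width = 2 * IQR * n^(-1/3); robust to outliers."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    width = 2 * (q3 - q1) * len(data) ** (-1 / 3)
    return _count_from_width(data, width)


if __name__ == "__main__":
    random.seed(0)
    for n in (100, 10_000):
        sample = [random.gauss(0, 1) for _ in range(n)]
        print(n, sturges_bins(sample), scott_bins(sample), fd_bins(sample))
```

Running it on a small and a large normal sample shows all three counts growing with n, with the width-based rules growing faster than Sturges'.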
> +1 on the PR.
+1 as well.
Unfortunately we can't change the default of 10, but adding a number of
methods selectable by string, with bins="auto" or some such name prominently
recommended in the docstring, would be very good to have.
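For comparison, NumPy releases from 1.11 onward accept estimator names as the `bins` argument of `np.histogram` (for example "auto", "fd", "scott", "rice" and "sturges"), with "auto" choosing between the "sturges" and "fd" estimators. The snippet below assumes such a NumPy version (1.17 or later, for `default_rng`) and simply contrasts the fixed default with the string-selected estimator on a large normal sample.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=100_000)

# Fixed default: always 10 bins, regardless of sample size or spread.
counts_default, edges_default = np.histogram(data)

# Data-dependent choice selected by name rather than by a bin count.
counts_auto, edges_auto = np.histogram(data, bins="auto")

print(len(counts_default), len(counts_auto))
```

Keeping the integer default and adding named estimators leaves existing calls unchanged while making the better behaviour a one-word opt-in.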