[Numpy-discussion] Automatic number of bins for numpy histograms

Nathaniel Smith njs at pobox.com
Tue Apr 14 19:12:15 EDT 2015


On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar <mistersheik at gmail.com> wrote:
> Can I suggest that we instead add the P-square algorithm for the dynamic
> calculation of histograms?
> (http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf)
>
> This is already implemented in C++'s boost library
> (http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp)
>
> I implemented it in Boost Python as a module, which I'm happy to share.
> This is much better than fixed-width histograms in practice.  Rather than
> adjusting the number of bins, it adjusts what you really want, which is the
> resolution of the bins throughout the domain.

This definitely sounds like a useful thing to have in numpy or scipy
(though if it's possible to do without using Boost/C++ that would be
nice). But yeah, we should leave the existing histogram alone (in this
regard) and add a new name for this like "adaptive_histogram" or
something. Then you can set about convincing matplotlib and friends to
use it by default :-)
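
For illustration only, here is a minimal sketch of what a variable-width
histogram under that name could look like in plain NumPy, with no
Boost/C++ dependency. It is not the P-square algorithm: P-square
estimates quantiles incrementally from a stream, whereas this sketch
takes the simpler batch route of placing bin edges at empirical
quantiles of data that is already in memory. The function name
"adaptive_histogram" and its signature are hypothetical and do not
exist in numpy or scipy.

```python
import numpy as np


def adaptive_histogram(data, bins=10):
    """Hypothetical helper: equal-frequency bins from empirical quantiles.

    This is a batch stand-in, not the streaming P-square estimator.
    Assumes `data` is non-empty and not constant.
    """
    data = np.asarray(data, dtype=float).ravel()
    # Put the edges at evenly spaced quantiles, so regions with many
    # samples get narrow bins and sparse regions get wide ones.
    edges = np.quantile(data, np.linspace(0.0, 1.0, bins + 1))
    # Heavily repeated values can produce duplicate edges; drop them
    # so that no bin has zero width.
    edges = np.unique(edges)
    counts, edges = np.histogram(data, bins=edges)
    return counts, edges


rng = np.random.default_rng(0)
sample = rng.standard_normal(10_000)

counts, edges = adaptive_histogram(sample, bins=8)
widths = np.diff(edges)  # bin widths vary; counts per bin stay roughly equal
```

On normally distributed input the bins near the mean come out much
narrower than the bins in the tails, which is the "resolution
throughout the domain" behaviour described above.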

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


