[Numpy-discussion] Automatic number of bins for numpy histograms

Jaime Fernández del Río jaime.frio at gmail.com
Tue Apr 14 19:24:55 EDT 2015


On Tue, Apr 14, 2015 at 4:12 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar <mistersheik at gmail.com> wrote:
> > Can I suggest that we instead add the P-square algorithm for the dynamic
> > calculation of histograms?
> > (http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf)
> >
> > This is already implemented in C++'s boost library
> > (http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp)
> >
> > I implemented it in Boost Python as a module, which I'm happy to share.
> > This is much better than fixed-width histograms in practice.  Rather than
> > adjusting the number of bins, it adjusts what you really want, which is the
> > resolution of the bins throughout the domain.
>
> This definitely sounds like a useful thing to have in numpy or scipy
> (though if it's possible to do without using Boost/C++ that would be
> nice). But yeah, we should leave the existing histogram alone (in this
> regard) and add a new name for this like "adaptive_histogram" or
> something. Then you can set about convincing matplotlib and friends to
> use it by default :-)
>

Would having a negative number of bins mean "this many, but with optimized
boundaries" be too clever an interface?
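One possible reading of the negative-bins idea is that `bins=-n` asks for n equal-count bins whose edges sit at evenly spaced quantiles of the data. The sketch below illustrates only that interface. `histogram_neg_bins` is a hypothetical name, not a NumPy function. It computes the quantiles exactly in memory with `np.quantile` instead of estimating them in a streaming fashion the way P-square does.

```python
import numpy as np


def histogram_neg_bins(a, bins=10, **kwargs):
    """Hypothetical wrapper around np.histogram (not part of NumPy).

    A non-negative or non-integer ``bins`` is passed straight through.
    A negative integer -n requests n bins with edges at evenly spaced
    quantiles, so each bin holds roughly the same number of samples.
    """
    a = np.asarray(a).ravel()
    if isinstance(bins, (int, np.integer)) and bins < 0:
        n = -bins
        # n + 1 edges at the 0, 1/n, ..., 1 quantiles (exact, not streamed).
        edges = np.quantile(a, np.linspace(0.0, 1.0, n + 1))
        return np.histogram(a, bins=edges, **kwargs)
    return np.histogram(a, bins=bins, **kwargs)
```

For example, `histogram_neg_bins(np.arange(100), -4)` puts 25 samples in each of the four bins.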

I have taken a look at the linked paper, and the P-square algorithm would not
be too complicated to implement from scratch, although I'm afraid it would
require writing some C code.
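To give a rough sense of the size of the job, here is a minimal pure-Python sketch of the basic single-quantile P-square estimator (Jain and Chlamtac, 1985). It keeps five markers, adjusts them with a parabolic update, and falls back to a linear update when the parabolic one would break marker ordering. It is not the extended multi-quantile variant that Boost's `extended_p_square` implements. The class name `P2Quantile` is hypothetical, and a production NumPy version would still need C for speed.

```python
class P2Quantile:
    """Hypothetical streaming estimator of one quantile p (basic P-square)."""

    def __init__(self, p):
        if not 0.0 < p < 1.0:
            raise ValueError("p must lie strictly between 0 and 1")
        self.p = p
        self.q = []                       # marker heights
        self.n = [1, 2, 3, 4, 5]          # actual marker positions (1-based)
        self.want = [1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5]  # desired positions
        self.inc = [0, p / 2, p, (1 + p) / 2, 1]             # desired increments

    def add(self, x):
        q, n = self.q, self.n
        if len(q) < 5:                    # first five samples seed the markers
            q.append(x)
            q.sort()
            return
        # Find the cell k with q[k] <= x < q[k+1], stretching the end markers.
        if x < q[0]:
            q[0], k = x, 0
        elif x >= q[4]:
            q[4], k = x, 3
        else:
            k = 0
            while x >= q[k + 1]:
                k += 1
        for i in range(k + 1, 5):         # markers above the cell shift right
            n[i] += 1
        for i in range(5):
            self.want[i] += self.inc[i]
        # Move each interior marker at most one position toward its target.
        for i in range(1, 4):
            d = self.want[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                d = 1 if d > 0 else -1
                h = self._parabolic(i, d)
                if not q[i - 1] < h < q[i + 1]:
                    h = self._linear(i, d)
                q[i] = h
                n[i] += d

    def _parabolic(self, i, d):
        q, n = self.q, self.n
        return q[i] + d / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + d) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - d) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
        )

    def _linear(self, i, d):
        q, n = self.q, self.n
        return q[i] + d * (q[i + d] - q[i]) / (n[i + d] - n[i])

    def value(self):
        if not self.q:
            raise ValueError("no samples added yet")
        if len(self.q) < 5:               # too few samples: exact order statistic
            return self.q[min(len(self.q) - 1, int(self.p * len(self.q)))]
        return self.q[2]                  # middle marker tracks the p-quantile
```

To get adaptive bin edges, you could run one such estimator per desired quantile. The extended algorithm instead shares a single set of markers across all of them.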

Jaime

-- 
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.

