<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Apr 14, 2015 at 4:12 PM, Nathaniel Smith <span dir="ltr">&lt;<a href="mailto:njs@pobox.com" target="_blank">njs@pobox.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar &lt;<a href="mailto:mistersheik@gmail.com">mistersheik@gmail.com</a>&gt; wrote:<br>

</span><span class="">> Can I suggest that we instead add the P-square algorithm for the dynamic<br>

> calculation of histograms?<br>

> (<a href="http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf" target="_blank">http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf</a>)<br>

><br>

> This is already implemented in C++'s boost library<br>

> (<a href="http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp" target="_blank">http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp</a>)<br>

><br>

> I implemented it in Boost Python as a module, which I'm happy to share.<br>

> This is much better than fixed-width histograms in practice.  Rather than<br>

> adjusting the number of bins, it adjusts what you really want, which is the<br>

> resolution of the bins throughout the domain.<br>

<br>

</span>This definitely sounds like a useful thing to have in numpy or scipy<br>

(though if it's possible to do without using Boost/C++ that would be<br>

nice). But yeah, we should leave the existing histogram alone (in this<br>

regard) and add a new name for this like "adaptive_histogram" or<br>

something. Then you can set about convincing matplotlib and friends to<br>

use it by default :-)<br></blockquote><div><br></div><div>Would it be too clever an interface to have a negative number of bins mean "this many, but with optimized boundaries"?</div><div><br></div><div>I have taken a look at the linked paper, and the P-square algorithm would not be too complicated to implement from scratch, although I'm afraid it would require writing some C code.</div><div><br></div><div>Jaime</div></div><div><br></div>-- <br><div class="gmail_signature">(\__/)<br>( O.o)<br>( > <) This is Bunny. Copy Bunny into your signature and help him in his plans for world domination.</div>

</div></div>