[Numpy-discussion] Automatic number of bins for numpy histograms

Jaime Fernández del Río jaime.frio at gmail.com
Wed Apr 15 10:02:57 EDT 2015


On Wed, Apr 15, 2015 at 4:36 AM, Neil Girdhar <mistersheik at gmail.com> wrote:

> Yeah, I'm not arguing, I'm just curious about your reasoning.  That
> explains why not C++.  Why would you want to do this in C and not Python?
>

Well, the algorithm has to iterate over all the inputs, updating the
estimated percentile positions at every iteration. Because the estimated
percentiles may change at every iteration, and each update depends on the
state left by the previous one, I don't think there is an easy way of
vectorizing the calculation with numpy. So I think it would be very slow if
done in pure Python.
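To see why this is hard to vectorize, here is a minimal pure-Python sketch of the classic single-quantile P² estimator of Jain and Chlamtac. It is an assumption that this is representative of the version discussed in this thread, which tracks several quantiles at once and may differ in its details; `P2Quantile` and its methods are hypothetical names. Each sample's update depends on which cell it falls into and on the marker state left by the previous sample, so the loop cannot be collapsed into a single array operation.

```python
import random


class P2Quantile:
    """Hypothetical helper: streaming estimate of one quantile p with the
    P-square algorithm (five markers, updated once per incoming sample)."""

    def __init__(self, p):
        self.p = p
        self.q = []                                   # marker heights
        self.n = [1, 2, 3, 4, 5]                      # actual marker positions
        self.want = [1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5]  # desired positions
        self.step = [0, p / 2, p, (1 + p) / 2, 1]     # desired-position increments

    def add(self, x):
        q, n = self.q, self.n
        if len(q) < 5:                                # bootstrap with 5 samples
            q.append(x)
            q.sort()
            return
        # Locate the cell k that x falls in, widening the end markers if needed.
        # This depends on the current marker state, so samples must be processed
        # one at a time.
        if x < q[0]:
            q[0], k = x, 0
        elif x >= q[4]:
            q[4], k = x, 3
        else:
            k = next(i for i in range(4) if q[i] <= x < q[i + 1])
        for i in range(k + 1, 5):                     # markers above x shift up
            n[i] += 1
        for i in range(5):
            self.want[i] += self.step[i]
        # Move each middle marker by one position if it drifted too far.
        for i in range(1, 4):
            d = self.want[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                s = 1 if d > 0 else -1
                h = self._parabolic(i, s)
                if not (q[i - 1] < h < q[i + 1]):     # fall back to a linear step
                    h = q[i] + s * (q[i + s] - q[i]) / (n[i + s] - n[i])
                q[i] = h
                n[i] += s

    def _parabolic(self, i, s):
        # Piecewise-parabolic prediction of the marker height after moving by s.
        q, n = self.q, self.n
        return q[i] + s / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
        )

    def value(self):
        if not self.q:
            raise ValueError("no samples yet")
        if len(self.q) < 5:                           # exact for tiny inputs
            return self.q[round(self.p * (len(self.q) - 1))]
        return self.q[2]


random.seed(0)
est = P2Quantile(0.5)
for _ in range(20_000):
    est.add(random.random())
print(est.value())  # an estimate of the median of U(0, 1)
```

Memory stays constant no matter how many samples arrive, which is the algorithmic appeal, but every sample goes through the Python-level branching above.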

Looking at this in more detail, how is this typically used? It gives you
approximate values that should split your sample into similarly filled
bins, but because those values are approximate, you would still need to do
the binning to get exact counts for a proper histogram, right? Even with
this drawback P² does have an algorithmic advantage, so for huge inputs and
many bins it should come out ahead. But for many medium-sized problems it
may be faster to simply use np.partition, which gives you all the exact
percentiles in a single go. It would also be much simpler to implement.
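The np.partition alternative can be sketched as follows, assuming the goal is bin edges that give equally filled bins. `equal_count_edges` is a hypothetical helper, not part of NumPy; the only library calls are np.partition, which accepts a sequence of kth indices and places all of them in their sorted positions in one call, and np.histogram, which then does the exact binning that is still needed.

```python
import numpy as np


def equal_count_edges(a, bins):
    """Hypothetical helper: exact bin edges splitting ``a`` into ``bins``
    equally filled bins, using one np.partition call instead of a full sort."""
    a = np.asarray(a, dtype=float).ravel()
    n = a.size
    # Order-statistic indices: the minimum, the interior cut points, the maximum.
    idx = np.concatenate(([0], (np.arange(1, bins) * n) // bins, [n - 1]))
    # Partition puts every requested order statistic in its sorted position.
    part = np.partition(a, np.unique(idx))
    return part[idx]


rng = np.random.default_rng(0)
data = rng.normal(size=100_000)
edges = equal_count_edges(data, 10)
# The exact binning is still a separate pass once the edges are known.
counts, _ = np.histogram(data, bins=edges)
print(counts)
```

Because the edges are exact order statistics, with distinct values each bin receives n // bins or n // bins + 1 samples; for the 100,000 normal draws above that is 10,000 per bin.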

Jaime

-- 
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with his
plans for world domination.

