[Numpy-discussion] numpy histogram normed=True (bug / confusing behavior)

Bruce Southey bsouthey at gmail.com
Mon Aug 30 11:39:51 EDT 2010


  On 08/30/2010 09:19 AM, Benjamin Root wrote:
> On Mon, Aug 30, 2010 at 8:29 AM, David Huard <david.huard at gmail.com> wrote:
>
>     Thanks for the feedback,
>
>     As far as I understand it, the proposal is to keep histogram as
>     it is for 1.5, then in 2.0, deprecate normed=True but keep the
>     buggy behavior, while adding a density keyword that fixes the bug.
>     In a later release, we could then get rid of normed. While the bug
>     won't be present in histogramdd and histogram2d, the keyword
>     change should be mirrored in those functions as well.
>
>     I personally am not too keen on changing the keyword normed for
>     density. I feel we are trading clarity for a few new users against
>     additional trouble for many existing users. We could mitigate this
>     by first documenting the change in the docstring and live with
>     both keywords for a few years before raising a DeprecationWarning.
>
>     Since this has a direct impact on matplotlib's hist, I'd be keen to
>     hear from the devs on this.
>
>     David
>
>
> I am not a dev, but I would like to give a word of warning from 
> matplotlib.
>
> In matplotlib, the bar/hist family of functions grew organically as 
> the devs took on various requests to add keywords and such to modify 
> the style and behavior of those graphing functions.  It has now become 
> an unmaintainable mess, prompting discussions on how to rip it out and 
> replace it with a cleaner implementation.  While everyone agrees that 
> it needs to be done, none of us wants to break backwards compatibility.
>
> My personal feeling is that a function should do one thing, and do 
> that one thing well.  So, to me, that means that histogram() should 
> return an array of counts and the bins for those counts.  Anything 
> more is merely window dressing to me.  With this information, one can 
> easily compute a cumulative distribution function, and/or normalize 
> the result.  The idea is that if there is nothing special that needs 
> to be done within the histogram algorithm to accommodate these extra 
> features, then they belong outside the function.
>
> My 2 cents,
> Ben Root
>
>
+1 for Ben's approach.
This is very similar to my view regarding the contingency table class 
proposed for scipy (http://projects.scipy.org/scipy/ticket/1258). We 
need to provide the core functionality that other approaches, such as 
density estimation, can build on, without tying it to specific details.
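
To make Ben's point concrete, here is a minimal sketch of doing the 
normalization and the cumulative distribution outside histogram(), 
starting only from the counts and bin edges it returns. The helper names 
normalize_counts and cumulative_fraction are hypothetical (they are not 
part of numpy), the sample data is made up, and the sketch assumes at 
least one sample falls inside the bins.

```python
import numpy as np


def normalize_counts(counts, edges):
    """Convert raw counts to a density that integrates to 1.

    Hypothetical helper, not a numpy function. Each count is divided by
    the total count and by its own bin width, so unequal bins are handled.
    """
    counts = np.asarray(counts, dtype=float)
    widths = np.diff(edges)
    return counts / (counts.sum() * widths)


def cumulative_fraction(counts):
    """Fraction of samples at or below the right edge of each bin."""
    counts = np.asarray(counts, dtype=float)
    return np.cumsum(counts) / counts.sum()


# Made-up data and deliberately unequal bin widths.
data = np.array([0.5, 1.5, 1.7, 2.5, 3.0, 6.0, 9.0])
edges = np.array([0.0, 1.0, 2.0, 4.0, 10.0])

# histogram() does one thing: return the counts and the bin edges.
counts, edges = np.histogram(data, bins=edges)

density = normalize_counts(counts, edges)
cdf = cumulative_fraction(counts)
```

Nothing here needs access to the internals of the histogram algorithm, 
which is the argument for keeping it outside the function.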

Bruce


