[Numpy-discussion] help creating a reversed cumulative histogram

josef.pktd at gmail.com josef.pktd at gmail.com
Thu Sep 3 13:48:15 EDT 2009


On Thu, Sep 3, 2009 at 12:58 PM, Tim Michelsen
<timmichelsen at gmx-topmail.de> wrote:
>> My first stop is usually wikipedia:
> [...]
> Thanks.
> So if I had known that I have to call the beast an
> "empirical inverse survival function", Robert would
> also have found it easier to help.
> Anyway, step by step...
>
>> In the case of the weight of pigs, it would be the cumulative weight of
>> all pigs with a weight less than the given bin boundary weight.
>> If values were income, then it would be the aggregated income of all
>> individuals with an income below the bin boundary.
>> So it makes sense, given this is what you want (below).
> Exactly!
>
> Or for precipitation:
> a) count: number of precipitation events that
>    occurred up to a certain limit
> b) sum: precipitation total registered up to that limit
>
>> There might be a mistake in the treatment of a cell when
>> reversing: when I run your example, the highest value is
>> not equal to values.sum()
> This has made me think again. Small point.
>
> See here:
> ecdf_sums = np.hstack([0.0, sums[0].cumsum() ])
> ecdf_sums = np.hstack([sums[0].cumsum() ])
>
> I had to adjust the classes in the spreadsheet by
> replacing the first class limit by 0.0.
> I had modified this yesterday to a different value
> (0.265152) as I was testing the code.
>
> from:
> 0.265152, 0.487273, 0.709394, 0.931515,
> 1.153636, 1.375758, 1.597879, 1.820000,
> 2.042121, 2.264242, 2.486364
>
> to:
> 0.0, 0.487273, 0.709394, 0.931515,
> 1.153636, 1.375758, 1.597879, 1.820000,
> 2.042121, 2.264242, 2.486364
>
> Now everything is fine. Results and curves match.
>
>> But I'm not sure yet, what's going on.
> 1) first I didn't know how to develop the code for an
>    "empirical inverse survival function" in numpy
> 2) I screwed my spreadsheet classes up while
>    testing and verifying my numpy code.
>
> Again, would a function for the
> "empirical inverse survival function" qualify for
> inclusion in numpy or scipy?

Sorry, I'm too distracted; correcting myself a second time:
this should *not* have "inverse" in it. Using "inverse" was a cut-and-paste error.
It's the empirical survival function.

If it's just a one-liner with cumsum, then I don't think it's necessary
to have a function for it.
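
As a rough sketch of that one-liner (not the code from earlier in the
thread): the data, the seed, and the names values, edges, counts and sums
are made up for illustration. It assumes the bin edges cover the full data
range, starting at 0.0, so that the cumulative sum ends at values.sum().

```python
import numpy as np

# Hypothetical, strictly positive sample data (e.g. precipitation amounts).
rng = np.random.default_rng(0)
values = rng.gamma(shape=2.0, scale=0.5, size=200)

# Bin edges from 0.0 up to the largest value, so no observation is dropped.
edges = np.linspace(0.0, values.max(), 11)

# Per-bin count and per-bin total (weights=values sums the values in each bin).
counts, _ = np.histogram(values, bins=edges)
sums, _ = np.histogram(values, bins=edges, weights=values)

# Ordinary cumulative histogram: everything up to each right bin edge.
cum_counts = counts.cumsum()
cum_sums = sums.cumsum()

# Reversed cumulative histogram: everything from each left bin edge upwards.
rev_counts = counts[::-1].cumsum()[::-1]
rev_sums = sums[::-1].cumsum()[::-1]
```

If the first edge is above the smallest value, the observations below it
fall outside the histogram and the last cumulative value no longer matches
values.sum().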

But following the previous discussion as well, it would be useful to have
the combination of histogram and empirical cdf, sf, and/or pdf to
define an empirical distribution. To interpret the result as a
distribution, normed=True would be necessary, but it could also be an
option.
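
A minimal sketch of such a combination, under a few assumptions: the data
are made up, the names pdf, cdf and sf are only illustrative, and
density=True is used because it takes the role of normed=True in recent
NumPy releases.

```python
import numpy as np

# Hypothetical sample data.
rng = np.random.default_rng(1)
values = rng.normal(size=500)

# Normalized histogram: the bin heights integrate to 1 over the bin widths.
density, edges = np.histogram(values, bins=20, density=True)
widths = np.diff(edges)

# Piecewise-constant pdf on the bins.
pdf = density

# cdf evaluated at the bin edges: 0 at the first edge, 1 at the last.
cdf = np.concatenate([[0.0], np.cumsum(density * widths)])

# Survival function at the bin edges.
sf = 1.0 - cdf
```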

One question about your application: in the plot you draw lines and not
histograms. Is there a reason to use histograms in the calculation
instead of the full ecdf (i.e. cumsum on the original values instead of
cumsum on the histogrammed values)?
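
To illustrate what I mean by the full ecdf, here is a sketch on made-up
data (all names are illustrative). It works on the sorted original values
without any binning, and the comments assume there are no tied values.

```python
import numpy as np

# Hypothetical sample data.
rng = np.random.default_rng(2)
values = rng.exponential(scale=1.0, size=100)

# Sort once; every step of the ecdf sits at one observation.
x = np.sort(values)
n = x.size

# Fraction of observations <= x[i], and the empirical survival function.
ecdf = np.arange(1, n + 1) / n
esf = 1.0 - ecdf

# Running totals of the values themselves, from below and from above.
cum_total = np.cumsum(x)            # sum of all values <= x[i]
rev_total = x[::-1].cumsum()[::-1]  # sum of all values >= x[i]
```

Plotting x against ecdf or cum_total gives the same kind of line plot, with
one step per observation instead of one per bin.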

Josef


>
> Thanks for the help.
>
> Best regards,
> Timmie