Mailman 3 Computing Simple Statistics When Only the Frequency Distribution is Known - NumPy-Discussion


Wayne Watson

27 Nov 2009

9:47 p.m.

How do I compute avg, std dev, min, max and other simple stats if I only know the frequency distribution?

-- Wayne Watson (Watson Adventures, Prop., Nevada City, CA)

(121.015 Deg. W, 39.262 Deg. N) GMT-8 hr std. time) Obz Site: 39° 15' 7" N, 121° 2' 32" W, 2700 feet

350 350 350 350 350 350 350 350 350 350 Make the number famous. See 350.org The major event has passed, but keep the number alive.

Web Page: <www.speckledwithstars.net/>


josef.pktd＠gmail.com

27 Nov

10:14 p.m.


If you are willing to assign to all observations in a bin the value at the bin midpoint, then you could do it with weights in the statistics calculations. However, numpy.average is, I think, the only statistic that takes weights. min and max are independent of the weights, but std and var need to be calculated indirectly.

If you need more stats with weights, then the attachment in http://projects.scipy.org/scipy/ticket/604 is a good start.

Josef
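A minimal sketch of the midpoint approach described above, assuming the histogram is available as bin edges plus per-bin counts. The function name `binned_stats` and the sample data are illustrative only and are not part of NumPy. The variance is obtained indirectly, as the weighted average of squared deviations, and min and max are taken from the occupied bins, which is an assumption about what those statistics should mean for binned data.

```python
import numpy as np

def binned_stats(edges, counts):
    """Approximate statistics of binned data, treating every observation
    in a bin as if it sat at the bin midpoint (an approximation)."""
    edges = np.asarray(edges, dtype=float)
    counts = np.asarray(counts, dtype=float)
    mids = 0.5 * (edges[:-1] + edges[1:])        # one midpoint per bin

    mean = np.average(mids, weights=counts)       # weighted mean
    # Indirect variance: weighted mean of squared deviations from the mean.
    var = np.average((mids - mean) ** 2, weights=counts)

    occupied = mids[counts > 0]                   # bins that actually hold data
    return {
        "mean": mean,
        "var": var,                               # population variance (ddof=0)
        "std": np.sqrt(var),
        "min": occupied.min(),
        "max": occupied.max(),
    }

# Hypothetical histogram: three unit-width bins centred on 1, 2 and 3.
stats = binned_stats(edges=[0.5, 1.5, 2.5, 3.5], counts=[2, 1, 3])
print(stats)
```

The result is only as good as the midpoint assumption: when the bins are wide relative to the spread of the data, the standard deviation in particular can be noticeably off.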


_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

Wayne Watson

28 Nov

12:25 a.m.

I actually wrote my own several days ago. When I began getting myself more familiar with numpy, I was hoping there would be an easy-to-use version in it for this frequency approach. If not, then I'll just stick with what I have. It seems something like this should be common.

A simple way to do it with the present capabilities would be to "unwind" the frequencies. For example, given [2,1,3] for some corresponding set of x, say, [1,2,3], produce [1, 1, 2, 3, 3, 3]. I have no idea if numpy does anything like that, but, if so, the typical mean, std, ... could be used. In my case, it's sort of pointless. It would produce an array of 307,200 items for 256 x (0,1,2,...,255), and just slow down the computations by "unwinding" it in software. The sub-processor hardware has already produced the 256 frequencies.

Basically, this amounts to having a pdf, and values of x. Mathematically, the statistics are produced directly from it.
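For reference, NumPy can do the "unwinding" described above with `np.repeat`, which accepts a per-element repeat count. The short sketch below uses the example values from the message. As noted there, this is wasteful when the histogram is already available, but it can be handy for checking a weighted implementation on small inputs.

```python
import numpy as np

x = np.array([1, 2, 3])       # distinct values
freq = np.array([2, 1, 3])    # how often each value occurs

# np.repeat repeats x[i] freq[i] times, reconstructing the raw sample.
unwound = np.repeat(x, freq)
print(unwound)                # [1 1 2 3 3 3]

# The ordinary statistics now apply directly.
print(unwound.mean(), unwound.std(), unwound.min(), unwound.max())
```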

David Goldsmith

10:20 p.m.


Wayne:

There is no need to "unwind". If Y(X) is the (unnormalized) frequency distribution of the random variable/data X, start by computing y = Y/(Y.sum()) (if Y is already normalized, skip this step). Then:

av(X) = np.dot(X, y), sd(X) = np.sqrt(np.dot(X*X, y) - av(X)**2), and higher-moment statistics can be calculated using similar formulae.

DG
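The formulas above can be sketched as runnable code roughly as follows. The helper name `moments_from_freq` and the random test histogram are made up for illustration. One deliberate deviation: the sketch computes the variance from central moments rather than from E[X^2] minus the squared mean, since the latter can lose precision through cancellation when the spread is small relative to the mean. The two forms are mathematically equivalent.

```python
import numpy as np

def moments_from_freq(X, Y):
    """Mean, standard deviation and skewness from values X and
    (possibly unnormalized) frequencies Y, without expanding the data."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    y = Y / Y.sum()                 # normalize so the weights sum to 1

    av = np.dot(X, y)               # first moment
    d = X - av                      # deviations from the mean
    var = np.dot(d * d, y)          # second central moment
    sd = np.sqrt(var)
    # Third standardized moment; defined as 0 here for a degenerate distribution.
    skew = np.dot(d ** 3, y) / sd ** 3 if sd > 0 else 0.0
    return av, sd, skew

# Hypothetical 8-bit histogram: 256 levels and 307,200 samples in total
# (the sizes mentioned in the thread; the counts themselves are random).
rng = np.random.default_rng(0)
levels = np.arange(256)
counts = np.bincount(rng.integers(0, 256, size=307_200), minlength=256)

av, sd, skew = moments_from_freq(levels, counts)
print(av, sd, skew)
```

Only the 256-element arrays are touched, so the cost does not depend on how many samples went into the histogram.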

Wayne Watson

10:45 p.m.

I was only illustrating a way that I would not consider, since the hardware has already created the pdf. I've already coded it pretty much as you have suggested. As I think I mentioned above, I'm a bit surprised numpy doesn't provide the code you suggest as part of some function, e.g. CalcSimplefromPDF(xvalues=mydatarray, avg=True, minmax=True, ...).

Anne Archibald

10:50 p.m.


Feel free to submit an implementation to numpy's issue tracker. I suggest modifying mean, std, and var (at least) so that, like average, they take an array of weights.

Anne

Wayne Watson

11:58 p.m.

How would I do that?

Anne Archibald

29 Nov

8:15 p.m.


Obtain a copy of the recent numpy source code, either as a .tar file from the website or via SVN or git. Then add the feature plus some tests, confirm that the tests pass, and post a request and your patch to the bug tracker.

Anne

Charles R Harris

8:33 p.m.


You might also want to use average as a starting point.

Chuck
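As a rough illustration of what such a patch might start from, the sketch below adds a `weights` argument to variance and standard deviation in the style of `np.average`. The names `weighted_var` and `weighted_std`, the `ddof` handling (which treats the weights as integer frequency counts), and the accompanying test are all assumptions made for this example, not an existing NumPy API.

```python
import numpy as np

def weighted_var(a, weights, axis=None, ddof=0):
    """Variance of `a` under frequency weights (hypothetical helper).

    `weights` must broadcast to the shape of `a`. For 1-D input with
    integer counts the result matches
    np.var(np.repeat(a, weights), ddof=ddof).
    """
    a = np.asarray(a, dtype=float)
    w = np.broadcast_to(np.asarray(weights, dtype=float), a.shape)

    wsum = w.sum(axis=axis, keepdims=True)
    mean = (w * a).sum(axis=axis, keepdims=True) / wsum   # weighted mean
    sq_dev = (w * (a - mean) ** 2).sum(axis=axis)         # weighted squared deviations
    return sq_dev / (w.sum(axis=axis) - ddof)

def weighted_std(a, weights, axis=None, ddof=0):
    """Square root of weighted_var, as np.std relates to np.var."""
    return np.sqrt(weighted_var(a, weights, axis=axis, ddof=ddof))

def test_matches_unwound_sample():
    # The kind of test a patch would need: compare against explicit expansion.
    x, freq = np.array([1, 2, 3]), np.array([2, 1, 3])
    unwound = np.repeat(x, freq)
    assert np.isclose(weighted_var(x, freq), unwound.var())
    assert np.isclose(weighted_std(x, freq, ddof=1), unwound.std(ddof=1))

test_matches_unwound_sample()
```

A real contribution would also need to settle how `ddof` should behave for non-integer weights, which this sketch does not attempt.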
