[Numpy-discussion] summing over more than one axis

josef.pktd at gmail.com josef.pktd at gmail.com
Thu Aug 19 17:20:03 EDT 2010


On Thu, Aug 19, 2010 at 4:03 PM, John Salvatier
<jsalvati at u.washington.edu> wrote:
> Precise in what sense? Numerical accuracy? If so, why is that?

I don't remember where I ran into this example; maybe it was integer
underflow (?) with addition.
The NIST ANOVA test cases have some nasty, badly scaled variables.

But I have trouble constructing such an example myself; the differences
below only show up in the 10th or a later digit.

>>> a = 1000000*np.random.randn(10000,1000)
>>> a.sum()
-820034796.05545747
>>> np.sort(a.ravel())[::-1].sum()
-820034795.87886333
>>> np.sort(a.ravel()).sum()
-820034795.88172638
>>> np.sort(a,0)[::-1].sum()
-820034795.82333243
>>> np.sort(a,1)[::-1].sum()
-820034796.05559027
>>> a.sum(-1).sum(-1)
-820034796.05551744
>>> np.sort(a,1)[::-1].sum(-1).sum(-1)
-820034796.05543578
>>> np.sort(a,0)[::-1].sum(-1).sum(-1)
-820034796.05590343
>>> np.sort(a,1).sum(-1).sum(-1)
-820034796.05544424
>>> am = a.mean()
>>> am*a.size + np.sort(a-am,1).sum(-1).sum(-1)
-820034796.05554879
>>> a.size * np.sort(a,1).mean(-1).mean(-1)
-820034796.05544722

Badly scaled or badly sorted arrays don't add up well,

but in randomly generated examples of size 10000x1000 I'm not able to get
a discrepancy worse than the 10th or 11th digit.
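
The comparison above can be turned into a small, repeatable check. The
sketch below is only an illustration under some assumptions: the helper
name summation_errors, the seed, and the smaller array size are made up
for this example, and math.fsum from the standard library is used as the
reference because it returns a correctly rounded sum. Newer NumPy releases
may sum differently internally, so the numbers printed will probably not
match the session above.

```python
import math
import numpy as np

def summation_errors(a):
    """Absolute error of several summation orders, measured against math.fsum.

    Hypothetical helper for illustration; not part of NumPy.
    """
    flat = a.ravel()
    exact = math.fsum(flat)  # correctly rounded reference value
    candidates = {
        "one big sum": a.sum(),
        "two sums": a.sum(-1).sum(-1),
        "sorted ascending": np.sort(flat).sum(),
        "sorted descending": np.sort(flat)[::-1].sum(),
    }
    errors = {name: abs(float(v) - exact) for name, v in candidates.items()}
    return exact, errors

# Same kind of data as in the session, but smaller and seeded (assumption).
rng = np.random.RandomState(0)
a = 1000000 * rng.randn(1000, 100)
exact, errors = summation_errors(a)
for name, err in errors.items():
    print("%-18s abs. error = %.3e" % (name, err))

# A tiny badly scaled case: plain left-to-right addition loses the 1.0.
values = [1e16, 1.0, -1e16]
naive = 0.0
for x in values:
    naive += x
accurate = math.fsum(values)
print(naive, accurate)
```

The last three values show the "badly scaled" effect in isolation: adding
1.0 to 1e16 is rounded away, so the left-to-right total is 0.0 while the
correctly rounded sum is 1.0.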

Josef



>
> On Thu, Aug 19, 2010 at 12:13 PM, <josef.pktd at gmail.com> wrote:
>>
>> On Thu, Aug 19, 2010 at 11:29 AM, Joe Harrington <jh at physics.ucf.edu>
>> wrote:
>> > On Thu, 19 Aug 2010 09:06:32 -0500, Gökhan Sever <gokhansever at gmail.com>
>> > wrote:
>> >
>> >>On Thu, Aug 19, 2010 at 9:01 AM, greg whittier <gregwh at gmail.com> wrote:
>> >>
>> >>> I frequently deal with 3D data and would like to sum (or find the
>> >>> mean, etc.) over the last two axes.  I.e. sum a[i,j,k] over j and k.
>> >>> I find using .sum() really convenient for 2d arrays but end up
>> >>> reshaping 2d arrays to do this.  I know there has to be a more
>> >>> convenient way.  Here's what I'm doing
>> >>>
>> >>> a = np.arange(27).reshape(3,3,3)
>> >>>
>> >>> # sum over axis 1 and 2
>> >>> result = a.reshape((a.shape[0], a.shape[1]*a.shape[2])).sum(axis=1)
>> >>>
>> >>> Is there a cleaner way to do this?  I'm sure I'm missing something
>> >>> obvious.
>> >>>
>> >>> Thanks,
>> >>> Greg
>> >>>
>> >>
>> >>Using two sums
>> >>
>> >>np.sum(np.sum(a, axis=-2), axis=1)
>> >
>> > Be careful.  This works for sums, but not for operations like median;
>> > the median of the row medians may not be the global median.  So, you
>> > need to do the medians in one step.  I'm not aware of a method cleaner
>> > than manually reshaping first.  There may also be speed reasons to do
>> > things in one step.  But, two steps may look cleaner in code.
>>
>> I think two .sum() calls are the most accurate, if precision matters. One
>> big summation is often not very precise.
>>
>> Josef
>>
>>
>> >
>> > --jh--
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>
>
>
>
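
For reference, the approaches from the quoted thread can be put side by
side. This is a minimal sketch, not a recommendation from any of the
posters: the array b is a made-up counter-example for the median caveat,
and the tuple form of axis is an assumption about the installed NumPy,
since it is only accepted by releases newer than this thread.

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)

# Reshape approach from the original question: flatten the last two axes.
by_reshape = a.reshape(a.shape[0], -1).sum(axis=1)

# Two successive sums over the last axis.
two_step = a.sum(axis=-1).sum(axis=-1)

# Tuple of axes; assumes a NumPy release that supports it (newer than this thread).
by_tuple = a.sum(axis=(1, 2))

print(by_reshape, two_step, by_tuple)

# Median caveat: the median of row medians need not be the overall median.
# b is a hypothetical example chosen to make the difference visible.
b = np.array([[[0, 0, 10],
               [0, 0, 10],
               [10, 10, 10]]])
median_of_medians = np.median(np.median(b, axis=-1), axis=-1)
one_step_median = np.median(b.reshape(b.shape[0], -1), axis=1)
print(median_of_medians, one_step_median)
```

For sums all three forms agree, while for b the two-step median gives 0
and the one-step median over the flattened axes gives 10.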


