[Numpy-discussion] Precision difference between dot and sum

Bruce Southey bsouthey at gmail.com
Tue Nov 2 10:05:47 EDT 2010


On 11/01/2010 09:39 PM, Joon wrote:
> Thanks for the replies.
>
> I tried several things, like changing dot into sum in the gradient
> calculations just to see how the results change, but it seems that
> this part of the code is the only place where the results are affected
> by the choice of dot/sum.
>
> I am using a 64-bit machine and EPD Python (I think it uses Intel MKL),
> so that could have affected the calculation. I will use dot whenever
> possible from now on. :)
>
> -Joon
>
>
>
> On Mon, 01 Nov 2010 21:27:07 -0500, David <david at silveregg.co.jp> wrote:
>
> > On 11/02/2010 08:30 AM, Joon wrote:
> >> Hi,
> >>
> >> I just found that using dot instead of sum in numpy gives me better
> >> results in terms of precision loss. For example, I optimized a function
> >> with scipy.optimize.fmin_bfgs. For the function's return value, I
> >> tried the following two things:
> >>
> >> sum(Xb) - sum(denominator)
> >>
> >> and
> >>
> >> dot(ones(Xb.shape), Xb) - dot(ones(denominator.shape), denominator)
> >>
> >> Both of them are supposed to yield the same thing. But the first one
> >> gave me -589112.30492110562 and the second one gave me
> >> -589112.30492110678.
> >
> > Those are basically the same number: the minimal spacing between two
> > double floats at this amplitude is ~1e-10 (given by the function
> > np.spacing(the_number)), and the difference between your two numbers
> > (~1e-9) is only about ten such spacings.
> >
> >> I was wondering if this is a well-known fact and I'm supposed to use
> >> dot instead of sum whenever possible.
> >
> > You should use dot instead of sum when applicable, but essentially for
> > speed reasons.
> >
> >>
> >> It would be great if someone could let me know why this happens.
> >
> > They don't use the same implementation, so such tiny differences are
> > expected; having exactly the same solution would have been surprising,
> > actually. You may be surprised by the difference for such a trivial
> > operation, but keep in mind that dot is implemented with highly
> > optimized CPU instructions (that is, if you use ATLAS or a similar
> > library).
> >
> > cheers,
> >
> > David
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Well, a 64-bit machine is somewhat irrelevant if you are running a 32-bit
application or the OS is limiting it...

The overall difference combines the numerical 'errors' of two
summations and a subtraction, and how large it gets depends on the
numerical accuracy of each algorithm. So while each individual operation
can be expected to be within numerical 'error', the overall result need
not be. You need to compare the two separate summations: sum(Xb) vs
dot(ones(Xb.shape), Xb) and sum(denominator) vs
dot(ones(denominator.shape), denominator). If these individual
summations differ by more than the floating-point spacing at that
magnitude, then you should look at numerically sound approaches. The
simplest ways are to use the dtype argument of the sum function or to do
your computations in a higher precision, if your OS and Python versions
allow it. Otherwise you should look for a numerically sound way to do
the summation.
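One possible way to carry out that per-summation comparison is sketched below. It is only a sketch, assuming a 1-D float64 array; `ulps_apart` and `compare_summations` are hypothetical helper names, not NumPy functions. Each result is measured in units of np.spacing against Python's `math.fsum`, which tracks exact partial sums and so makes a reasonable reference. Note that `np.longdouble` only adds precision on platforms where it is wider than float64.

```python
import math

import numpy as np


def ulps_apart(a, b):
    """Distance between two floats in units of np.spacing at their magnitude."""
    return abs(a - b) / np.spacing(max(abs(a), abs(b)))


def compare_summations(x):
    """Sum a 1-D array several ways and report each result's error in ulps.

    math.fsum tracks exact partial sums, so it is used as the reference.
    np.longdouble only adds precision where it is wider than float64
    (it is the same as float64 on some platforms, e.g. Windows/MSVC).
    """
    x = np.asarray(x, dtype=np.float64)
    reference = math.fsum(x)
    results = {
        "sum": float(np.sum(x)),
        "dot": float(np.dot(np.ones(x.shape), x)),
        "sum_longdouble": float(np.sum(x, dtype=np.longdouble)),
    }
    return {name: ulps_apart(value, reference) for name, value in results.items()}


# The two values reported in the thread differ by about ten spacings.
a = -589112.30492110562
b = -589112.30492110678
print(np.spacing(abs(a)))
print(ulps_apart(a, b))

# Compare each summation on its own (hypothetical random data).
rng = np.random.default_rng(0)
print(compare_summations(rng.normal(scale=1e3, size=100_000)))
```

If sum and dot are each within a few ulps of the reference, neither is "better" in any meaningful sense and the remaining difference comes from the subtraction.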

Somewhat related: the large value of the result suggests that you
should be standardizing your original data, especially if sum(Xb) is
large. This should help with convergence and numerical precision. A
'large' value for sum(Xb) can also indicate a poor fit, so you should
also check convergence, estimates and overall fit (especially outliers).
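A minimal sketch of column standardization follows. It assumes (the thread does not say) that Xb is a design matrix X multiplied by a coefficient vector b, and `standardize` is a hypothetical helper. Constant columns such as an intercept column of ones are passed through unchanged so the intercept is not zeroed out.

```python
import numpy as np


def standardize(X):
    """Center and scale each column of X to mean 0 and standard deviation 1.

    Constant columns (e.g. an intercept column of ones) are left unchanged.
    Returns the standardized array with the means and standard deviations
    used, so coefficients can later be mapped back to the original units.
    """
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    constant = std == 0
    # Leave constant columns as they are: no centering, no scaling.
    mean = np.where(constant, 0.0, mean)
    std = np.where(constant, 1.0, std)
    return (X - mean) / std, mean, std


# Hypothetical design matrix: an intercept column plus one large-valued column.
X = np.array([[1.0, 1.0e5], [1.0, 2.0e5], [1.0, 3.0e5]])
Z, mean, std = standardize(X)
print(Z)
```

Fit on Z instead of X, and use the returned means and standard deviations if the estimates are needed in the original units.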

Bruce
