<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">I don't agree. The problem is that I
expect `mean` to do something reasonable. The documentation
mentions that the results can be "inaccurate", which is a huge
understatement: the results can be utterly wrong. That is not
reasonable. At the very least, a warning should be issued in cases
where the dtype might not be appropriate. <br>
<br>
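To make this concrete, here is a minimal sketch of the failure
mode (the exact output depends on the NumPy version; newer
releases use pairwise summation, which mitigates the worst
case):<br>
<pre>
import numpy as np

X = np.ones(10**8, dtype=np.float32)

# With naive sequential float32 accumulation, the running sum
# saturates at 2**24 = 16777216, because adding 1.0 to it no
# longer changes the value:
print(X.mean())                  # 0.16777216 -- utterly wrong
print(X.mean(dtype=np.float64))  # 1.0, as expected
</pre>
<br>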
One cannot predict what input sizes a program will be run with
once it's in use (especially if it's in use for several years).
I'd argue this is true for pretty much all code except quick
one-off scripts. Thus one would have to use `dtype=np.float64`
everywhere, at which point it seems obvious that it should have
been the default in the first place. The other alternative would
be to extend np.mean with some logic that internally figures out
the right thing to do (which I don't think is too hard, since the
number of elements being summed is known at call time).<br>
<br>
Your example with the short axis is something that can be checked
for. I agree that the logic could become a bit hairy, but not too
much: If we are going to sum up more than N values (where N could
be determined at compile time, or simply be some constant), we
upcast unless the user explicitly specified a dtype. Of course,
this would incur an increase in memory. However I'd argue that
it's not even a large increase: If you can fit the matrix in
memory, then allocating a row/column of float64 instead of float32
should be doable, as well. And I'd much rather get an OutOfMemory
exception than silently continue my calculations with
useless/wrong results.<br>
<br>
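Here is a rough sketch of what I have in mind (the name
`safe_mean` and the threshold 2**24 are purely illustrative, and
the axis handling is simplified):<br>
<pre>
import numpy as np

def safe_mean(a, axis=None, dtype=None, threshold=2**24):
    # Illustrative only: upcast the accumulator to float64 when the
    # user gave no explicit dtype and the reduction covers more
    # elements than float32 can count exactly (2**24).
    a = np.asanyarray(a)
    if dtype is None and a.dtype == np.float32:
        n = a.size if axis is None else a.shape[axis]
        if n > threshold:
            dtype = np.float64
    return a.mean(axis=axis, dtype=dtype)
</pre>
<br>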
Cheers<br>
<br>
Thomas<br>
<br>
<br>
<br>
On 2014-07-24 11:59, Eelco Hoogendoorn wrote:<br>
</div>
<blockquote
cite="mid:CAO0rnfG34C0UdoAJkZT=cE4mR=cLuADkrASLXxsTekcbjyPhWQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>Arguably, this isn't a problem of numpy, but of programmers
being trained to think of floating point numbers as 'real'
numbers, rather than just a finite number of states with a
funny distribution over the number line. np.mean isn't broken;
your understanding of floating point numbers is.</div>
<div><br>
</div>
<div>What you appear to wish for is a silent upcasting of the
accumulated result. This is often performed in reducing
operations, but I can imagine it runs into trouble for
nd-arrays. After all, if I have a huge array that I want to
reduce over a very short axis, upcasting might be very
undesirable; it wouldn't buy me any extra precision, but it
would increase memory use from 'huge' to 'even more huge'.<br>
<br>
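For instance (sizes made up purely for illustration):<br>
<pre>
import numpy as np

# A huge array reduced over a very short axis:
a = np.empty((10**8, 3), dtype=np.float32)  # ~1.2 GB of data
m32 = a.mean(axis=1)                        # float32 result: ~0.4 GB
m64 = a.mean(axis=1, dtype=np.float64)      # float64: ~0.8 GB, yet
                                            # summing only 3 values
                                            # gains almost no precision
</pre>
<br>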
np.mean has a kwarg that allows you to explicitly choose the
dtype of the accumulator:
X.mean(dtype=np.float64)==1.0. Personally, I have a distaste
for implicit behavior, unless the rule is simple and there
really can be no downsides; which I would argue doesn't
apply here. Perhaps when reducing an array completely to a
single value, there is no harm in upcasting to the maximum
machine precision; but that becomes a rather complex rule
which would work out differently on different machines. It's
better to be confronted with the limitations of floating point
numbers earlier, rather than later when you want to distribute
your work and run into subtle bugs on other people's
computers.</div>
</div>
</blockquote>
<br>
</body>
</html>