[Numpy-discussion] large float32 array issue
braingateway
braingateway at gmail.com
Wed Nov 3 14:38:35 EDT 2010
Vincent Schut :
> Hi, I'm running into this strange issue when using some pretty large
> float32 arrays. In the following code I create a large array filled with
> ones, and calculate mean and sum, first with a float64 version, then
> with a float32 version. Note the difference between the two. NB the
> float64 version is obviously right :-)
>
>
>
> In [2]: areaGrid = numpy.ones((11334, 16002))
> In [3]: print(areaGrid.dtype)
> float64
> In [4]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(),
> areaGrid.mean(), areaGrid.sum())
> ((11334, 16002), 1.0, 1.0, 1.0, 181366668.0)
>
>
> In [5]: areaGrid = numpy.ones((11334, 16002), numpy.float32)
> In [6]: print(areaGrid.dtype)
> float32
> In [7]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(),
> areaGrid.mean(), areaGrid.sum())
> ((11334, 16002), 1.0, 1.0, 0.092504406598019437, 16777216.0)
>
>
Yes, I also see the same problem:
>>> import numpy as npy
>>> b = npy.ones((11334, 16002), dtype='float32')
>>> b.shape[0]*b.shape[1]
181366668L
>>> b.sum()
16777216.0
>>> print(npy.finfo(b.dtype).max)
3.40282e+38
The accumulator's range is definitely not the problem (float32 goes up to
about 3.4e38). I think floating-point precision is what actually kicks in.
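As a quick consistency check, assuming the float32 sum is computed as one sequential running total (which the 16777216.0 result suggests), the reported numbers fit together: the exact sum would be the element count, the running sum of ones stalls at 2**24, and the odd mean is just that stalled sum divided by the count.

```python
# Shape from the original report; every element is 1.0,
# so the exact sum equals the number of elements.
rows, cols = 11334, 16002
n = rows * cols

# A float32 running sum of ones is exact up to 2**24 and then gets stuck there.
stalled_sum = 2 ** 24

# mean() divides that stalled sum by the element count.
mean = stalled_sum / n

print(n)            # 181366668
print(stalled_sum)  # 16777216
print(mean)
```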
Try the following code:
npy.float32(16777216)+npy.float32(1)
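You will see that the result does not grow any more. This is because the
spacing between adjacent float32 values at 16777216 (= 2**24) is 2
(npy.spacing(npy.float32(16777216)) gives 2.0), which is bigger than 1, so
16777216 + 1 falls exactly halfway and rounds back to 16777216. That is why
you cannot keep accumulating values of 1 or smaller beyond this value.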
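A minimal sketch of the stall on a small array, assuming a recent NumPy. It uses `np.add.accumulate`, which always keeps a sequential running sum, because newer NumPy releases use pairwise summation inside `sum()` along the fast axis and so may not reproduce the stall on the full-size array. The `dtype=np.float64` argument is a standard way to get a wider accumulator while the data itself stays float32.

```python
import numpy as np

# First element is 2**24 = 16777216, followed by ten ones.
# The exact total is 16777226.
x = np.ones(11, dtype=np.float32)
x[0] = 2 ** 24

# Spacing between adjacent float32 values at 2**24 is 2.0, so adding 1
# lands exactly halfway and rounds back (round half to even).
print(np.spacing(np.float32(2 ** 24)))  # 2.0

# Sequential running sum kept in float32: the naive accumulation.
running32 = np.add.accumulate(x)
print(running32[-1])  # 16777216.0

# Same running sum with a float64 accumulator: no stall.
running64 = np.add.accumulate(x, dtype=np.float64)
print(running64[-1])  # 16777226.0

# sum()/mean() accept the same dtype argument; x itself stays float32,
# so the array keeps its float32 memory footprint.
print(x.sum(dtype=np.float64), x.mean(dtype=np.float64))
```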
Also try:
npy.float32(16777215)+npy.float32(0.5)
and:
npy.float64(1e16)+npy.float64(1)
The first gives 16777216.0 instead of 16777215.5, and the second gives
1e16 unchanged: float64 hits the same wall, just much further out.
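The same two cases can be checked without NumPy. This sketch uses only the standard library: `to_float32` is a hypothetical helper that rounds a Python float (an IEEE double) to IEEE single precision by round-tripping it through `struct`, and `math.ulp` needs Python 3.9 or later.

```python
import math
import struct


def to_float32(value):
    """Hypothetical helper: round a Python float to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Case 1: 16777215 + 0.5 in float32. The exact sum 16777215.5 is representable
# in double, so adding in double and rounding once to float32 matches a true
# float32 addition. It lies halfway between 16777215 and 16777216.
r32 = to_float32(to_float32(16777215.0) + to_float32(0.5))
print(r32)  # 16777216.0

# Case 2: 1e16 + 1 in float64 (Python floats are IEEE doubles).
r64 = 1e16 + 1.0
print(r64 == 1e16)     # True
print(math.ulp(1e16))  # 2.0, the float64 spacing at 1e16
```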
numpy.sum() is clumsy in this respect: it just adds all the values into
one running sum, which is best avoided for floating-point values, even
float64 ones. Think about adding 1e16 values of 5e-5 to 1e12 one at a
time: the float64 spacing near 1e12 is about 1.2e-4, so each 5e-5 is less
than half a step and rounds away, and you get 1.0e12 instead of 1.5e12.
Some people try to do smarter things, like:
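A scaled-down sketch of the 1e12 example: one million additions instead of 1e16, an assumption made only so it finishes quickly. `math.fsum` is Python's exact-summation routine, not what numpy.sum() does; it is used here only as a reference for the correct total.

```python
import math

big = 1e12
small = 5e-5        # less than half the float64 spacing near 1e12
count = 1_000_000   # scaled down from 1e16

# Naive left-to-right accumulation: every addition rounds back to 1e12.
naive = big
for _ in range(count):
    naive += small

# math.fsum tracks exact partial sums and rounds once at the end.
exact = math.fsum([big] + [small] * count)

print(math.ulp(big))  # 0.0001220703125
print(naive)          # 1000000000000.0
print(exact)          # 1000000000050.0
```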
1) put all small values into one group and all big values into another
2) sum each group separately
3) add the two sums together
But I guess that is costly.
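A minimal sketch of the grouping idea above, in plain Python. `grouped_sum` and its `threshold` parameter are hypothetical names chosen for illustration, not a NumPy function; the input is the scaled-down 1e12 example (one million values of 5e-5).

```python
import math


def grouped_sum(values, threshold):
    """Hypothetical helper: sum small and big values in separate groups,
    then add the two group totals together."""
    small_total = 0.0
    big_total = 0.0
    for v in values:
        if abs(v) < threshold:
            small_total += v  # small values only meet other small values
        else:
            big_total += v    # big values only meet other big values
    return big_total + small_total


values = [1e12] + [5e-5] * 1_000_000

# Plain running sum for comparison: the small values are all lost.
naive = 0.0
for v in values:
    naive += v

grouped = grouped_sum(values, threshold=1.0)

print(naive)              # 1000000000000.0
print(grouped)
print(math.fsum(values))  # exact reference
```

The extra cost is one comparison per element, and each group is still summed naively, so with enough small values the small group can lose precision in turn.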
> Can anybody confirm this? And better: explain it? Am I running into an
> IEEE float 'feature' that was hidden from me until now? Or is it a bug somewhere?
>
> Btw I'd like to use float32 arrays, as precision is not really an issue
> in this case, but memory usage is...
>
>
> This is using python 2.7, numpy from git (yesterday's checkout), on arch
> linux 64bit.
>
> Best,
> Vincent.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>