[Numpy-discussion] Strange numpy behaviour (bug?)
Sturla Molden
sturla at molden.no
Wed Jan 18 00:57:38 EST 2012
Never mind this, it was my own mistake as I expected :-)
def __chunk(n,size):
x = range(0,n,size)
x.append(n)
return zip(x[:-1],x[1:])
makes it a lot better :)
Sturla
Den 18.01.2012 06:26, skrev Sturla Molden:
> While "playing" with a point-in-polygon test, I have discovered some a
> failure mode that I cannot make sence of.
>
> The algorithm is vectorized for NumPy from a C and Python implementation
> I found on the net (see links below). It is written to process a large
> dataset in chunks. I'm rather happy with it, it can test 100,000 x,y
> points against a non-convex pentagon in just 50 ms.
>
> Anyway, here is something very strange (or at least I think so):
>
> If I use a small chunk size, it sometimes fails. I know I shouldn't
> blame it on NumPy, beacuse it is by all likelood my mistake. But it does
> not make any sence, as the parameter should not affect the computation.
>
> Observed behavior:
>
> 1. Processing the whole dataset in one big chunk always works.
>
> 2. Processing the dataset in big chunks (e.g. 8192 points) always works.
>
> 3. Processing the dataset in small chunks (e.g. 32 points) sometimes fail.
>
> 4. Processing the dataset element-wise always work.
>
> 5. The scalar version behaves like the numpy version: fine for large
> chunks, sometimes it fails for small. That is, when list comprehensions
> is used for chunks. Big list comprehensions always work, small ones
> might fail.
>
> It looks like the numerical robstness of the alorithm depends on a
> parameter that has nothing to do with the algorithm at all. For example
> in (5), we might think that calling a function from a nested loop makes
> it fail, depending on the length of the inner loop. But calling it from
> a single loop works just fine.
>
> ???
>
> So I wonder:
>
> Could there be a bug in numpy that only shows up only when taking a huge
> number of short slices?
>
> I don't know... But try it if you care.
>
> In the function "inpolygon", change the call that says __chunk(n,8192)
> to e.g. __chunk(n,32) to see it fail (or at least it does on my
> computer, running Enthought 7.2-1 on Win64).
>
>
> Regards,
> Sturla Molden
>
>
>
>
>
> def __inpolygon_scalar(x,y,poly):
>
> # Source code taken from:
> # http://paulbourke.net/geometry/insidepoly
> # http://www.ariel.com.au/a/python-point-int-poly.html
>
> n = len(poly)
> inside = False
> p1x,p1y = poly[0]
> xinters = 0
> for i in range(n+1):
> p2x,p2y = poly[i % n]
> if y> min(p1y,p2y):
> if y<= max(p1y,p2y):
> if x<= max(p1x,p2x):
> if p1y != p2y:
> xinters = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
> if p1x == p2x or x<= xinters:
> inside = not inside
> p1x,p1y = p2x,p2y
> return inside
>
>
> # the rest is (C) Sturla Molden, 2012
> # University of Oslo
>
> def __inpolygon_numpy(x,y,poly):
> """ numpy vectorized version """
> n = len(poly)
> inside = np.zeros(x.shape[0], dtype=bool)
> xinters = np.zeros(x.shape[0], dtype=float)
> p1x,p1y = poly[0]
> for i in range(n+1):
> p2x,p2y = poly[i % n]
> mask = (y> min(p1y,p2y))& (y<= max(p1y,p2y))& (x<=
> max(p1x,p2x))
> if p1y != p2y:
> xinters[mask] = (y[mask]-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
> if p1x == p2x:
> inside[mask] = ~inside[mask]
> else:
> mask2 = x[mask]<= xinters[mask]
> idx, = np.where(mask)
> idx2, = np.where(mask2)
> idx = idx[idx2]
> inside[idx] = ~inside[idx]
> p1x,p1y = p2x,p2y
> return inside
>
> def __chunk(n,size):
> x = range(0,n,size)
> if (n%size):
> x.append(n)
> return zip(x[:-1],x[1:])
>
> def inpolygon(x, y, poly):
> """
> point-in-polygon test
> x and y are numpy arrays
> polygon is a list of (x,y) vertex tuples
> """
> if np.isscalar(x) and np.isscalar(y):
> return __inpolygon_scalar(x, y, poly)
> else:
> x = np.asarray(x)
> y = np.asarray(y)
> n = x.shape[0]
> z = np.zeros(n, dtype=bool)
> for i,j in __chunk(n,8192): # COMPARE WITH __chunk(n,32) ???
> if j-i> 1:
> z[i:j] = __inpolygon_numpy(x[i:j], y[i:j], poly)
> else:
> z[i] = __inpolygon_scalar(x[i], y[i], poly)
> return z
>
>
>
> if __name__ == "__main__":
>
> import matplotlib
> import matplotlib.pyplot as plt
> from time import clock
>
> n = 100000
> polygon = [(0.,.1), (1.,.1), (.5,1.), (0.,.75), (.5,.5), (0.,.1)]
> xp = [x for x,y in polygon]
> yp = [y for x,y in polygon]
> x = np.random.rand(n)
> y = np.random.rand(n)
> t0 = clock()
> inside = inpolygon(x,y,polygon)
> t1 = clock()
> print 'elapsed time %.3g ms' % ((t0-t1)*1E3,)
> plt.figure()
> plt.plot(x[~inside],y[~inside],'ob', xp, yp, '-g')
> plt.axis([0,1,0,1])
> plt.show()
>
>
>
>
>
>
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
More information about the NumPy-Discussion
mailing list