[Numpy-discussion] Strange numpy behaviour (bug?)

Wed Jan 18 00:26:10 EST 2012

While "playing" with a point-in-polygon test, I have discovered some a 
failure mode that I cannot make sence of.

The algorithm is vectorized for NumPy from a C and Python implementation 
I found on the net (see links below). It is written to process a large 
dataset in chunks. I'm rather happy with it, it can test 100,000 x,y 
points against a non-convex pentagon in just 50 ms.

Anyway, here is something very strange (or at least I think so):

If I use a small chunk size, it sometimes fails. I know I shouldn't 
blame it on NumPy, beacuse it is by all likelood my mistake. But it does 
not make any sence, as the parameter should not affect the computation.

Observed behavior:

1. Processing the whole dataset in one big chunk always works.

2. Processing the dataset in big chunks (e.g. 8192 points) always works.

3. Processing the dataset in small chunks (e.g. 32 points) sometimes fail.

4. Processing the dataset element-wise always work.

5. The scalar version behaves like the numpy version: fine for large 
chunks, sometimes it fails for small. That is, when list comprehensions 
is used for chunks. Big list comprehensions always work, small ones 
might fail.

It looks like the numerical robstness of the alorithm depends on a 
parameter that has nothing to do with the algorithm at all. For example 
in (5), we might think that calling a function from a nested loop makes 
it fail, depending on the length of the inner loop. But calling it from 
a single loop works just fine.

???

So I wonder:

Could there be a bug in numpy that only shows up only when taking a huge 
number of short slices?

I don't know... But try it if you care.

In the function "inpolygon", change the call that says __chunk(n,8192) 
to e.g. __chunk(n,32) to see it fail (or at least it does on my 
computer, running Enthought 7.2-1 on Win64).

Regards,
Sturla Molden

def __inpolygon_scalar(x,y,poly):

     # Source code taken from:
     # http://paulbourke.net/geometry/insidepoly
     # http://www.ariel.com.au/a/python-point-int-poly.html

     n = len(poly)
     inside = False
     p1x,p1y = poly[0]
     xinters = 0
     for i in range(n+1):
         p2x,p2y = poly[i % n]
         if y > min(p1y,p2y):
             if y <= max(p1y,p2y):
                 if x <= max(p1x,p2x):
                     if p1y != p2y:
                         xinters = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
                     if p1x == p2x or x <= xinters:
                         inside = not inside
         p1x,p1y = p2x,p2y
     return inside

# the rest is (C) Sturla Molden, 2012
# University of Oslo

def __inpolygon_numpy(x,y,poly):
     """ numpy vectorized version """
     n = len(poly)
     inside = np.zeros(x.shape[0], dtype=bool)
     xinters = np.zeros(x.shape[0], dtype=float)
     p1x,p1y = poly[0]
     for i in range(n+1):
         p2x,p2y = poly[i % n]
         mask = (y > min(p1y,p2y)) & (y <= max(p1y,p2y)) & (x <= 
max(p1x,p2x))
         if p1y != p2y:
             xinters[mask] = (y[mask]-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
         if p1x == p2x:
             inside[mask] = ~inside[mask]
         else:
             mask2 = x[mask] <= xinters[mask]
             idx, = np.where(mask)
             idx2, = np.where(mask2)
             idx = idx[idx2]
             inside[idx] = ~inside[idx]
         p1x,p1y = p2x,p2y
     return inside

def __chunk(n,size):
     x = range(0,n,size)
     if (n%size):
         x.append(n)
     return zip(x[:-1],x[1:])

def inpolygon(x, y, poly):
     """
     point-in-polygon test
     x and y are numpy arrays
     polygon is a list of (x,y) vertex tuples
     """
     if np.isscalar(x) and np.isscalar(y):
         return __inpolygon_scalar(x, y, poly)
     else:
         x = np.asarray(x)
         y = np.asarray(y)
         n = x.shape[0]
         z = np.zeros(n, dtype=bool)
         for i,j in __chunk(n,8192): # COMPARE WITH __chunk(n,32) ???
             if j-i > 1:
                 z[i:j] = __inpolygon_numpy(x[i:j], y[i:j], poly)
             else:
                 z[i] = __inpolygon_scalar(x[i], y[i], poly)
         return z

if __name__ == "__main__":

     import matplotlib
     import matplotlib.pyplot as plt
     from time import clock

     n = 100000
     polygon = [(0.,.1), (1.,.1), (.5,1.), (0.,.75), (.5,.5), (0.,.1)]
     xp = [x for x,y in polygon]
     yp = [y for x,y in polygon]
     x = np.random.rand(n)
     y = np.random.rand(n)
     t0 = clock()
     inside = inpolygon(x,y,polygon)
     t1 = clock()
     print 'elapsed time %.3g ms' % ((t0-t1)*1E3,)
     plt.figure()
     plt.plot(x[~inside],y[~inside],'ob', xp, yp, '-g')
     plt.axis([0,1,0,1])
     plt.show()