
On Wed, Mar 26, 2014 at 4:28 PM, Slaunger <Slaunger@gmail.com> wrote:
jseabold wrote:
IIUC,
[~/]
[1]: np.logical_and([True, False, True], [False, False, True])
[1]: array([False, False,  True], dtype=bool)
You can avoid looping over k since they're all the same length
[~/]
[3]: np.logical_and([[True, False], [False, True], [False, True]],
                    [[False, False], [False, True], [True, True]])
[3]: array([[False, False],
            [False,  True],
            [False,  True]], dtype=bool)
[~/]
[4]: np.sum(np.logical_and([[True, False], [False, True], [False, True]],
                           [[False, False], [False, True], [True, True]]), axis=0)
[4]: array([0, 2])
Well, yes, if you work with the pure f_k and g_k that is true, but such a two-dimensional array would have 4*10^14 elements and exhaust my memory.
That is why I have found a more efficient method that finds only the far fewer changes_at elements for each k. These arrays have unequal lengths and have to be considered for each k separately (which is tolerable as long as I avoid a further inner loop for each k in explicit Python).
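To illustrate the idea for a single pair f_k, g_k, here is a toy sketch (not my actual implementation; the names changes_f, changes_g, f0 and g0 are made up for the example): each function is represented only by the sorted indices at which its value toggles, and the count of indices where both functions are True is accumulated interval by interval, so the full boolean arrays are never materialized.

def count_both_true(changes_f, changes_g, n, f0=False, g0=False):
    # changes_f, changes_g: sorted toggle indices in (0, n) for f and g.
    # f0, g0: values of f and g on the first interval, starting at index 0.
    set_f = set(changes_f)
    set_g = set(changes_g)
    # Interval boundaries: index 0, every change point, and the end n.
    points = sorted(set_f | set_g | {0, n})
    total = 0
    f, g = f0, g0
    for start, stop in zip(points[:-1], points[1:]):
        if f and g:
            total += stop - start   # the whole interval [start, stop) counts
        if stop in set_f:
            f = not f               # f toggles at index 'stop'
        if stop in set_g:
            g = not g               # g toggles at index 'stop'
    return total

# f = [True, False, True], g = [False, False, True] -> one common True index
print(count_both_true([1, 2], [2], 3, f0=True, g0=False))   # 1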
I could implement this in C and get it done sufficiently efficiently. I would just like to make the point that this is also doable in finite time in Python/numpy.
If you want to attack it straight on and keep it conceptually simple, this looks like it would work. Fair warning: I've never done this and have no idea if it's actually memory- and computationally efficient, so I'd be interested to hear from experts. I just wanted to see if it would work from disk. I wonder if a solution using PyTables would be faster. Provided that you can chunk your data into a memmap array, something you *could* do is:

import numpy as np

N = 2 * 10**7
chunk_size = 100000

farr1 = 'scratch/arr1'
farr2 = 'scratch/arr2'

# Fill two disk-backed arrays in chunks (random data as a stand-in).
arr1 = np.memmap(farr1, dtype='uint8', mode='w+', shape=(N, 4))
arr2 = np.memmap(farr2, dtype='uint8', mode='w+', shape=(N, 4))
for i in xrange(0, N, chunk_size):
    arr1[i:i+chunk_size] = np.random.randint(2, size=(chunk_size, 4)).astype(np.uint8)
    arr2[i:i+chunk_size] = np.random.randint(2, size=(chunk_size, 4)).astype(np.uint8)
del arr1
del arr2

# Re-open read-only and accumulate, chunk by chunk, the per-column
# count of positions where both arrays are True.
arr1 = np.memmap(farr1, mode='r', dtype='uint8', shape=(N, 4))
arr2 = np.memmap(farr2, mode='r', dtype='uint8', shape=(N, 4))

equal = np.logical_and(arr1[:chunk_size], arr2[:chunk_size]).sum(0)
for i in xrange(chunk_size, N, chunk_size):
    equal += np.logical_and(arr1[i:i+chunk_size], arr2[i:i+chunk_size]).sum(0)

Skipper
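For what it's worth, here is a rough, untested sketch of what the PyTables variant mentioned above might look like (assuming PyTables >= 3.0, i.e. tables.open_file and create_carray; the file name scratch/arrs.h5 and the node names are made up), doing the same chunked accumulation against CArrays instead of memmaps:

import numpy as np
import tables

N = 2 * 10**7
chunk_size = 100000

# Write the two uint8 arrays to an HDF5 file in chunks.
h5 = tables.open_file('scratch/arrs.h5', mode='w')
c1 = h5.create_carray(h5.root, 'arr1', tables.UInt8Atom(), shape=(N, 4))
c2 = h5.create_carray(h5.root, 'arr2', tables.UInt8Atom(), shape=(N, 4))
for i in xrange(0, N, chunk_size):
    c1[i:i+chunk_size] = np.random.randint(2, size=(chunk_size, 4)).astype(np.uint8)
    c2[i:i+chunk_size] = np.random.randint(2, size=(chunk_size, 4)).astype(np.uint8)

# Accumulate the per-column counts chunk by chunk, as above.
counts = np.zeros(4, dtype=np.int64)
for i in xrange(0, N, chunk_size):
    counts += np.logical_and(c1[i:i+chunk_size], c2[i:i+chunk_size]).sum(axis=0)
h5.close()

Whether this actually beats the memmap version would come down to chunk shape and compression settings, so it would need benchmarking.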