
On Wed, Mar 26, 2014 at 4:28 PM, Slaunger <Slaunger@gmail.com> wrote:
jseabold wrote:
IIUC,
[~/]
[1]: np.logical_and([True, False, True], [False, False, True])
[1]: array([False, False,  True], dtype=bool)
You can avoid looping over k since they're all the same length
[~/]
[3]: np.logical_and([[True, False], [False, True], [False, True]],
                    [[False, False], [False, True], [True, True]])
[3]: array([[False, False],
            [False,  True],
            [False,  True]], dtype=bool)
[~/]
[4]: np.sum(np.logical_and([[True, False], [False, True], [False, True]],
                           [[False, False], [False, True], [True, True]]), axis=0)
[4]: array([0, 2])
Well, yes, if you work with the pure f_k and g_k that is true, but such a two-dimensional array would have 4*10^14 elements and exhaust my memory.
That is why I have found a more efficient method that finds only the far fewer changes_at elements for each k. These arrays have unequal lengths and have to be considered for each k separately (which is tolerable as long as I avoid a further inner loop for each k in explicit Python).
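To illustrate the idea for a single pair f_k, g_k, here is a toy sketch (not my actual implementation; the names changes_f, changes_g, f0 and g0 are made up for the example): each function is represented only by the sorted indices at which its value toggles, and the count of indices where both functions are True is accumulated interval by interval, so the full boolean arrays are never materialized.

def count_both_true(changes_f, changes_g, n, f0=False, g0=False):
    # changes_f, changes_g: sorted toggle indices in (0, n) for f and g.
    # f0, g0: values of f and g on the first interval, starting at index 0.
    set_f = set(changes_f)
    set_g = set(changes_g)
    # Interval boundaries: index 0, every change point, and the end n.
    points = sorted(set_f | set_g | {0, n})
    total = 0
    f, g = f0, g0
    for start, stop in zip(points[:-1], points[1:]):
        if f and g:
            total += stop - start   # the whole interval [start, stop) counts
        if stop in set_f:
            f = not f               # f toggles at index 'stop'
        if stop in set_g:
            g = not g               # g toggles at index 'stop'
    return total

# f = [True, False, True], g = [False, False, True] -> one common True index
print(count_both_true([1, 2], [2], 3, f0=True, g0=False))   # 1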
I could implement this in C and get it done sufficiently efficiently. I would just like to make the point that this is also doable in finite time in Python/numpy.
If you want to attack it straight on and keep it conceptually simple, this looks like it would work. Fair warning: I've never done this and have no idea if it's actually memory- and computationally efficient, so I'd be interested to hear from experts. I just wanted to see if it would work from disk. I wonder if a solution using PyTables would be faster. Provided that you can chunk your data into a memmap array, something you *could* do is:

import numpy as np

N = 2 * 10**7
chunk_size = 100000

farr1 = 'scratch/arr1'
farr2 = 'scratch/arr2'

# Fill two disk-backed arrays in chunks (random data as a stand-in).
arr1 = np.memmap(farr1, dtype='uint8', mode='w+', shape=(N, 4))
arr2 = np.memmap(farr2, dtype='uint8', mode='w+', shape=(N, 4))
for i in xrange(0, N, chunk_size):
    arr1[i:i+chunk_size] = np.random.randint(2, size=(chunk_size, 4)).astype(np.uint8)
    arr2[i:i+chunk_size] = np.random.randint(2, size=(chunk_size, 4)).astype(np.uint8)
del arr1
del arr2

# Re-open read-only and accumulate, chunk by chunk, the per-column
# count of positions where both arrays are True.
arr1 = np.memmap(farr1, mode='r', dtype='uint8', shape=(N, 4))
arr2 = np.memmap(farr2, mode='r', dtype='uint8', shape=(N, 4))

equal = np.logical_and(arr1[:chunk_size], arr2[:chunk_size]).sum(0)
for i in xrange(chunk_size, N, chunk_size):
    equal += np.logical_and(arr1[i:i+chunk_size], arr2[i:i+chunk_size]).sum(0)

Skipper
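For what it's worth, here is a rough, untested sketch of what the PyTables variant mentioned above might look like (assuming PyTables >= 3.0, i.e. tables.open_file and create_carray; the file name scratch/arrs.h5 and the node names are made up), doing the same chunked accumulation against CArrays instead of memmaps:

import numpy as np
import tables

N = 2 * 10**7
chunk_size = 100000

# Write the two uint8 arrays to an HDF5 file in chunks.
h5 = tables.open_file('scratch/arrs.h5', mode='w')
c1 = h5.create_carray(h5.root, 'arr1', tables.UInt8Atom(), shape=(N, 4))
c2 = h5.create_carray(h5.root, 'arr2', tables.UInt8Atom(), shape=(N, 4))
for i in xrange(0, N, chunk_size):
    c1[i:i+chunk_size] = np.random.randint(2, size=(chunk_size, 4)).astype(np.uint8)
    c2[i:i+chunk_size] = np.random.randint(2, size=(chunk_size, 4)).astype(np.uint8)

# Accumulate the per-column counts chunk by chunk, as above.
counts = np.zeros(4, dtype=np.int64)
for i in xrange(0, N, chunk_size):
    counts += np.logical_and(c1[i:i+chunk_size], c2[i:i+chunk_size]).sum(axis=0)
h5.close()

Whether this actually beats the memmap version would come down to chunk shape and compression settings, so it would need benchmarking.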