take from structured array is faster than boolean indexing, but reshapes columns to 2D
Dear all,

Structured arrays are great, but I am having problems filtering them efficiently. Reading through the mailing list, it seems like boolean arrays are the recommended approach to filtering arrays for arbitrary conditions, but my testing shows that a combination of take and where can be much faster when dealing with structured arrays:

    import timeit

    setup = (
        "from numpy import random, where, zeros; "
        "r = random.random_integers(1e3, size=1e6); "
        "q = zeros((1e6), dtype=[('foo', 'u4'), ('bar', 'u4'), ('baz', 'u4')]); "
        "q['foo'] = r"
    )
    statement1 = "s = q.take(where(q['foo'] < 500))"
    statement2 = "s = q[q['foo'] < 500]"

    t = timeit.Timer(statement1, setup)
    t.timeit(10)
    t = timeit.Timer(statement2, setup)
    t.timeit(10)

Using the boolean array is about 4 times slower when dealing with large arrays. In my case, these operations are supposed to happen on a web server with a large number of requests, so the efficiency gain is important.

However, the combination of take and where reshapes the columns of structured arrays to be 2-dimensional:

    >>> q['foo'].shape
    (1000000,)
    >>> s = q[q['foo'] < 500]
    >>> s['foo'].shape
    (499102,)
    >>> s = q.take(where(q['foo'] < 500))
    >>> s['foo'].shape
    (1, 499102)
Is there a way to use this seemingly more efficient approach (take & where) and not have to manually reshape the columns? This seems ungainly for larger structured arrays. Or should I file this as a bug? Perhaps there are even more efficient approaches that I haven't thought of, but are obvious to others?

Thanks in advance,

Yours,
-Chris

--
############################
Chris Mutel
Ökologisches Systemdesign - Ecological Systems Design
Institut f.Umweltingenieurwissenschaften - Institute for Environmental Engineering
ETH Zürich - HIF C 42 - Schafmattstr. 6
8093 Zürich

Telefon: +41 44 633 71 45 - Fax: +41 44 633 10 61
############################
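One likely explanation for the extra dimension, offered as a sketch rather than a definitive answer: where() called with only a condition returns a tuple containing one index array per dimension, and take() turns that one-element tuple into an index array of shape (1, n). Passing the index array itself (the first element of the tuple, or the result of flatnonzero) should keep the result 1-D. The example below uses a small array and numpy.random.default_rng in place of the random_integers call from the post; the names mask, idx_tuple, s_2d, s_where, s_flat and s_bool are illustrative only, and no timings are measured here.

```python
import numpy as np

# Small stand-in for the structured array in the post (size is illustrative).
rng = np.random.default_rng(0)
q = np.zeros(1000, dtype=[('foo', 'u4'), ('bar', 'u4'), ('baz', 'u4')])
q['foo'] = rng.integers(1, 1001, size=q.size)

mask = q['foo'] < 500

# where() with only a condition returns a tuple holding one index array per
# dimension; take() converts that tuple into a 2-D index array of shape (1, n).
idx_tuple = np.where(mask)
s_2d = q.take(idx_tuple)

# Passing the 1-D index array itself keeps the result 1-D.
s_where = q.take(idx_tuple[0])
s_flat = q.take(np.flatnonzero(mask))

# Reference result from boolean indexing.
s_bool = q[mask]

# The first shape has a leading axis of length 1; the other three are 1-D.
print(s_2d.shape, s_where.shape, s_flat.shape, s_bool.shape)
```

Whether the speed advantage over boolean indexing survives with the 1-D index array would need to be re-timed with the timeit setup from the post, on the NumPy version actually in use.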