[Numpy-discussion] merge_arrays is very slow; alternatives?

Gerrit Holl gerrit.holl at gmail.com
Fri Nov 26 14:16:56 EST 2010


Hi,

Upon profiling my code, I found that
numpy.lib.recfunctions.merge_arrays is extremely slow: it processes
only about 7000 rows per second. This is not acceptable for me.

I have two large record arrays, i.e. arrays with a complicated dtype.
All I want to do is merge them into one. I don't think that should
have to be a very slow operation: I don't need to copy anything, I
just want to view the two record arrays as one.

How can I do this in a faster way?

In [45]: cProfile.runctx("numpy.lib.recfunctions.merge_arrays([metarows, targetrows2], flatten=True)", globals(), locals())
        225381902 function calls (150254635 primitive calls) in 166.620 CPU seconds

  Ordered by: standard name

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       1    0.031    0.031  166.620  166.620 <string>:1(<module>)
    68/1    0.000    0.000    0.000    0.000 _internal.py:82(_array_descr)
       2    0.000    0.000    0.000    0.000 numeric.py:286(asanyarray)
       2    0.000    0.000    0.000    0.000 recfunctions.py:135(flatten_descr)
       1    0.000    0.000    0.001    0.001 recfunctions.py:161(zip_descr)
149165600/74038400  117.195    0.000  139.701    0.000 recfunctions.py:235(_izip_fields_flat)
 1088801   12.146    0.000  151.847    0.000 recfunctions.py:263(izip_records)
       3    0.000    0.000    0.000    0.000 recfunctions.py:277(sentinel)
       1    4.599    4.599  166.589  166.589 recfunctions.py:328(merge_arrays)
       3    0.000    0.000    0.000    0.000 recfunctions.py:406(<genexpr>)
 75127201   22.506    0.000   22.506    0.000 {isinstance}
      69    0.000    0.000    0.000    0.000 {len}
       1    0.000    0.000    0.000    0.000 {map}
       1    0.000    0.000    0.000    0.000 {max}
       2    0.000    0.000    0.000    0.000 {method '__array__' of 'numpy.ndarray' objects}
     136    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
       1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       2    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}
       2    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
       2    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
       2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
       1   10.142   10.142   10.142   10.142 {numpy.core.multiarray.fromiter}
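
The profile above shows nearly all of the time going into per-record
Python iteration (_izip_fields_flat, izip_records, isinstance). One
possible alternative, sketched here under stated assumptions, is to
allocate the output once and copy one field at a time, so that the
Python-level loop runs over fields instead of rows. The helper name
merge_fields is hypothetical and not part of numpy. The sketch assumes
that both arrays have the same shape and that no field name occurs in
both. Unlike flatten=True, it keeps nested fields nested, and it does
copy each field once; it is not a zero-copy view.

```python
import numpy as np


def merge_fields(a, b):
    """Hypothetical helper (not part of numpy).

    Merge two structured arrays of equal shape into one array whose
    fields are those of `a` followed by those of `b`.
    """
    if a.shape != b.shape:
        raise ValueError("arrays must have the same shape")

    # Build the combined dtype from the top-level fields of both inputs.
    # Assumption: field names are unique across `a` and `b`.
    fields = [(name, arr.dtype[name])
              for arr in (a, b)
              for name in arr.dtype.names]
    out = np.empty(a.shape, dtype=np.dtype(fields))

    # One vectorized copy per field, not one Python call per record.
    for arr in (a, b):
        for name in arr.dtype.names:
            out[name] = arr[name]
    return out
```

With the arrays from the session above, this would be called as
merge_fields(metarows, targetrows2). Whether it is fast enough for a
given dataset would need to be measured.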


Gerrit.

--
Gerrit Holl
PhD student at Department of Space Science, Luleå University of
Technology, Kiruna, Sweden
http://www.sat.ltu.se/members/gerrit/


