
On Wed, Dec 23, 2015 at 4:01 AM, Nicolas P. Rougier <Nicolas.Rougier@inria.fr> wrote:
> But my implementation is quite slow, especially when you add one item at a time:
>
> python benchmark.py
> Python list, append 100000 items: 0.01161
> Array list, append 100000 items: 0.46854
Are you pre-allocating any extra space? If not, it's going to be really, really pokey when adding a little bit at a time.

With my Accumulator class:

https://github.com/PythonCHB/NumpyExtras/blob/master/numpy_extras/accumulato...

I pre-allocate a larger numpy array to start, and it gets re-allocated, with some extra room, when filled, using ndarray.resize(). This is quite fast. These are settable parameters in the class:

DEFAULT_BUFFER_SIZE = 128   # original buffer created.
BUFFER_EXTEND_SIZE = 1.25   # array.array uses 1+1/16 -- that seems small to me.

I looked at the code in array.array (and list, I think), and it does stuff to optimize very small arrays, which I figured wasn't the use-case here :-) But I did a bunch of experimentation, and as long as you pre-allocate _some_ space, it doesn't make much difference how much :-)

BTW, I just went in and updated and tested the Accumulator class code -- it needed some tweaks, but it's working now. The cython version is in an unknown state...

Some profiling:

In [11]: run profile_accumulator.py

In [12]: timeit accum1(10000)
100 loops, best of 3: 3.91 ms per loop

In [13]: timeit list1(10000)
1000 loops, best of 3: 1.15 ms per loop

These simply append 10,000 integers in a loop -- in the list version, the list is turned into a numpy array at the end. So it's still faster to accumulate in a list and then make an array, but only by about a factor of 3. I think this is because you are starting with a Python integer: the accumulator has to check the type and pull a native integer out on each append, whereas a list can append a Python object with no type checking at all, and the conversion from list to array is then all in C. Note that the accumulator version is still more memory efficient...

In [14]: timeit accum2(10000)
100 loops, best of 3: 3.84 ms per loop

This version pre-allocated the whole internal buffer -- not much faster, so the buffer re-allocation isn't a big deal (thanks to ndarray.resize() using realloc(), and not creating a new numpy array).

In [24]: timeit list_extend1(100000)
100 loops, best of 3: 4.15 ms per loop

In [25]: timeit accum_extend1(100000)
1000 loops, best of 3: 1.37 ms per loop

This time, the data are added in chunks of 100 elements at a time, with the chunks created ahead of time: a list built with range() in the first case, and an array built with arange() in the second. Much faster to extend with arrays...

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA 98115        (206) 526-6317  main reception

Chris.Barker@noaa.gov
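P.S. For anyone who wants to play with the idea without digging into the repo, here is a minimal sketch of the pre-allocate-and-resize approach described above. It is *not* the actual Accumulator class -- the class name, the growth handling, and the to_array() helper are just illustrative:

# Sketch only: illustrates pre-allocation + in-place growth via ndarray.resize().
# Names (GrowableArray, to_array) are made up for this example.
import numpy as np

DEFAULT_BUFFER_SIZE = 128    # initial allocation
BUFFER_EXTEND_SIZE = 1.25    # grow factor when the buffer fills

class GrowableArray:
    def __init__(self, dtype=np.float64):
        self._buffer = np.empty(DEFAULT_BUFFER_SIZE, dtype=dtype)
        self._length = 0               # number of items actually stored

    def append(self, value):
        if self._length == self._buffer.shape[0]:
            # Grow in place; ndarray.resize() uses realloc() under the hood,
            # so no new array object is created.
            new_size = int(self._buffer.shape[0] * BUFFER_EXTEND_SIZE) + 1
            self._buffer.resize(new_size, refcheck=False)
        self._buffer[self._length] = value
        self._length += 1

    def extend(self, values):
        # Extending with a whole array at once avoids the per-item type
        # checking that makes single appends slower than a plain list.
        values = np.asarray(values, dtype=self._buffer.dtype)
        needed = self._length + values.shape[0]
        if needed > self._buffer.shape[0]:
            self._buffer.resize(int(needed * BUFFER_EXTEND_SIZE) + 1,
                                refcheck=False)
        self._buffer[self._length:needed] = values
        self._length = needed

    def to_array(self):
        # Return a copy trimmed to the filled portion of the buffer.
        return self._buffer[:self._length].copy()

# Quick check:
acc = GrowableArray(dtype=np.int64)
for i in range(10000):
    acc.append(i)
acc.extend(np.arange(100))
print(acc.to_array()[:5])    # -> [0 1 2 3 4]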