[Numpy-discussion] A numpy accumulator...

Christopher Barker Chris.Barker at noaa.gov
Mon Oct 5 14:06:27 EDT 2009


Francesc Alted wrote:
> On Saturday 03 October 2009 10:06:12 Christopher Barker wrote:
>> This idea was inspired by a discussion at the SciPy conference, in which
>> we spent a LOT of time during the numpy tutorial talking about how to
>> accumulate values in an array when you don't know how big the array
>> needs to be when you start.

>> What I have in mind is very simple. It would be:
>>    - Only 1-d
>>    - Support append() and extend() methods
>>    - Support indexing and slicing
>>    - Support any valid numpy dtype
>>      - which could even get you pseudo n-d arrays...
>>    - maybe it would act like an array in other ways, I'm not so sure.
>>      - ufuncs, etc.

> That's interesting.  I'd normally use the `resize()` method for what you want, 
> but indeed your approach is much easier to use.

Of course, this is using resize() under the hood, but with an easier 
interface. More importantly, it does the pre-allocation for you, along 
with the code to manage it. I suppose I should benchmark it, but I 
think calling resize() with every append would be a lot slower (though 
maybe not -- might the compiler/OS be pre-allocating some extra memory 
anyway?)
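
As a rough illustration of the idea, here is a minimal sketch of such a
class. The name Accumulator, the initial_capacity and growth_factor
parameters, and the to_array() method are all assumptions made up for
this example; none of them is existing numpy API, and this is not the
actual implementation under discussion. It has only been thought
through for simple scalar dtypes.

```python
import numpy as np


class Accumulator:
    """Hypothetical 1-d growable array (illustrative only, not numpy API)."""

    def __init__(self, dtype=float, initial_capacity=16, growth_factor=1.5):
        # Pre-allocate more room than is needed right away.
        self._buffer = np.empty(initial_capacity, dtype=dtype)
        self._length = 0              # number of items actually stored
        self._growth = growth_factor  # assumed over-allocation factor

    def __len__(self):
        return self._length

    def _ensure_capacity(self, needed):
        capacity = self._buffer.shape[0]
        if needed > capacity:
            new_capacity = max(needed, int(capacity * self._growth) + 1)
            # In-place resize. refcheck=False skips the reference-count
            # check, which is only safe because this class never hands
            # out views of the buffer.
            self._buffer.resize(new_capacity, refcheck=False)

    def append(self, item):
        self._ensure_capacity(self._length + 1)
        self._buffer[self._length] = item
        self._length += 1

    def extend(self, items):
        items = np.asarray(items, dtype=self._buffer.dtype)
        n = items.shape[0]
        self._ensure_capacity(self._length + n)
        self._buffer[self._length:self._length + n] = items
        self._length += n

    def __getitem__(self, index):
        result = self._buffer[:self._length][index]
        # Return slices as copies so no view outlives a later resize.
        if isinstance(result, np.ndarray):
            return result.copy()
        return result

    def to_array(self):
        # Copy of the used part of the buffer as a regular ndarray.
        return self._buffer[:self._length].copy()
```

The point of the growth factor is that resize() is only called when
the spare room runs out, rather than on every append().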

I should profile this -- if you can call resize() with every new item, 
and it's not too slow, then it may not be worth writing this class at 
all (or I could make it simpler, maybe even an ndarray subclass instead).
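
One possible way to check this is a small timing script like the one
below. The function names grow_by_resize and grow_by_list and the
problem size are assumptions chosen for illustration, and the result
will depend on the platform and numpy version, so no timings are
claimed here.

```python
import timeit

import numpy as np


def grow_by_resize(n):
    """Call resize() once per new item, with no pre-allocation."""
    arr = np.empty(0, dtype=float)
    for i in range(n):
        # refcheck=False skips the reference-count check on the array.
        arr.resize(i + 1, refcheck=False)
        arr[i] = i
    return arr


def grow_by_list(n):
    """Baseline: accumulate in a Python list, convert once at the end."""
    values = []
    for i in range(n):
        values.append(float(i))
    return np.array(values, dtype=float)


if __name__ == "__main__":
    n = 20000  # arbitrary size for the comparison
    for func in (grow_by_resize, grow_by_list):
        seconds = timeit.timeit(lambda: func(n), number=3)
        print(func.__name__, seconds)
```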

> If you are looking for performance improvements, I'd have a look at the 
> `PyArray_Resize()` function in 'core/src/multiarray/shape.c' (trunk).  It 
> seems to me that the zero-initialization of added memory can be skipped, 
> allowing for more performance for the `resize()` method (especially for 
> large size increments).

I suppose so, but I doubt that's causing any of my performance issues. 
Another thing to profile.

> A new parameter (say, ``zero_init=True``) could be 
> added to `resize()` to specify that you don't want the memory initialized.

That does seem like a good idea, but maybe over my head to implement.

Now I need some time to work on this some more...

-Chris





-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov


