[Numpy-discussion] A numpy accumulator...
Christopher Barker
Chris.Barker at noaa.gov
Mon Oct 5 14:06:27 EDT 2009
Francesc Alted wrote:
> On Saturday 03 October 2009 10:06:12 Christopher Barker wrote:
>> This idea was inspired by a discussion at the SciPy conference, in which
>> we spent a LOT of time during the numpy tutorial talking about how to
>> accumulate values in an array when you don't know how big the array
>> needs to be when you start.
>> What I have in mind is very simple. It would be:
>> - Only 1-d
>> - Support append() and extend() methods
>> - Support indexing and slicing
>> - Support any valid numpy dtype
>>   - which could even get you pseudo n-d arrays...
>> - maybe it would act like an array in other ways, I'm not so sure.
>>   - ufuncs, etc.
> That's interesting. I'd normally use the `resize()` method for what you want,
> but indeed your approach is much easier to use.
Of course, this is using resize() under the hood, but giving it an
easier interface. More importantly, it's adding the pre-allocation
for you, and the code to deal with that. I suppose I should benchmark
it, but I think calling resize() with every append would be a lot slower
(though maybe not -- might the compiler/OS be pre-allocating some extra
memory anyway?)
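
For illustration, here is a minimal sketch of the idea described above:
an over-allocated buffer that is grown with resize() only when it fills
up. This is not the actual class from this thread. The name Accumulator,
the starting capacity of 16, the 1.5x growth factor and the toarray()
method are all assumptions made for the example.

```python
import numpy as np


class Accumulator:
    """Hypothetical 1-d growable array (a sketch, not the class from this thread)."""

    def __init__(self, dtype=float, capacity=16):
        # Over-allocated buffer; only the first self._n items hold real data.
        self._buf = np.empty(capacity, dtype=dtype)
        self._n = 0

    def __len__(self):
        return self._n

    def _grow(self, needed):
        # Grow geometrically (assumed factor 1.5) so resize() runs only occasionally.
        if needed > self._buf.shape[0]:
            new_cap = max(needed, int(self._buf.shape[0] * 1.5) + 1)
            # refcheck=False is safe here only because no views of the
            # buffer are ever handed out (see __getitem__ and toarray).
            self._buf.resize(new_cap, refcheck=False)

    def append(self, item):
        self._grow(self._n + 1)
        self._buf[self._n] = item
        self._n += 1

    def extend(self, items):
        # Expects a 1-d sequence of items.
        items = np.asarray(items, dtype=self._buf.dtype)
        count = items.shape[0]
        self._grow(self._n + count)
        self._buf[self._n:self._n + count] = items
        self._n += count

    def __getitem__(self, index):
        # Index into the filled part only, and return a copy so that a
        # later resize() cannot invalidate what the caller holds.
        return self._buf[:self._n][index].copy()

    def toarray(self):
        # Copy of the accumulated data as a plain ndarray.
        return self._buf[:self._n].copy()
```

Returning copies rather than views is a deliberate trade-off in this
sketch: it costs memory, but it keeps the in-place resize() from pulling
the data out from under a slice the caller is still using.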
I should profile this -- if you can call resize() with every new item,
and it's not too slow, then it may not be worth writing this class at
all (or I could make it simpler, maybe even an ndarray subclass instead).
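
One way the comparison could be profiled is sketched below. The function
names, the starting size of 16, the 1.5x growth factor and the item count
are assumptions for the example, and no timings are claimed here; the
result will depend on the platform's allocator.

```python
import timeit

import numpy as np


def fill_resize_each(n):
    """Call resize() once per appended item."""
    a = np.empty(0, dtype=np.float64)
    for i in range(n):
        a.resize(i + 1, refcheck=False)
        a[i] = i
    return a


def fill_preallocated(n, growth=1.5):
    """Over-allocate and call resize() only when the buffer is full."""
    a = np.empty(16, dtype=np.float64)
    used = 0
    for i in range(n):
        if used == a.shape[0]:
            a.resize(int(a.shape[0] * growth) + 1, refcheck=False)
        a[used] = i
        used += 1
    # Trim the unused tail so both functions return the same shape.
    a.resize(used, refcheck=False)
    return a


if __name__ == "__main__":
    n = 100_000
    for func in (fill_resize_each, fill_preallocated):
        seconds = timeit.timeit(lambda: func(n), number=3)
        print(f"{func.__name__}: {seconds:.3f} s for 3 runs of {n} items")
```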
> If you are looking for performance improvements, I'd have a look at the
> `PyArray_Resize()` function in 'core/src/multiarray/shape.c' (trunk). It
> seems to me that the zero-initialization of added memory can be skipped,
> allowing for more performance for the `resize()` method (especially for
> large size increments).
I suppose so, but I doubt that's causing any of my performance issues.
Another thing to profile.
> A new parameter (say, ``zero_init=True``) could be
> added to `resize()` to specify that you don't want the memory initialized.
That does seem like a good idea, but maybe over my head to implement.
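
To make the zero-initialization point concrete: the resize() method fills
the added entries with zeros when an array is enlarged. The zero_init
parameter is only a proposal in this thread, not an existing option, so
the sketch below approximates "grow without initializing" with np.empty()
and a copy. The helper name grow_uninitialized is hypothetical, and the
copy means this does not reproduce the in-place realloc that a real
zero_init=False would allow.

```python
import numpy as np

# The ndarray.resize() method zero-fills the newly added entries.
a = np.arange(3, dtype=np.float64)
a.resize(6, refcheck=False)


def grow_uninitialized(arr, new_size):
    """Hypothetical stand-in for the proposed resize(..., zero_init=False).

    Returns a new 1-d array of length new_size whose leading entries are
    copied from arr; any added tail is left uninitialized.
    """
    out = np.empty(new_size, dtype=arr.dtype)
    keep = min(arr.shape[0], new_size)
    out[:keep] = arr[:keep]
    return out


b = grow_uninitialized(np.arange(3, dtype=np.float64), 6)
# b[:3] holds the original data; b[3:] contains arbitrary values.
```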
Now I need some time to work on this some more...
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov