Memory ?

Alex Martelli aleax at aleax.it
Mon Jul 8 09:03:40 EDT 2002


Shagshag13 wrote:

> 
> "Alex Martelli" <aleax at aleax.it>
> 
>> > That i can' t guess the size of my arrays (range from 0 to ???)
>>
>> Both the array.array type, and the vastly more powerful Numeric.array
>> type from the Numeric extension package, are arrays that can grow and
>> shrink.
> 
> sorry to bother, i didn't find a way to append a numeric.array, should i
> do this by using "concatenate" ? or do i miss the "correct efficient" way

You can change an array's shape, including its size (product of all the
items in the shape couple), by calling the resize method on the array --
but that's available only if there is no data-sharing between that array
and any other.  Don't be misled by the docstring of the resize method,
which says the method "returns" the resized array -- it doesn't, it returns
None, and it resizes the array in-place, filling it with default values
if it grows.  A typical example:

>>> x = Numeric.array([0.5]*7)
>>> y = Numeric.array([0.3]*7)
>>> z = x * y
>>> z.resize((10,))
>>> z
array([ 0.15,  0.15,  0.15,  0.15,  0.15,  0.15,  0.15,  0.  ,  0.  ,  0.  
])
>>>

However, calling the resize _method_ raises an exception if the array is
sharing data with any other array -- and, sometimes, just because Numeric
can't be absolutely certain that such sharing is not taking place, and in
doubt it chooses to guarantee correct semantics.

Function Numeric.resize(somearray, newshape) returns a new copy (filling
by replication, rather than by defaults) and thus is free from any such
constraints.

Function Numeric.concatenate, which you mention, also returns a new copy,
and thus is similarly free from any worry.

Of course, copying data can give performance problems if you have millions
of items involved.  However, don't think that the builtin Python lists would
be necessarily free from such problems!  A list typically overallocates so
it has some free space available right at the end -- so it doesn't have to
reallocate all the time if you just need to append a few items -- but that
extra space is, of course, a bounded amount... it can't last forever:-).
As you keep appending, you'll get (implicit) reallocations.  lists can do
those implicitly because they never allow data sharing between different
list objects, of course.

There are several strategies that can serve you well when you need to put
together a potentially large array from an a-priori unknown number of
smaller arrays.  If feasible, collecting all the smaller arrays into a
sequence (e.g. a list) and then joining (e.g. with Numeric.concatenate)
only at the end, tends to be the best-performing way quite often.  If you
know a reasonable upper bound, you can overallocate yourself, and then
you may be able to exploit Numeric.array's ability to share data among
arrays to refer to sub-arrays, such as the one "built by the concatenations
so far", quite cheaply.  To clarify the latter strategy, an example:

>>> large = Numeric.array(' '*1000)
>>> sofar = large[0:0]
>>> word='ciao'; large[len(sofar):len(sofar)+len(word)] = 
Numeric.array(word)
>>> sofar = large[:len(sofar)+len(word)]
>>> sofar
array([c, i, a, o],'c')
>>> word='zurp'; large[len(sofar):len(sofar)+len(word)] = 
Numeric.array(word)
>>> sofar = large[:len(sofar)+len(word)]
>>> sofar
array([c, i, a, o, z, u, r, p],'c')
>>>

It's obviously quite easy to wrap up these idioms into a small utility
class, if such "concatenations with frequent need for access to all
the data concatenated up to now" are a recurring need in your code.

Wrapping it up as a class is quite advisable because it lets you most
easily experiment with drastically different strategies while letting
all the client-code (your code that _uses_ this class) perfectly
unchanged -- you can fix the interface, develop all the rest of your
application based on whatever simple implementation of that interface
first comes to mind, THEN go wild experimenting with alternative
implementation strategies and time the results based on actual usage
in your application (or some typical benchmark runs for it).  Python
is just great for this general approach at implementing applications
and the primitives that the applications rely upon.

Of course, once you do wrap it up as a class you can also deal with
other issues, such as "what happens when one of my appends overflow
the large-buffer so far allocated" (catch the error and reallocate a
larger buffer -- scale buffer size up exponentially if you need to
have constant amortized cost per appended item...), etc.


Alex




More information about the Python-list mailing list