[PYTHON MATRIX-SIG] More on Less Copying
Mon, 10 Mar 1997 18:54:31 -0700
Here are some further comments. These come from the perspective of
someone who likes to avoid copying data needlessly.
You may well criticize me that I'm trying to optimize prematurely.
However, some of my swap space is on a really noisy half-height disk,
so I know what it sounds like to page through on the order of 100 MB.
(a) One of my needs is to read large arrays of arbitrary rank from
binary files. The size of the array is known in advance but need not
be equal to the remaining size of the file.
I think I can currently do this by reading the data into a string,
converting it into an array using fromstring(), and then changing the
shape with reshape(). With the current implementation, I cannot see
how to avoid copying the data twice (once into the string and once into
the array).
I could get around the problem if a fromfile() or read() function or
method were provided to read a specified amount of data from a file.
Two ways to do this might be:

    a = Numeric.new(shape, typecode)   # create uninitialized new array
    a.read(file)                       # fill array; copy once

or

    a = Numeric.read(file, shape, typecode)

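The second form can be sketched in present-day Python 3 (3.8 or later) with only the standard library's `array` module and `memoryview`, not Numeric. `read_array` is a hypothetical name standing in for the proposed `Numeric.read(file, shape, typecode)`. Because `array` has no uninitialized constructor, this sketch zero-fills the buffer first; the read itself then copies the file bytes straight into the array with `readinto()`, and the "reshape" is a `memoryview` cast over the same memory, so no intermediate string is built and no second copy is made.

```python
import array
import io
from math import prod


def read_array(f, shape, typecode):
    """Hypothetical stand-in for the proposed Numeric.read(file, shape, typecode).

    Reads exactly prod(shape) items of the given array typecode from the
    binary file f into one preallocated buffer and returns (flat, view),
    where view is an N-d memoryview over the same memory.
    """
    count = prod(shape)
    # The stdlib cannot allocate uninitialized storage, so this zero-fills.
    flat = array.array(typecode, [0]) * count
    raw = memoryview(flat).cast('B')       # writable byte view of flat
    filled = 0
    while filled < len(raw):
        n = f.readinto(raw[filled:])       # file bytes land directly in flat
        if not n:
            raise EOFError(f"wanted {len(raw)} bytes, got {filled}")
        filled += n
    # Reshape by reinterpreting the same bytes; nothing is copied here.
    return flat, raw.cast(typecode, shape)


# Only the requested bytes are consumed; the rest of the file is left alone.
f = io.BytesIO(array.array('d', range(6)).tobytes() + b'trailer')
flat, a = read_array(f, (2, 3), 'd')
print(a.tolist())   # [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
print(f.read())     # b'trailer'
```

Since the array size is passed in rather than taken from the file, the file can be longer than the array, as the post requires.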
Using

    a = zeros(shape, typecode)

would not really cut it, as it would involve paging through the array
twice: once to zero it and once to fill it. However, a Numeric.new()
function might be a useful generalization of ones() and zeros():

    a = Numeric.new(shape, typecode, value=1)     # initialize to 1
    a = Numeric.new(shape, typecode, value=0)     # initialize to 0
    a = Numeric.new(shape, typecode, value=None)  # do not initialize

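A minimal sketch of that generalization, again in modern Python 3 with the standard `array` module rather than Numeric; `new` is a hypothetical function mirroring the proposed `Numeric.new(shape, typecode, value)`. One assumption to flag: Python's `array` cannot hand out uninitialized memory, so here `value=None` still zero-fills; really skipping the fill would take a C implementation.

```python
import array
from math import prod


def new(shape, typecode, value=0):
    """Hypothetical stand-in for the proposed Numeric.new().

    value=1 behaves like ones() and value=0 like zeros(); each element
    is written once.  The stdlib array module has no way to hand out
    uninitialized memory, so value=None falls back to zero-filling here.
    """
    fill = 0 if value is None else value
    flat = array.array(typecode, [fill]) * prod(shape)
    # N-d view sharing flat's memory (the cast goes through bytes first).
    return flat, memoryview(flat).cast('B').cast(typecode, shape)


_, ones_2x3 = new((2, 3), 'd', value=1)
print(ones_2x3.tolist())   # [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
```

One function with a fill argument keeps ones() and zeros() as special cases and leaves room for the no-fill case, which is the one that saves a pass over memory.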
Similarly, I could avoid a copy on output if a tofile() or write()
function or method were provided.
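On the output side, a sketch in modern Python 3 (standard library only, not Numeric): `write_array` is a hypothetical helper that hands the array's own buffer to the file's `write()` instead of first building a bytes copy with `tobytes()`. It avoids the intermediate string at the Python level; whether the file object or the operating system copies again internally is outside its control.

```python
import array
import io


def write_array(f, a):
    """Hypothetical stand-in for the proposed tofile()/write().

    Passes the array's own buffer to f.write() instead of first building
    a bytes copy with a.tobytes().  Accepts an array.array or a
    C-contiguous memoryview (including an N-d one).
    """
    raw = memoryview(a).cast('B')
    written = 0
    while written < len(raw):
        n = f.write(raw[written:])
        if not n:
            raise OSError("write() made no progress")
        written += n
    return written


out = io.BytesIO()
data = array.array('d', [1.5, 2.5, 3.5])
print(write_array(out, data))             # 24
print(out.getvalue() == data.tobytes())   # True
```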
I thought about the following:

    a = zeros([ny, nx], type)
    rowbytes = nx * a.itemsize()      # bytes in one row
    for y in range(ny):               # every row, 0 .. ny-1
        str = file.read(rowbytes)
        line = Numeric.fromstring(str, type)
        a[y,:] = line

If Python were C, this would probably be acceptable, because I could
reuse str and line. (My desire to avoid copying is really a desire to
avoid cache and page faults.) However, in Python I have to throw away
line and str each time, so not only do I copy three times but I pump a
fair bit of memory through the garbage collector.
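In present-day Python the buffer reuse a C programmer would want is available through `readinto()`. A sketch using the standard `array` module and `memoryview`, not Numeric, with `read_rows` as a hypothetical name: each row is read directly into its slice of one preallocated buffer, so no string and no row array is created and thrown away per iteration. It assumes a buffered binary file (for example one opened with `open(path, 'rb')`), whose `readinto()` fills the slice unless end of file is reached.

```python
import array
import io


def read_rows(f, ny, nx, typecode):
    """Fill an ny-by-nx array row by row with no per-row copies of the data.

    Each row is read straight into its slice of one preallocated buffer,
    so no string and no row array is created and discarded per iteration.
    """
    a = array.array(typecode, [0]) * (ny * nx)
    raw = memoryview(a).cast('B')
    rowbytes = nx * a.itemsize
    for y in range(ny):                               # every row, 0 .. ny-1
        row = raw[y * rowbytes:(y + 1) * rowbytes]    # a view, not a copy
        if f.readinto(row) != rowbytes:
            raise EOFError(f"short read in row {y}")
    return raw.cast(typecode, (ny, nx))               # 2-D view over a's memory


grid = read_rows(io.BytesIO(array.array('i', range(12)).tobytes()), 3, 4, 'i')
print(grid.tolist())   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```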
Am I missing something?
(b) I've heard mutterings about in-place operations, which avoid a
copy, but they aren't implemented yet. Are these important to others
here? I'm going to assume that they are.
No operators are currently defined in the base language for in-place
operations. That means we have to use methods explicitly. I think I
would be willing to live with:
if it avoided a copy. (The method name is just off the top of my head;
(c) However, there is a non-zero chance that at some point in-place
operations on scalars are going to appear in the base language. It
would be nice to be in a position to remain upwardly compatible with that.
A problem is that we don't know what method names would be used (i.e.,
what the equivalent of __add__ will be for a += b). Maybe we should
propose a set of method names for these operators that could be
adopted by the base language if it were to adopt in-place operators.
That would, again, give us forwards compatibility.
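As it turned out, the base language did gain augmented assignment (Python 2.0), and `a += b` looks for a method named `__iadd__` before falling back to `__add__`. A toy sketch in modern Python 3 of an array-like class that defines both, so that `+` allocates a result while `+=` updates storage in place; `Vec` is a hypothetical illustration built on the standard `array` module, not Numeric's API.

```python
import array


class Vec:
    """Toy 1-D vector over array.array (hypothetical, illustration only)."""

    def __init__(self, typecode, values):
        self.data = array.array(typecode, values)

    def _check(self, other):
        if len(other.data) != len(self.data):
            raise ValueError("length mismatch")

    def __add__(self, other):
        # a + b: allocates a new buffer for the result.
        self._check(other)
        return Vec(self.data.typecode,
                   (x + y for x, y in zip(self.data, other.data)))

    def __iadd__(self, other):
        # a += b: updates self.data element by element; no new buffer.
        self._check(other)
        d = self.data
        for i, y in enumerate(other.data):
            d[i] += y
        return self   # Python rebinds the left-hand name to this result


a = Vec('d', [1, 2, 3])
buf = a.data
a += Vec('d', [10, 20, 30])
print(a.data is buf, a.data.tolist())   # True [11.0, 22.0, 33.0]
```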
Obviously, with my limited experience I'm not going to jump in and
offer to implement these in C. But I'm aware that James Hugunin has
better things to do than cater to my every whim. I suggest a way
forward is first to discuss whether there is a need for these methods.
If there is an agreed need, we should then discuss their design and
implement them using the existing methods (i.e., implement read using
fromstring and the in-place operations using the copying operations).
At some later point, someone with the appropriate motivation and
experience could rewrite the methods in C. If you wait long enough
that person might even be me. Does that sound reasonable?
I'm very excited by Numeric Python. I wish I didn't have to actually
get any real work done this week and had more time to explore it.
MATRIX-SIG - SIG on Matrix Math for Python
send messages to: email@example.com
administrivia to: firstname.lastname@example.org