Re: [Matrix-SIG] An Experiment in code-cleanup.

Travis Oliphant writes:
1) The re-use of temporary arrays -- to conserve memory.
Please elaborate about this request.
When Python evaluates the expression:
Y = B*X + A
where A, B, X, and Y are all arrays, B*X creates a temporary array, T. A new array, Y, will be created to hold the result of T + A, and T will be deleted. If T and Y have the same shape and typecode, then instead of creating Y, T can be re-used to conserve memory.
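The saving described above can be sketched in pure Python, with the standard `array` module standing in for NumPy arrays. The function names are hypothetical, and this only illustrates the idea; it is not how NumPy evaluates expressions.

```python
from array import array

def multiply_add_naive(b, x, a):
    """Compute b*x + a with two allocations: the temporary T and the result Y."""
    t = array('d', (bi * xi for bi, xi in zip(b, x)))   # temporary T
    y = array('d', (ti + ai for ti, ai in zip(t, a)))   # second allocation for Y
    return y

def multiply_add_reuse(b, x, a):
    """Compute b*x + a, reusing the temporary T as the result buffer."""
    t = array('d', (bi * xi for bi, xi in zip(b, x)))   # temporary T
    for i, ai in enumerate(a):
        t[i] += ai                                      # add A in place, no new array
    return t

A = array('d', [1.0, 2.0, 3.0])
B = array('d', [2.0, 2.0, 2.0])
X = array('d', [10.0, 20.0, 30.0])
Y = multiply_add_reuse(B, X, A)
```

The reuse is only valid because T and the result share the same length and typecode, which is the condition stated above.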
2) A copy-on-write option -- to enhance performance.
I need more explanation of this as well.
This would be an advanced feature of arrays that use memory-mapping or access their arrays from disk. It is similar to the secondary cache of a CPU. The data is held in memory until a write request is made.
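One existing form of copy-on-write for memory-mapped data is the `ACCESS_COPY` mode of Python's standard `mmap` module: writes change the in-memory pages only and never reach the file. Whether this matches the feature intended here is an assumption, and the helper name below is hypothetical.

```python
import mmap
import os
import tempfile

def demo_copy_on_write():
    """Map a file copy-on-write, write to the mapping, and return
    (bytes seen in memory, bytes still on disk)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\x00" * 16)
        with open(path, "r+b") as f:
            m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
            try:
                m[0:4] = b"\x01\x02\x03\x04"   # modifies the private copy only
                in_memory = bytes(m[:4])
            finally:
                m.close()
        with open(path, "rb") as f:
            on_disk = f.read(4)                # the file itself is unchanged
    finally:
        os.remove(path)
    return in_memory, on_disk
```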
3) The initialization of arrays by default -- to help novices.
What kind of initialization are you talking about? (We already have zeros, ones, and random.)
For mixed-type (or object) arrays containing strings, zeros() and ones() would be confusing. Therefore, by default, integer and floating-point types would be initialized to 0 and string types to ' ', with an option to skip initialization for performance.
4) The creation of a standard API -- which I guess is assumed, if it is to be part of the Python standard distribution.
Any suggestions as to what needs to be changed in the already somewhat standard API?
No, not exactly. But the last time I looked, I thought some improvements could be made to it.
5) The inclusion of IEEE support.
This was supposed to be there from the beginning, but it didn't get finished. Jim's original idea was to have two math modules, one which checked and gave errors for 1/0 and another that returned IEEE inf for 1/0.
The current umath does both with different types, which is annoying.
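The two behaviours described above can be contrasted in a small scalar sketch. The function names are hypothetical and do not correspond to anything in umath; they only show the difference between a checked division and one that follows IEEE 754 conventions.

```python
import math

def checked_divide(x, y):
    """Checked behaviour: division by zero raises ZeroDivisionError."""
    return x / y

def ieee_divide(x, y):
    """IEEE-style behaviour: x/0 gives +/-inf and 0/0 gives nan, without raising."""
    if y != 0:
        return x / y
    if x == 0 or math.isnan(x):
        return math.nan
    # The result's sign is sign(x) * sign(y); copysign also sees the sign of -0.0.
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)
    return math.copysign(math.inf, sign)
```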
When I last spoke to Jim about this at IPC6, I was under the impression that IEEE support was not fully implemented and much work still needed to be done. Has this situation changed since then?
And
6) Enhanced support for mixed-types or objects.
This last issue is very important to me and the astronomical community, since we routinely store data as (multi-dimensional) arrays of fixed-length records or C-structures. A current deficiency of NumPy is that the object typecode does not work with the fromstring() method, so importing arrays of records from a binary file is just not possible. I've been developing my own C-extension type to handle this situation and have come to realize that my record type is really just a generalization of NumPy's types.
I would like to see the code for your generalized type which would help me see if there were some relatively painless way the two could be merged.
recordmodule.c is part of my PyFITS module for dealing with FITS files. You can find it here:
ftp://ra.stsci.edu/pub/barrett/PyFITS_0.3.tgz
I use NumPy to access fixed-type arrays and the record type for accessing mixed-type arrays. A common example is accessing the second element of a mixed-type (i.e. an object) from the entire array. This returns a record type with a single element, which is equivalent to a NumPy array of fixed type. Therefore users expect this object to be a NumPy array, and it isn't. They have to convert it to one.
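As a rough illustration of the record access described above, the standard `struct` module can unpack fixed-length records from a binary buffer and pull one field out of every record. This is not the recordmodule.c implementation. The record layout (4-byte int, 4-byte float, 10-byte string, little-endian, no padding) and the helper names are assumptions made for the sketch.

```python
import struct

# Hypothetical record layout: int32, float32, 10-byte string, no padding.
RECORD = struct.Struct("<if10s")

def read_records(buf):
    """Unpack a bytes buffer of fixed-length records into a list of tuples."""
    return list(RECORD.iter_unpack(buf))

def field(records, index):
    """Return one field from every record, i.e. a column of a single type."""
    return [rec[index] for rec in records]

# Build three records in memory in place of reading a binary file.
raw = b"".join(
    RECORD.pack(i, i * 0.5, name)
    for i, name in enumerate([b"alpha", b"beta", b"gamma"])
)
records = read_records(raw)
floats = field(records, 1)   # the second element of every record
```

The extracted column holds values of a single type, which is why users would expect it to behave like an ordinary fixed-type array.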
two C-extension types merged. I think this enhancement can be done with minimal change to the current NumPy behavior and minor changes to the typecode system.
If you already see how to do it, then great.
Note that NumPy already has some support for an Object type. It has been proposed that it be removed, because it is not well supported and hence few people use it. I have the contrary opinion and feel we should enhance the Object type and make it much more usable. If you don't need it, then you don't have to use it. This enhancement really shouldn't get in the way of those who only use fixed-type arrays.

So what changes to NumPy are needed?

1) Instead of a typecode (or in addition to the typecode for backward compatibility), I suggest an optional format keyword, which can be used to specify the mixed-type or object format. Namely, format = 'i, f, s10', where 'i' is an integer type, 'f' a floating point type, and 's10' is a string of 10 characters.

2) Array access will be the same as it is now. For example:

    # Create a 10x10 mixed-type array.
    A = array((10, 10), format = 'i, f, s10')

    # Create a 10x10 fixed-type array.
    B = array((10, 10), typecode = 'i')

    # Print a 5x5 subarray of mixed-type.
    print A[:5,:5]

    # Print a 5x5 subarray of fixed-type.
    print B[:5,:5]
    # Or
    # (Note that the 3rd index is optional for fixed-type arrays, it
    # always defaults to 0.)
    print B[:5,:5,0]

    # Print the second element of the mixed-type of the entire array.
    # Note that this is now an array of fixed-type.
    print A[:,:,1]

The major thorn that I see at this point is how to reconcile the behavior of numbers and strings during operations. But I don't see this as an intractable problem. I actually believe this enhancement will encourage us to create a better and more generic multi-dimensional array module by concentrating on the behavioral aspects of this extension type.

Note that J, which NumPy is based upon, allows such mixed-types.

--
Dr. Paul Barrett       Space Telescope Science Institute
Phone: 410-516-6714    DESD/DPT
FAX:   410-516-8615    Baltimore, MD 21218

So what changes to NumPy are needed?
1) Instead of a typecode (or in addition to the typecode for backward compatibility), I suggest an optional format keyword, which can be used to specify the mixed-type or object format. Namely, format = 'i, f, s10', where 'i' is an integer type, 'f' a floating point type, and s10 is a string of 10 characters.
I'd suggest to go all the way and make it a real object, not just a string. That object can then have useful attributes, like size in bytes, maxval, minval, some indication of precision, etc. Logically, itemsize should be an attribute of the numeric type of an array, not of the array itself. --david ascher
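A minimal sketch of what such a type object might look like follows. Every name here (`FieldType`, `parse_format`, `record_size`) is hypothetical, and the 4-byte sizes chosen for 'i' and 'f' are assumptions rather than anything fixed by the proposal.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FieldType:
    """Hypothetical descriptor for one element type of an array."""
    code: str
    itemsize: int                      # size in bytes
    minval: Optional[float] = None
    maxval: Optional[float] = None

# Assumed sizes: 'i' is a 32-bit integer, 'f' a 32-bit float.
INT32 = FieldType("i", 4, -2**31, 2**31 - 1)
FLOAT32 = FieldType("f", 4)            # range left unspecified in this sketch

def parse_format(spec):
    """Turn a spec such as 'i, f, s10' into a tuple of FieldType objects."""
    fields = []
    for token in (t.strip() for t in spec.split(",")):
        if token == "i":
            fields.append(INT32)
        elif token == "f":
            fields.append(FLOAT32)
        elif token.startswith("s") and token[1:].isdigit():
            length = int(token[1:])
            fields.append(FieldType(token, length))
        else:
            raise ValueError("unknown type code: %r" % token)
    return tuple(fields)

def record_size(fields):
    """Total size in bytes of one record made of the given fields."""
    return sum(f.itemsize for f in fields)

fields = parse_format("i, f, s10")
```

With the type as an object, itemsize lives on the type rather than on the array, and a record's size is just the sum over its fields.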
participants (2): David Ascher, Paul Barrett