From: wtbridgman at Radix.Net (W.T. Bridgman)
Date: Fri, 1 Dec 2000 20:17:19 -0500
Subject: [AstroPy] State of Numeric 2

Paul Barrett sends this report on his work on Numeric 2.

Tom

----------------------------------------------------------------------

Status of Numeric 2

The design of Numeric 2 enables new array types to be added easily and all array operations to be UFuncs. This makes the code more extensible, flexible, and maintainable. What follows is an outline of the basic design of Numeric 2 and what we have accomplished so far.

There are currently three primary classes:

ArrayType: This is a simple class that describes the fundamental properties of an array type, e.g. its name, its size in bytes, and its coercion relations with respect to other types. Each instance is a singleton representing that type, e.g.

> Int32 = ArrayType('Int32', 4, 'doc-string')

Its relation to the other types is defined when the C-extension module for that type is imported. The corresponding Python code is

> Int32.astype[Real64] = Real64

This says that the Real64 array type has higher priority than the Int32 array type.

UFunc: This class is the heart of Numeric 2. Its design is similar to that of ArrayType in that each UFunc instance is a singleton callable object whose attributes are a name, the argument types (each either input or output), and a CFunc dictionary, e.g.

> add = UFunc('add', ('in', 'in', 'out'), 'doc-string')

When defined, the add instance has no C functions associated with it and therefore can do no work. The CFunc dictionary is populated later, when the C-extension module for an array type is imported. The corresponding Python code would be

> add.register('add', (Int32, Int32, Int32), cfunc_add)

In each C-extension module's initialization function, there are two C API calls: one to initialize the coercion rules and the other to register the CFunc objects.
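To make the registration scheme above concrete, here is a minimal pure-Python sketch of how the ArrayType coercion table and the UFunc CFunc dictionary might fit together. It is not Paul's code: the singleton cache in __new__, the coerce() helper, and the use of a Python lambda in place of a real C function are assumptions made for illustration only.

```python
class ArrayType:
    """Describes one array type: name, item size in bytes, coercion rules."""

    _registry = {}  # assumption: name -> instance, so each type is a singleton

    def __new__(cls, name, itemsize, doc=""):
        if name in cls._registry:
            return cls._registry[name]  # reuse the existing singleton
        self = super().__new__(cls)
        self.name = name
        self.itemsize = itemsize
        self.__doc__ = doc
        self.astype = {}  # other type -> type both operands are coerced to
        cls._registry[name] = self
        return self

    def __repr__(self):
        return self.name


def coerce(t1, t2):
    """Hypothetical helper: return the common type of t1 and t2."""
    if t1 is t2:
        return t1
    if t2 in t1.astype:
        return t1.astype[t2]
    if t1 in t2.astype:
        return t2.astype[t1]
    raise TypeError(f"no coercion rule between {t1} and {t2}")


class UFunc:
    """A named operation whose loops are registered per type signature."""

    def __init__(self, name, argkinds, doc=""):
        self.name = name
        self.argkinds = argkinds  # e.g. ('in', 'in', 'out')
        self.__doc__ = doc
        self.cfuncs = {}  # (type, type, ...) -> loop function

    def register(self, name, signature, cfunc):
        # name mirrors the call in the email; this sketch does not use it
        if len(signature) != len(self.argkinds):
            raise ValueError("signature length does not match argument kinds")
        self.cfuncs[tuple(signature)] = cfunc


# Roughly what importing the Int32/Real64 type modules would do:
Int32 = ArrayType('Int32', 4, 'doc-string')
Real64 = ArrayType('Real64', 8, 'doc-string')
Int32.astype[Real64] = Real64

add = UFunc('add', ('in', 'in', 'out'), 'doc-string')
# A Python lambda stands in for the C function (cfunc_add) here.
add.register('add', (Int32, Int32, Int32), lambda a, b: a + b)
```

Keeping the coercion table on each type, rather than in one central table, is what lets a newly imported type module add its own rules without touching existing code.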
When an operation is applied to some arrays, the __call__ method is invoked. It gets the type of each array (if the output array is not given, it is created with the appropriate type) and checks the CFunc dictionary for a key that matches the argument types. If such a key exists, the operation is performed immediately; otherwise, the best matching key is found, and that operation is used together with its associated conversion functions. The __call__ method then invokes a compute method written in C to iterate over slices of each array, namely:

> _ufunc.compute(slice, data, func, swap, conv)

The func argument is a CFuncObject, while swap and conv are lists of CFuncObjects, one for each array if necessary. The data argument is a list of buffer objects, one for each array, and the slice argument is a complex object specifying the number of iterations for each dimension, plus the buffer offset and step size for each array in each dimension.

We have predefined several UFuncs for use by the __call__ method: cast, swap, getitem, and setitem. The cast and swap functions do coercion and byte-swapping, respectively, and the getitem and setitem functions do conversion between Numeric arrays and Python sequences. Other functions can be defined arbitrarily.

Array: This class contains information about the array, such as its shape, its type, and the endianness of its data. Its operators, '+', '-', etc., just invoke the corresponding UFunc, e.g.

> def __add__(self, other):
>     return ufunc.add(self, other)

C-extension modules: Numeric 2 will have several C-extension modules. The primary module of this set is _ufuncmodule.c. The intention of this module is to do the bare minimum, i.e. iterate over arrays using a specified C function. The interface of these functions remains the same as for the current Numeric, i.e.

> int (*CFunc)(char *data, int *steps, int repeat, void *func);

and their functionality is expected to be the same, i.e. they iterate over the inner-most dimension.
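A rough pure-Python sketch of the __call__ dispatch described above: an exact signature match is used directly; otherwise the registered signature reachable by casting with the least widening is chosen, and the inputs are cast first. The PRIORITY list, the CAST table standing in for the cast UFunc, the best_key() method, and the use of Python lists instead of buffer objects are all assumptions; byte-swapping and the C compute step are left out.

```python
import operator

# Assumed coercion order: a type may be cast to any type listed after it.
PRIORITY = ['Int32', 'Real64']
CAST = {'Int32': int, 'Real64': float}  # stand-ins for the cast UFunc


def can_cast(src, dst):
    return PRIORITY.index(src) <= PRIORITY.index(dst)


class Array:
    """Minimal 1-D array with .shape, .type and .data (a Python list)."""

    def __init__(self, values, atype):
        self.type = atype
        self.data = [CAST[atype](v) for v in values]
        self.shape = (len(self.data),)

    def __add__(self, other):
        return add(self, other)  # operators just forward to the UFunc


class UFunc:
    def __init__(self, name):
        self.name = name
        self.cfuncs = {}  # (in1, in2, out) type names -> element function

    def register(self, signature, func):
        self.cfuncs[tuple(signature)] = func

    def best_key(self, in_types):
        exact = [k for k in self.cfuncs if k[:2] == in_types]
        if exact:
            return exact[0]
        castable = [k for k in self.cfuncs
                    if all(can_cast(t, s) for t, s in zip(in_types, k))]
        if not castable:
            raise TypeError(f"{self.name}: no loop for {in_types}")
        # Prefer the loop whose types need the least widening.
        return min(castable, key=lambda k: sum(PRIORITY.index(s) for s in k))

    def __call__(self, a, b, out=None):
        if a.shape != b.shape:
            raise ValueError("this sketch handles equal-shape 1-D arrays only")
        key = self.best_key((a.type, b.type))
        t1, t2, tout = key
        if out is None:
            out = Array([0] * a.shape[0], tout)  # created with the loop's type
        elif out.type != tout:
            raise TypeError(f"output must be {tout}, got {out.type}")
        # Cast the inputs (a no-op on an exact match), then apply the loop.
        x = [CAST[t1](v) for v in a.data]
        y = [CAST[t2](v) for v in b.data]
        out.data = [self.cfuncs[key](u, v) for u, v in zip(x, y)]
        return out


add = UFunc('add')
add.register(('Int32', 'Int32', 'Int32'), operator.add)
add.register(('Real64', 'Real64', 'Real64'), operator.add)
```

With these registrations, adding an Int32 array to a Real64 array finds no exact key, so both inputs are cast to Real64 and the Real64 loop runs; adding two Int32 arrays uses the Int32 loop directly.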
There will also be C-extension modules for each array type, e.g. _int32module.c, _real64module.c, etc. As I said before, when these modules are imported by the UFunc module, they will automatically register their functions and coercion rules. New or improved versions of these modules can be easily implemented and used without affecting the rest of Numeric 2.

That's basically it. As for progress, we have outlined the following steps:

Step 1: implement basic UFunc capability
 - minimal Array class, i.e. the necessary class attributes and methods, e.g. .shape, .data, .type
 - minimal ArrayType class, e.g. Int32, Real64, Complex64, Char, Object
 - minimal UFunc class, i.e. UFunc instantiation, CFunction registration, and UFunc calls for 1-D arrays, including the rules for alignment, byte-swapping, and coercion
 - minimal C-extension module (_UFunc), which does the innermost array loop in C

This step implements whatever is needed to do 'c = add(a, b)', where a, b, and c are 1-D arrays. It will teach us how to add new UFuncs, coerce the arrays, pass the necessary information to a C iterator method, and do the actual computation.

Step 2: continue enhancing the UFunc iterator and Array class
 - implement some access methods for the Array class: print, repr, getitem, setitem, etc.
 - implement multidimensional arrays
 - implement some of the basic Array methods using UFuncs, e.g. +, -, etc.
 - enable UFuncs to use Python sequences

Step 3: complete the standard UFunc and Array class behavior
 - implement getslice and setslice behavior
 - work on Array broadcasting rules
 - implement Record type
 - implement reduce, reduceAt, and outer methods for UFuncs

Step 4:
 - add more UFuncs
 - implement buffer or mmap access
 - etc.

I've nearly completed Step 1. The one major change/enhancement to that step is to implement iteration over multi-dimensional arrays immediately, instead of only 1-D arrays. Once this step is done, we will have working code to test and analyze.
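Iteration over multi-dimensional arrays can be pictured as an outer loop over every index of all but the innermost dimension, with an inner loop (the part that would be written in C) handling the innermost dimension using per-array offsets and steps. That is roughly the information the slice argument to _ufunc.compute is described as carrying. The sketch below is a hypothetical pure-Python illustration: compute(), inner_loop(), and c_strides() are invented names, arrays are flat Python lists, and strides are counted in elements rather than bytes.

```python
import itertools
import operator


def inner_loop(func, bufs, offsets, steps, repeat):
    """Apply func along one dimension: out[o] = func(in1[i1], in2[i2])."""
    in1, in2, out = bufs
    i1, i2, o = offsets
    s1, s2, so = steps
    for _ in range(repeat):
        out[o] = func(in1[i1], in2[i2])
        i1 += s1
        i2 += s2
        o += so


def compute(func, shape, bufs, starts, strides):
    """Drive inner_loop over every index of the outer dimensions.

    strides[k][d] is the element stride of array k along dimension d.
    """
    *outer, inner = shape
    inner_steps = [s[-1] for s in strides]
    for idx in itertools.product(*(range(n) for n in outer)):
        # Starting offset of this innermost run, for each array.
        offsets = [start + sum(i * st for i, st in zip(idx, s[:-1]))
                   for start, s in zip(starts, strides)]
        inner_loop(func, bufs, offsets, inner_steps, inner)


def c_strides(shape):
    """Row-major element strides for a contiguous array of this shape."""
    strides, acc = [], 1
    for n in reversed(shape):
        strides.append(acc)
        acc *= n
    return tuple(reversed(strides))


if __name__ == "__main__":
    shape = (2, 3)
    st = c_strides(shape)  # (3, 1)
    a = [1, 2, 3, 4, 5, 6]
    b = [10, 20, 30, 40, 50, 60]
    out = [0] * 6
    compute(operator.add, shape, [a, b, out], [0, 0, 0], [st, st, st])
    print(out)  # [11, 22, 33, 44, 55, 66]
```

Because each array carries its own strides, a transposed layout or a zero stride (e.g. strides (0, 1) to reuse one row for every row of a 2-D array) goes through the same loop unchanged, which hints at how slicing and broadcasting could later fit this scheme.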
Since this design is so modular, particularly with respect to the array type modules, some work can be done in parallel.

_____________________________________________________
AstroPy mailing list - astropy at stsci.edu
http://lheawww.gsfc.nasa.gov/~bridgman/AstroPy/