PEP 209: Multi-dimensional Arrays
The first draft of PEP 209: Multi-dimensional Arrays is ready for comment. Its primary emphasis is array operations, but its design is intended to provide a general framework for working with multi-dimensional arrays. This PEP covers a lot of ground and so does not go into much detail at this stage; the hope is that the details can be filled in as time goes on. It also presents several Open Issues that need to be discussed.

Cheers, Paul

-- 
Dr. Paul Barrett                 Space Telescope Science Institute
Phone: 410-338-4475              ESS/Science Software Group
FAX: 410-338-4767                Baltimore, MD 21218

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PEP: 209
Title: Multi-dimensional Arrays
Version:
Author: barrett@stsci.edu (Paul Barrett), oliphant@ee.byu.edu (Travis Oliphant)
Python-Version: 2.2
Status: Draft
Type: Standards Track
Created: 03-Jan-2001
Post-History:

Abstract

This PEP proposes a redesign and re-implementation of the multi-dimensional array module, Numeric, to make it easier to add new features and functionality to the module. Aspects of Numeric 2 that will receive special attention are efficient access to arrays exceeding a gigabyte in size and composed of inhomogeneous data structures or records. The proposed design uses four Python classes: ArrayType, UFunc, Array, and ArrayView; and a low-level C-extension module, _ufunc, to handle the array operations efficiently. In addition, each array type has its own C-extension module which defines the coercion rules, operations, and methods for that type. This design enables new types, features, and functionality to be added in a modular fashion. The new version will introduce some incompatibilities with the current Numeric.

Motivation

Multi-dimensional arrays are commonly used to store and manipulate data in science, engineering, and computing.
Python currently has an extension module, named Numeric (henceforth called Numeric 1), which provides a satisfactory set of functionality for users manipulating homogeneous arrays of data of moderate size (of order 10 MB). For access to larger arrays (of order 100 MB or more) of possibly inhomogeneous data, the implementation of Numeric 1 is inefficient and cumbersome. In the future, requests by the Numerical Python community for additional functionality are also likely, as PEP 211: Adding New Linear Algebra Operators to Python and PEP 225: Elementwise/Objectwise Operators illustrate.

Proposal

This proposal recommends a re-design and re-implementation of Numeric 1, henceforth called Numeric 2, which will enable new types, features, and functionality to be added in an easy and modular manner. The initial design of Numeric 2 should focus on providing a generic framework for manipulating arrays of various types and should enable a straightforward mechanism for adding new array types and UFuncs. Functional methods that are more specific to various disciplines can then be layered on top of this core. This new module will still be called Numeric and most of the behavior found in Numeric 1 will be preserved.

The proposed design uses four Python classes: ArrayType, UFunc, Array, and ArrayView; and a low-level C-extension module to handle the array operations efficiently. In addition, each array type has its own C-extension module which defines the coercion rules, operations, and methods for that type. At a later date, when core functionality is stable, some Python classes can be converted to C-extension types.

Some planned features are:

1. Improved memory usage

This feature is particularly important when handling large arrays and can produce significant improvements in performance as well as memory usage. We have identified several areas where memory usage can be improved:

a.
Use a local coercion model

Instead of using Python's global coercion model, which creates temporary arrays, Numeric 2, like Numeric 1, will implement a local coercion model as described in PEP 208, which defers the responsibility of coercion to the operator. By using internal buffers, a coercion operation can be done for each array (including output arrays), if necessary, at the time of the operation. Benchmarks [1] have shown that performance is at most degraded only slightly and is improved in cases where the internal buffers are less than the L2 cache size and the processor is under load. To avoid array coercion altogether, C functions having arguments of mixed type are allowed in Numeric 2.

b. Avoid creation of temporary arrays

In complex array expressions (i.e. having more than one operation), each operation will create a temporary array which will be used and then deleted by the succeeding operation. A better approach would be to identify these temporary arrays and reuse their data buffers when possible, namely when the array shape and type are the same as the temporary array being created. This can be done by checking the temporary array's reference count. If it is 1, then it will be deleted once the operation is done and is a candidate for reuse.

c. Optional use of memory-mapped files

Numeric users sometimes need to access data from very large files or to handle data that is greater than the available memory. Memory-mapped arrays provide a mechanism to do this by storing the data on disk while making it appear to be in memory. Memory-mapped arrays should improve access to all files by eliminating one of two copy steps during a file access. Numeric should be able to access in-memory and memory-mapped arrays transparently.

d. Record access

In some fields of science, data is stored in files as binary records. For example, in astronomy, photon data is stored as a 1-dimensional list of photons in order of arrival time.
These records or C-like structures contain information about the detected photon, such as its arrival time, its position on the detector, and its energy. Each field may be of a different type, such as char, int, or float. Such arrays introduce new issues that must be dealt with; in particular, byte alignment or byte swapping may need to be performed for the numeric values to be properly accessed (though byte swapping is also an issue for memory-mapped data). Numeric 2 is designed to automatically handle alignment and representational issues when data is accessed or operated on. There are two approaches to implementing records: as either a derived array class or a special array type, depending on your point-of-view. We defer this discussion to the Open Issues section.

2. Additional array types

Numeric 1 has 11 defined types: char, ubyte, sbyte, short, int, long, float, double, cfloat, cdouble, and object. There are no ushort, uint, or ulong types, nor are there more complex types such as a bit type, which is of use to some fields of science and possibly for implementing masked-arrays. The design of Numeric 1 makes the addition of these and other types a difficult and error-prone process. To enable the easy addition (and deletion) of new array types such as the bit type described below, a re-design of Numeric is necessary.

a. Bit type

The result of a rich comparison between arrays is an array of boolean values. The result can be stored in an array of type char, but this is an unnecessary waste of memory. A better implementation would use a bit or boolean type, compressing the array size by a factor of eight. This is currently being implemented for Numeric 1 (by Travis Oliphant) and should be included in Numeric 2.

3. Enhanced array indexing syntax

The extended slicing syntax was added to Python to provide greater flexibility when manipulating Numeric arrays by allowing step-sizes greater than 1.
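As a minimal illustration (plain Python lists rather than Numeric arrays; lists accept the same start:stop:step notation in modern Python), extended slicing and the kind of index-list selection discussed here look like:

```python
data = list(range(10))          # [0, 1, 2, ..., 9]

# Extended slicing: start:stop:step selects regularly spaced elements.
print(data[0:10:2])             # every second element: [0, 2, 4, 6, 8]
print(data[9:4:-2])             # negative steps walk backwards: [9, 7, 5]

# Irregularly spaced indices have no slice shorthand; the enhanced
# indexing syntax would let a 1-D index array select them directly.
idx = [0, 3, 4, 7]
print([data[i] for i in idx])   # [0, 3, 4, 7]
```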
This syntax works well as a shorthand for a list of regularly spaced indices. For those situations where a list of irregularly spaced indices is needed, an enhanced array indexing syntax would allow 1-D arrays to be arguments.

4. Rich comparisons

The implementation of PEP 207: Rich Comparisons in Python 2.1 provides additional flexibility when manipulating arrays. We intend to implement this feature in Numeric 2.

5. Array broadcasting rules

When an operation between a scalar and an array is done, the implied behavior is to create a new array having the same shape as the array operand containing the scalar value. This is called array broadcasting. It also works with arrays of lesser rank, such as vectors. This implicit behavior is implemented in Numeric 1 and will also be implemented in Numeric 2.

Design and Implementation

The design of Numeric 2 has four primary classes:

1. ArrayType:

This is a simple class that describes the fundamental properties of an array-type, e.g. its name, its size in bytes, and its coercion relations with respect to other types:

> Int32 = ArrayType('Int32', 4, 'doc-string')

Its relation to the other types is defined when the C-extension module for that type is imported. The corresponding Python code is:

> Int32.astype[Real64] = Real64

This says that the Real64 array-type has higher priority than the Int32 array-type.

The following attributes and methods are proposed for the core implementation. Additional attributes can be added on an individual basis, e.g. .bitsize or .bitstrides for the bit type.

Attributes:
    .name:                   e.g. "Int32", "Float64", etc.
    .typecode:               e.g. 'i', 'f', etc. (for backward compatibility)
    .size (in bytes):        e.g. 4, 8, etc.
    .array_rules (mapping):  rules between array types
    .pyobj_rules (mapping):  rules between array and python types
    .doc:                    documentation string

Methods:
    __init__():  initialization
    __del__():   destruction
    __repr__():  representation

C-API: This still needs to be fleshed-out.

2.
UFunc:

This class is the heart of Numeric 2. Its design is similar to that of ArrayType in that the UFunc creates a singleton callable object whose attributes are name, total and input number of arguments, a document string, and an empty CFunc dictionary; e.g.

> add = UFunc('add', 3, 2, 'doc-string')

When defined, the add instance has no C functions associated with it and therefore can do no work. The CFunc dictionary is populated or registered later when the C-extension module for an array-type is imported. The arguments of the register method are: function name, function descriptor, and the CUFunc object. The corresponding Python code is:

> add.register('add', (Int32, Int32, Int32), cfunc-add)

In the initialization function of an array type module, e.g. Int32, there are two C API functions: one to initialize the coercion rules and the other to register the CFunc objects.

When an operation is applied to some arrays, the __call__ method is invoked. It gets the type of each array (if the output array is not given, it is created from the coercion rules) and checks the CFunc dictionary for a key that matches the argument types. If it exists, the operation is performed immediately; otherwise, the coercion rules are used to search for a related operation and set of conversion functions. The __call__ method then invokes a compute method written in C to iterate over slices of each array, namely:

> _ufunc.compute(slice, data, func, swap, conv)

The 'func' argument is a CFuncObject, while the 'swap' and 'conv' arguments are lists of CFuncObjects for those arrays needing pre- or post-processing; otherwise None is used. The data argument is a list of buffer objects, and the slice argument gives the number of iterations for each dimension along with the buffer offset and step size for each array and each dimension.

We have predefined several UFuncs for use by the __call__ method: cast, swap, getobj, and setobj. The cast and swap functions do coercion and byte-swapping, respectively, and the getobj and setobj functions do coercion between Numeric arrays and Python sequences.

The following attributes and methods are proposed for the core implementation.

Attributes:
    .name:              e.g. "add", "subtract", etc.
    .nargs:             number of total arguments
    .iargs:             number of input arguments
    .cfuncs (mapping):  the set of C functions
    .doc:               documentation string

Methods:
    __init__():    initialization
    __del__():     destruction
    __repr__():    representation
    __call__():    look-up and dispatch method
    initrule():    initialize coercion rule
    uninitrule():  uninitialize coercion rule
    register():    register a CUFunc
    unregister():  unregister a CUFunc

C-API: This still needs to be fleshed-out.

3. Array:

This class contains information about the array, such as shape, type, endian-ness of the data, etc. Its operators, '+', '-', etc., just invoke the corresponding UFunc function, e.g.

> def __add__(self, other):
>     return ufunc.add(self, other)

The following attributes, methods, and functions are proposed for the core implementation.

Attributes:
    .shape:                shape of the array
    .format:               type of the array
    .real (only complex):  real part of a complex array
    .imag (only complex):  imaginary part of a complex array

Methods:
    __init__():      initialization
    __del__():       destruction
    __repr__():      representation
    __str__():       pretty representation
    __cmp__():       rich comparison
    __len__():
    __getitem__():
    __setitem__():
    __getslice__():
    __setslice__():
    numeric methods:
    copy():          copy of array
    aslist():        create list from array
    asstring():      create string from array

Functions:
    fromlist():      create array from sequence
    fromstring():    create array from string
    array():         create array with shape and value
    concat():        concatenate two arrays
    resize():        resize array

C-API: This still needs to be fleshed-out.

4. ArrayView

This class is similar to the Array class except that the reshape and flat methods will raise exceptions, since non-contiguous arrays cannot be reshaped or flattened using just pointer and step-size information.
C-API: This still needs to be fleshed-out.

5. C-extension modules:

Numeric 2 will have several C-extension modules.

a. _ufunc:

The primary module of this set is the _ufuncmodule.c. The intention of this module is to do the bare minimum, i.e. iterate over arrays using a specified C function. The interface of these functions is the same as Numeric 1, i.e.

    int (*CFunc)(char *data, int *steps, int repeat, void *func);

and their functionality is expected to be the same, i.e. they iterate over the inner-most dimension.

The following attributes and methods are proposed for the core implementation.

Attributes:

Methods:
    compute():

C-API: This still needs to be fleshed-out.

b. _int32, _real64, etc.:

There will also be C-extension modules for each array type, e.g. _int32module.c, _real64module.c, etc. As mentioned previously, when these modules are imported by the UFunc module, they will automatically register their functions and coercion rules. New or improved versions of these modules can be easily implemented and used without affecting the rest of Numeric 2.

Open Issues

1. Does slicing syntax default to copy or view behavior?

The default behavior of Python is to return a copy of a sub-list or tuple when slicing syntax is used, whereas Numeric 1 returns a view into the array. The choice made for Numeric 1 is apparently for reasons of performance: the developers wish to avoid the penalty of allocating and copying the data buffer during each array operation and believe the need for a deepcopy of an array to be rare. Yet, some have argued that Numeric's slice notation should also have copy behavior to be consistent with Python lists. In this case the performance penalty associated with copy behavior can be minimized by implementing copy-on-write. This scheme has both arrays sharing one data buffer (as in view behavior) until either array is assigned new data, at which point a copy of the data buffer is made.
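The copy-on-write scheme can be illustrated in a few lines of Python. This is only a toy sketch under stated assumptions: the class name CowArray and the use of a plain list as the "data buffer" are hypothetical, chosen for this illustration and not part of the proposal.

```python
class CowArray:
    """Toy copy-on-write array: copies share one data buffer until
    one of them is written to, then the writer takes a private copy."""

    def __init__(self, data, shared_count=None):
        self._buf = data                   # stands in for the C data buffer
        self._count = shared_count or [1]  # shared reference-count box

    def copy(self):
        # "Copying" is cheap: share the buffer and bump the count.
        self._count[0] += 1
        return CowArray(self._buf, self._count)

    def __getitem__(self, i):
        return self._buf[i]

    def __setitem__(self, i, value):
        if self._count[0] > 1:             # buffer is shared:
            self._count[0] -= 1            #   detach from the group,
            self._buf = list(self._buf)    #   take a private copy,
            self._count = [1]              #   then write to it.
        self._buf[i] = value

a = CowArray([0, 1, 2, 3])
b = a.copy()           # no data copied yet
b[0] = 99              # now b takes its own buffer
print(a[0], b[0])      # 0 99
```

Until the assignment to b[0], both arrays point at one buffer, so the usual copy penalty is deferred until a write actually occurs.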
View behavior would then be implemented by an ArrayView class, whose behavior would be similar to Numeric 1 arrays, i.e. .shape is not settable for non-contiguous arrays. The use of an ArrayView class also makes explicit what type of data the array contains.

2. Does item syntax default to copy or view behavior?

A similar question arises with the item syntax. For example, if a = [[0,1,2], [3,4,5]] and b = a[0], then changing b[0] also changes a[0][0], because a[0] is a reference or view of the first row of a. Therefore, if c is a 2-d array, it would appear that c[i] should return a 1-d array which is a view into, instead of a copy of, c for consistency. Yet, c[i] can be considered just a shorthand for c[i,:], which would imply copy behavior, assuming slicing syntax returns a copy. Should Numeric 2 behave the same way as lists and return a view, or should it return a copy?

3. How is scalar coercion implemented?

Python has fewer numeric types than Numeric, which can cause coercion problems. For example, when multiplying a Python scalar of type float and a Numeric array of type float, the Numeric array is converted to a double, since the Python float type is actually a double. This is often not the desired behavior, since the Numeric array will be doubled in size, which is likely to be annoying, particularly for very large arrays. We prefer that the array type trumps the Python type for the same type class, namely integer, float, and complex. Therefore an operation between a Python integer and an Int16 (short) array will return an Int16 array, whereas an operation between a Python float and an Int16 array would return a Float64 (double) array. Operations between two arrays use normal coercion rules.

4. How is integer division handled?

In a future version of Python, the behavior of integer division will change. The operands will be converted to floats, so the result will be a float.
If we implement the proposed scalar coercion rules, where arrays have precedence over Python scalars, then dividing an array by an integer will return an integer array and will not be consistent with a future version of Python, which would return an array of type double. Scientific programmers are familiar with the distinction between integer and floating-point division, so should Numeric 2 continue with this behavior?

5. How should records be implemented?

There are two approaches to implementing records, depending on your point-of-view. The first is to divide arrays into separate classes depending on the behavior of their types. For example, numeric arrays are one class, strings a second, and records a third, because the range and type of operations of each class differ. As such, a record array is not a new type, but a mechanism for a more flexible form of array. To easily access and manipulate such complex data, the class is comprised of numeric arrays having different byte offsets into the data buffer. For example, one might have a table whose records consist of an Int16 value followed by a Real32 value. Two numeric arrays, one with an offset of 0 bytes and a stride of 6 bytes to be interpreted as Int16, and one with an offset of 2 bytes and a stride of 6 bytes to be interpreted as Real32, would represent the record array. Both numeric arrays would refer to the same data buffer, but have different offset and stride attributes, and a different numeric type.

The second approach is to consider a record as one of many array types, albeit with fewer, and possibly different, array operations than for numeric arrays. This approach considers an array type to be a mapping of a fixed-length string. The mapping can either be simple, like integer and floating-point numbers, or complex, like a complex number, a byte string, and a C-structure. The record type effectively merges the struct and Numeric modules into a multi-dimensional struct array.
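The first approach — numeric arrays as differently-offset views of one shared buffer — can be sketched with the struct module, using the 6-byte Int16/Real32 record from the example above. This is only an illustration; a real implementation would use strided C pointers rather than per-element unpacking:

```python
import struct

# One record: an Int16 at offset 0 and a Real32 at offset 2,
# packed without padding ('<' disables alignment): 6 bytes per record.
RECORD = struct.Struct('<hf')
buf = b''.join(RECORD.pack(i, i + 0.5) for i in (1, 2, 3))

# "Int16 array": offset 0, stride 6 over the shared buffer.
ints = [struct.unpack_from('<h', buf, off)[0]
        for off in range(0, len(buf), RECORD.size)]

# "Real32 array": offset 2, stride 6 over the same buffer.
reals = [struct.unpack_from('<f', buf, off + 2)[0]
         for off in range(0, len(buf), RECORD.size)]

print(ints)   # [1, 2, 3]
print(reals)  # [1.5, 2.5, 3.5]
```

Both "arrays" walk the same bytes; only the offset, stride, and type of the interpretation differ.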
This approach implies certain changes to the array interface. For example, the 'typecode' keyword argument should probably be changed to the more descriptive 'format' keyword.

a. How are record semantics defined and implemented?

Whichever implementation approach is taken for records, the syntax and semantics of how they are to be accessed and manipulated must be decided, if one wishes to have access to sub-fields of records. In this case, the record type can essentially be considered an inhomogeneous list, like a tuple returned by the unpack method of the struct module; and a 1-d array of records may be interpreted as a 2-d array with the second dimension being the index into the list of fields. These enhanced array semantics make access to an array of one or more of the fields easy and straightforward. They also allow a user to do array operations on a field in a natural and intuitive way. If we assume that records are implemented as an array type, then the last dimension defaults to 0 and can therefore be neglected for arrays comprised of simple types, like numeric.

6. How are masked-arrays implemented?

Masked-arrays in Numeric 1 are implemented as a separate array class. With the ability to add new array types to Numeric 2, it is possible that masked-arrays in Numeric 2 could be implemented as a new array type instead of an array class.

7. How are numerical errors handled (IEEE floating-point errors in particular)?

It is not clear to the proposers (Paul Barrett and Travis Oliphant) what is the best or preferred way of handling errors. Most of the C functions that do the operation iterate over the inner-most (last) dimension of the array. This dimension could contain a thousand or more items having one or more errors of differing type, such as divide-by-zero, underflow, and overflow. Additionally, keeping track of these errors may come at the expense of performance. Therefore, we suggest several options:

a.
Print a message of the most severe error, leaving it to the user to locate the errors.

b. Print a message of all errors that occurred and the number of occurrences, leaving it to the user to locate the errors.

c. Print a message of all errors that occurred and a list of where they occurred.

d. Or use a hybrid approach, printing only the most severe error, yet keeping track of what and where the errors occurred. This would allow the user to locate the errors while keeping the error message brief.

8. What features are needed to ease the integration of FORTRAN libraries and code?

It would be a good idea at this stage to consider how to ease the integration of FORTRAN libraries and user code in Numeric 2.

Implementation Steps

1. Implement basic UFunc capability

a. Minimal Array class: Necessary class attributes and methods, e.g. .shape, .data, .type, etc.

b. Minimal ArrayType class: Int32, Real64, Complex64, Char, Object

c. Minimal UFunc class: UFunc instantiation, CFunction registration, UFunc call for 1-D arrays including the rules for doing alignment, byte-swapping, and coercion.

d. Minimal C-extension module: _UFunc, which does the innermost array loop in C.

This step implements whatever is needed to do: 'c = add(a, b)', where a, b, and c are 1-D arrays. It teaches us how to add new UFuncs, to coerce the arrays, to pass the necessary information to a C iterator method, and to do the actual computation.

2. Continue enhancing the UFunc iterator and Array class

a. Implement some access methods for the Array class: print, repr, getitem, setitem, etc.

b. Implement multidimensional arrays

c. Implement some of the basic Array methods using UFuncs: +, -, *, /, etc.

d. Enable UFuncs to use Python sequences.

3. Complete the standard UFunc and Array class behavior

a. Implement getslice and setslice behavior

b. Work on Array broadcasting rules

c. Implement Record type

4. Add additional functionality

a. Add more UFuncs

b.
Implement buffer or mmap access

Incompatibilities

The following is a list of incompatibilities in behavior between Numeric 1 and Numeric 2.

1. Scalar coercion rules

Numeric 1 has a single set of coercion rules for array and Python numeric types. This can cause unexpected and annoying problems during the calculation of an array expression. Numeric 2 intends to overcome these problems by having two sets of coercion rules: one for arrays and Python numeric types, and another just for arrays.

2. No savespace attribute

The savespace attribute in Numeric 1 makes arrays with this attribute set take precedence over those that do not have it set. Numeric 2 will not have such an attribute and therefore normal array coercion rules will be in effect.

3. Slicing syntax returns a copy

The slicing syntax in Numeric 1 returns a view into the original array. The slicing behavior for Numeric 2 will be a copy. You should use the ArrayView class to get a view into an array.

4. Boolean comparisons return a boolean array

A comparison between arrays in Numeric 1 results in a Boolean scalar, because of current limitations in Python. The advent of Rich Comparisons in Python 2.1 will allow an array of Booleans to be returned.

5. Type characters are deprecated

Numeric 2 will have an ArrayType class composed of Type instances, for example Int8, Int16, Int32, and Int for signed integers. The typecode scheme in Numeric 1 will be available for backward compatibility, but will be deprecated.

Appendices

A. Implicit sub-arrays iteration

A computer animation is composed of a number of 2-D images or frames of identical shape. By stacking these images into a single block of memory, a 3-D array is created. Yet the operations to be performed are not meant for the entire 3-D array, but on the set of 2-D sub-arrays. In most array languages, each frame has to be extracted, operated on, and then reinserted into the output array using a for-like loop.
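The explicit per-frame loop described above might be sketched as follows. Everything here is illustrative: `brighten` is a hypothetical 2-D operation, and nested lists stand in for a 3-D array so the sketch is self-contained. Implicit sub-array iteration would instead let the operation be applied to the whole 3-D stack in one call.

```python
# A 'movie': three 2x2 frames stacked into one 3-D structure.
movie = [[[frame + row + col for col in range(2)]
          for row in range(2)]
         for frame in range(3)]

def brighten(frame2d):
    # Hypothetical per-frame (2-D) operation.
    return [[2 * pixel for pixel in row] for row in frame2d]

# The explicit for-like loop: extract each frame, operate, reinsert.
out = [None] * len(movie)
for i, frame2d in enumerate(movie):
    out[i] = brighten(frame2d)

print(out[0])  # [[0, 2], [2, 4]]
```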
The J language allows the programmer to perform such operations implicitly by having a rank for the frame and array. By default these ranks will be the same during the creation of the array. It was the intention of the Numeric 1 developers to implement this feature, which is based on the language J. The Numeric 1 code has the required variables for implementing this behavior, but the behavior was never implemented. We intend to implement implicit sub-array iteration in Numeric 2, if the array broadcasting rules found in Numeric 1 do not fully support this behavior.

Copyright

This document is placed in the public domain.

Related PEPs

PEP 207: Rich Comparisons, by Guido van Rossum and David Ascher
PEP 208: Reworking the Coercion Model, by Neil Schemenauer and Marc-André Lemburg
PEP 211: Adding New Linear Algebra Operators to Python, by Greg Wilson
PEP 225: Elementwise/Objectwise Operators, by Huaiyu Zhu
PEP 228: Reworking Python's Numeric Model, by Moshe Zadka

References

[1] P. Greenfield 2000. Private communication.
Some random PEP talk.
"PB" == Paul Barrett <Barrett@stsci.edu> writes:
PB> 2. Additional array types
PB>
PB> Numeric 1 has 11 defined types: char, ubyte, sbyte, short, int,
PB> long, float, double, cfloat, cdouble, and object. There are no
PB> ushort, uint, or ulong types, nor are there more complex types
PB> such as a bit type which is of use to some fields of science and
PB> possibly for implementing masked-arrays.

True: I would have had a much easier life with a ushort type.

PB> Its relation to the other types is defined when the C-extension
PB> module for that type is imported. The corresponding Python code
PB> is:
>>
>> Int32.astype[Real64] = Real64

I understand this is to be done by the Int32 C extension module. But how does it know about Real64?

PB> Attributes:
PB>     .name: e.g. "Int32", "Float64", etc.
PB>     .typecode: e.g. 'i', 'f', etc. (for backward compatibility)

.typecode() is a method now.

PB>     .size (in bytes): e.g. 4, 8, etc.

"element size"?
add.register('add', (Int32, Int32, Int32), cfunc-add)
Typo: cfunc-add is an expression, not an identifier. An implementation of a (Int32, Float32, Float32) add is possible and desirable as mentioned earlier in the document. Which C module is going to declare such a combination?

PB> asstring(): create string from array

Not "tostring" like now?

PB> 4. ArrayView
PB>
PB> This class is similar to the Array class except that the reshape
PB> and flat methods will raise exceptions, since non-contiguous
PB> arrays cannot be reshaped or flattened using just pointer and
PB> step-size information.

This was completely unclear to me until here. I must say I find this a strange way of handling things. I haven't looked into implementation details, but wouldn't it feel more natural if an Array would just be the "data", and an ArrayView would contain the dimensions and strides, completely separated? One would always need a pair, but more than one ArrayView could use the same Array.

PB> a. _ufunc:
PB>
PB> 1. Does slicing syntax default to copy or view behavior?

Numeric 1 uses slicing for view, and a method for copy. "Feeling" compatible with core Python would require copy on the rhs, and view on the lhs of an assignment. Is that distinction possible? If copy is the default for slicing, how would one make a view?

PB> 2. Does item syntax default to copy or view behavior?

View.

PB> Yet, c[i] can be considered just a shorthand for c[i,:] which
PB> would imply copy behavior assuming slicing syntax returns a copy.

If you reason that way, then c is just a shorthand for c[...] too.

PB> 3. How is scalar coercion implemented?
PB>
PB> Python has fewer numeric types than Numeric which can cause
PB> coercion problems. For example when multiplying a Python scalar
PB> of type float and a Numeric array of type float, the Numeric array
PB> is converted to a double, since the Python float type is actually
PB> a double. This is often not the desired behavior, since the
PB> Numeric array will be doubled in size which is likely to be
PB> annoying, particularly for very large arrays.

Sure. That is handled reasonably well by the current Numeric 1. To extend this, I'd like to comment that I have never really understood the philosophy of taking the largest type for coercion in all languages. Being a scientist, I have learned that when you multiply a very accurate number with a very approximate number, your result is going to be very approximate, not very accurate! It would thus be more logical to have Float32*Float64 return a Float32!

PB> In a future version of Python, the behavior of integer division
PB> will change. The operands will be converted to floats, so the
PB> result will be a float. If we implement the proposed scalar
PB> coercion rules where arrays have precedence over Python scalars,
PB> then dividing an array by an integer will return an integer array
PB> and will not be consistent with a future version of Python which
PB> would return an array of type double. Scientific programmers are
PB> familiar with the distinction between integer and float-point
PB> division, so should Numeric 2 continue with this behavior?

Numeric 2 should be as compatible as reasonably possible with core Python. But my question is: how would we do integer division of arrays? A ufunc for which no operator shortcut exists?

PB> 7. How are numerical errors handled (IEEE floating-point errors in
PB> particular)?

I am developing my code on Linux and IRIX. I have seen that where Numeric code on Linux runs fine, the same code on IRIX may "core dump" on a FPE (e.g. arctan2(0,0)). That difference should be avoided.

PB> a. Print a message of the most severe error, leaving it to
PB> the user to locate the errors.

What is the most severe error?

PB> c. Minimall UFunc class:

Typo: Minimal?
Regards,

Rob Hooft
-- 
===== rob@hooft.net           http://www.hooft.net/people/rob/ =====
===== R&D, Nonius BV, Delft   http://www.nonius.nl/            =====
===== PGPid 0xFA19277D ========================== Use Linux! =========
Rob W. W. Hooft writes:
Some random PEP talk.
"PB" == Paul Barrett <Barrett@stsci.edu> writes:
PB> Its relation to the other types is defined when the C-extension
PB> module for that type is imported. The corresponding Python code
PB> is:
>> Int32.astype[Real64] = Real64
I understand this is to be done by the Int32 C extension module. But how does it know about Real64?
This approach assumes that there is a basic set of predefined types. In the above example, the Real64 type is one of them. But let's consider creating a completely new type, say Real128. This type knows its relation to the other previously defined types, namely Real32, Real64, etc., but they do not know their relationship to it. That's still OK, because the Real128 type is imbued with this required information and is willing to share it with the other types.

By way of bootstrapping, only one predefined type need be known, say, Int32. The operations associated with this type can only be Int32 operations, because this is the only type it knows about. Yet, we can add another type, say Real64, which has not only Real64 operations, BUT also Int32 and Real64 mixed operations, since it knows about Int32. The Real64 type provides the necessary information to relate the Int32 and Real64 types. Let's now add a third type, then a fourth, etc., each knowing about its predecessor types but not its successors.

This approach is identical to the way core Python adds new classes or C-extension types, so this is nothing new. The current types do not know about the new type, but the new type knows about them. As long as one type knows the relationship between the two, that is sufficient for the scheme to work.
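This bootstrapping can be sketched in a few lines of Python. The registry and function names here are illustrative only, not part of the proposed API:

```python
# Global coercion table: maps a pair of type names to the result type.
COERCE = {}

def register_type(name, knows_about):
    """Register 'name' together with its coercion results against the
    previously defined types. Each new type supplies the rules for
    itself versus its predecessors; they need not know about it."""
    COERCE[(name, name)] = name
    for older, result in knows_about.items():
        COERCE[(name, older)] = result
        COERCE[(older, name)] = result

# Bootstrapping: Int32 comes first, knowing only itself.
register_type('Int32', {})

# Real64 arrives later, knowing how it mixes with Int32.
register_type('Real64', {'Int32': 'Real64'})

# Real128 arrives last, knowing its predecessors but not vice versa.
register_type('Real128', {'Int32': 'Real128', 'Real64': 'Real128'})

print(COERCE[('Int32', 'Real128')])  # Real128
```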
PB> Attributes: PB> .name: e.g. "Int32", "Float64", etc. PB> .typecode: e.g. 'i', 'f', etc. PB> (for backward compatibility)
.typecode() is a method now.
Yes, I propose that it become a settable attribute.
PB> .size (in bytes): e.g. 4, 8, etc.
"element size?"
Yes.
add.register('add', (Int32, Int32, Int32), cfunc-add)
Typo: cfunc-add is an expression, not an identifier.
No, it is a Python object that encompasses and describes a C function that adds two Int32 arrays and returns an Int32 array. It is essentially a Python wrapper of a C-function UFunc. It has been suggested that you should also be able to register Python expressions using the same interface.
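The registration interface under discussion can be sketched as follows. The names (`UFunc`, `cfunc_add`) are illustrative: in Numeric 2 the third argument would be a Python object wrapping a C function, while here a plain Python callable stands in for it.

```python
# Editorial sketch of the register() interface discussed here; a plain
# Python callable stands in for the wrapped C function.

class UFunc:
    def __init__(self, name):
        self.name = name
        self.table = {}                   # signature -> implementation

    def register(self, name, signature, impl):
        self.table[signature] = impl      # e.g. (Int32, Int32, Int32)

    def call(self, signature, x, y):
        return self.table[signature](x, y)

def cfunc_add(x, y):                      # stand-in for the wrapped C add
    return [a + b for a, b in zip(x, y)]

add = UFunc("add")
add.register('add', ("Int32", "Int32", "Int32"), cfunc_add)
print(add.call(("Int32", "Int32", "Int32"), [1, 2], [3, 4]))   # [4, 6]
```

Registering a Python expression under the same interface, as suggested above, would just mean passing a different callable.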
An implementation of a (Int32, Float32, Float32) add is possible and desirable as mentioned earlier in the document. Which C module is going to declare such a combination?
PB> asstring(): create string from array
Not "tostring" like now?
This is proposed so as to be a little more consistent with Core Python which uses 'from-' and 'as-' prefixes. But I don't have strong opinions either way.
PB> 4. ArrayView
PB> This class is similar to the Array class except that the reshape PB> and flat methods will raise exceptions, since non-contiguous PB> arrays cannot be reshaped or flattened using just pointer and PB> step-size information.
This was completely unclear to me until here. I must say I find this a strange way of handling things. I haven't looked into implementation details, but wouldn't it feel more natural if an Array would just be the "data", and an ArrayView would contain the dimensions and strides. Completely separated. One would always need a pair, but more than one ArrayView could use the same Array.
In my definition, an Array that has no knowledge of its shape and type is not an Array, it's a data or character buffer. An array in my definition is a data buffer with information on how that buffer is to be mapped, i.e. shape, type, etc. An ArrayView is an Array that shares its data buffer with another Array, but may contain a different mapping of that Array, i.e. its shape and type are different. If this is what you mean, then the answer is "Yes". This is how we intend to implement Arrays and ArrayViews.
PB> a. _ufunc:
PB> 1. Does slicing syntax default to copy or view behavior?
Numeric 1 uses slicing for view, and a method for copy. "Feeling" compatible with core python would require copy on rhs, and view on lhs of an assignment. Is that distinction possible?
If copy is the default for slicing, how would one make a view?
B = A.V[:10] or A.view[:10] are some possibilities. B is now an ArrayView class.
PB> 2. Does item syntax default to copy or view behavior?
view.
PB> Yet, c[i] can be considered just a shorthand for c[i,:] which PB> would imply copy behavior assuming slicing syntax returns a copy.
If you reason that way, then c is just a shorthand for c[...] too.
Yes, that is correct, but that is not how Python currently behaves. The motivation for these questions is consistency with core Python behavior. The current Numeric does not follow this pattern for reasons of performance. If we assume performance is NOT an issue (i.e. we can get similar performance by using various tricks), then what behavior is more intuitive for the average, and novice, user?
PB> 3. How is scalar coercion implemented?
PB> Python has fewer numeric types than Numeric which can cause PB> coercion problems. For example when multiplying a Python scalar PB> of type float and a Numeric array of type float, the Numeric array PB> is converted to a double, since the Python float type is actually PB> a double. This is often not the desired behavior, since the PB> Numeric array will be doubled in size which is likely to be PB> annoying, particularly for very large arrays.
Sure. That is handled reasonably well by the current Numeric 1.
To extend this, I'd like to comment that I have never really understood the philosophy of taking the largest type for coercion in all languages. Being a scientist, I have learned that when you multiply a very accurate number with a very approximate number, your result is going to be very approximate, not very accurate! It would thus be more logical to have Float32*Float64 return a Float32!
If numeric precision was all that mattered, then you would be correct. But numeric range is also important. I would hate to take the chance of overflowing the above multiplication because I stored the result as a Float32, instead of a Float64, even though the Float64 is overkill in terms of precision. FORTRAN has made an attempt to address this issue in FORTRAN 9X by allowing the user to indicate the range and precision of the calculation.
PB> In a future version of Python, the behavior of integer division PB> will change. The operands will be converted to floats, so the PB> result will be a float. If we implement the proposed scalar PB> coercion rules where arrays have precedence over Python scalars, PB> then dividing an array by an integer will return an integer array PB> and will not be consistent with a future version of Python which PB> would return an array of type double. Scientific programmers are PB> familiar with the distinction between integer and float-point PB> division, so should Numeric 2 continue with this behavior?
Numeric 2 should be as compatible as reasonably possible with core Python. But my question is: how would we do integer division of arrays? A ufunc for which no operator shortcut exists?
I don't understand either question. We have devised a scheme where there are two sets of coercion rules. One for coercion between array types, and one for array and Python scalar types. This latter set of rules can either have higher precedence for array types or Python scalar types. We favor array types having precedence. A more complex set of coercion rules is also possible, if you prefer.
PB> 7. How are numerical errors handled (IEEE floating-point errors in PB> particular)?
I am developing my code on Linux and IRIX. I have seen that where Numeric code on Linux runs fine, the same code on IRIX may "core dump" on an FPE (e.g. arctan2(0,0)). That difference should be avoided.
PB> a. Print a message of the most severe error, leaving it to PB> the user to locate the errors.
What is the most severe error?
Well, divide by zero and overflow come to mind. Underflows are often considered less severe. Yet this is up to you to decide.
PB> c. Minimall UFunc class:
Typo: Minimal?
Got it! Thanks for your comments. I-obviously-have-a-lot-of-explaining-to-do-ly yours, Paul
"PB" == Paul Barrett <Barrett@stsci.edu> writes:
PB> By way of bootstrapping, only one predefined type need be known, PB> say, Int32. The operations associated with this type can only be PB> Int32 operations, because this is the only type it knows about. PB> Yet, we can add another type, say Real64, which has not only PB> Real64 operations, BUT also Int32 and Real64 mixed operations, PB> since it knows about Int32. The Real64 type provides the PB> necessary information to relate the Int32 and Real64 types. Let's PB> now add a third type, then a fourth, etc., each knowing about its PB> predecessor types but not its successors.

PB> This approach is identical to the way core Python adds new PB> classes or C-extension types, so this is nothing new. The PB> current types do not know about the new type, but the new type PB> knows about them. As long as one type knows the relationship PB> between the two that is sufficient for the scheme to work.

Yuck. I'm thinking how long it would take to load the Int256 class, because it will need to import all other types before defining the relations.... [see below for another idea]

PB> Attributes: .name: e.g. "Int32", "Float64", etc. .typecode: PB> e.g. 'i', 'f', etc. (for backward compatibility)
.typecode() is a method now.
PB> Yes, I propose that it become a settable attribute.

Then it is not backwards compatible anyway, and you could leave it out.

PB> .size (in bytes): e.g. 4, 8, etc.
"element size?"
PB> Yes.

I think it should be called like that in that case. I dnt lk abbrvs. "size" could be misread as the size of the total object.
add.register('add', (Int32, Int32, Int32), cfunc-add)
Typo: cfunc-add is an expression, not an identifier.
PB> No, it is a Python object that encompasses and describes a C PB> function that adds two Int32 arrays and returns an Int32 array.

I understand that, but in general a "-" in pseudo-code is the minus operator. I'd write cfunc_add instead.
An implementation of a (Int32, Float32, Float32) add is possible and desirable as mentioned earlier in the document. Which C module is going to declare such a combination?
Now that I re-think this: would it be possible for the type loader to check, for each type that it loads, whether a cross-type module is available with a previously loaded type? That way all types can be independent. There would be an Int32 module knowing only Int32 types, and Float32 only knowing Float32 types. Then there would be an Int32Float32 module that handles cross-type functions. When Int32 or Float32 is loaded, the loader can see whether the other has been loaded earlier, and if it has, load the cross-definitions as well. The only problem I can think of is functions linking 3 or more types.

PB> asstring(): create string from array
Not "tostring" like now?
PB> This is proposed so as to be a little more consistent with Core PB> Python which uses 'from-' and 'as-' prefixes. But I don't have PB> strong opinions either way.

PIL uses tostring as well. Anyway, I understand the buffer interface is a nicer way to communicate.

PB> 4. ArrayView
PB> This class is similar to the Array class except that the reshape PB> and flat methods will raise exceptions, since non-contiguous PB> arrays cannot be reshaped or flattened using just pointer and PB> step-size information.
This was completely unclear to me until here. I must say I find this a strange way of handling things. I haven't looked into implementation details, but wouldn't it feel more natural if an Array would just be the "data", and an ArrayView would contain the dimensions and strides. Completely separated. One would always need a pair, but more than one ArrayView could use the same Array.
PB> In my definition, an Array that has no knowledge of its shape and PB> type is not an Array, it's a data or character buffer. An array PB> in my definition is a data buffer with information on how that PB> buffer is to be mapped, i.e. shape, type, etc. An ArrayView is PB> an Array that shares its data buffer with another Array, but may PB> contain a different mapping of that Array, i.e. its shape and type PB> are different.

PB> If this is what you mean, then the answer is "Yes". This is how PB> we intend to implement Arrays and ArrayViews.

No, it is not what I meant. Reading your answer I'd say that I wouldn't see the need for an Array. We only need a data buffer and an ArrayView. If there are two parts of the functionality, it is much cleaner to make the cut in an orthogonal way.

PB> B = A.V[:10] or A.view[:10] are some possibilities. B is now an PB> ArrayView class.

I hate magic attributes like this. I do not like abbrevs at all. It is not at all obvious what A.T or A.V mean.

PB> 2. Does item syntax default to copy or view behavior?
view.
PB> Yet, c[i] can be considered just a shorthand for c[i,:] which PB> would imply copy behavior assuming slicing syntax returns a copy.
If you reason that way, then c is just a shorthand for c[...] too.
PB> Yes, that is correct, but that is not how Python currently PB> behaves.

Current Python also doesn't treat c[i] as a shorthand for c[i,:] or c[i,...].

Regards, Rob Hooft
Rob W. W. Hooft writes:
"PB" == Paul Barrett <Barrett@stsci.edu> writes:
PB> By way of bootstrapping, only one predefined type need be known, PB> say, Int32. The operations associated with this type can only be PB> Int32 operations, because this is the only type it knows about. PB> Yet, we can add another type, say Real64, which has not only PB> Real64 operations, BUT also Int32 and Real64 mixed operations, PB> since it knows about Int32. The Real64 type provides the PB> necessary information to relate the Int32 and Real64 types. Let's PB> now add a third type, then a fourth, etc., each knowing about its PB> predecessor types but not its successors.
PB> This approach is identical to the way core Python adds new PB> classes or C-extension types, so this is nothing new. The PB> current types do not know about the new type, but the new type PB> knows about them. As long as one type knows the relationship PB> between the two that is sufficient for the scheme to work.
Yuck. I'm thinking how long it would take to load the Int256 class, because it will need to import all other types before defining the relations.... [see below for another idea]
First, I'm not proposing that we use this method of bootstrapping from just one type. I was just demonstrating that it could be done. Users could then create their own types and dynamically add them to the module by the above scheme. Second, I think you're making the situation more complex than it really is. It doesn't take that long to initialize the type rules and register the functions, because both tables are sparsely populated. If there isn't a rule between two types, you don't have to create a dictionary entry. The size of the coercion table is equal to or less than the number of types, so that's small. The function table is a sparsely populated square array. We just envision populating its diagonal elements and using coercion rules for the empty off-diagonal elements. The point is that if an off-diagonal element is filled, then it will be used. I'll include our proposed implementation in the PEP for clarification.
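The sparse-table scheme described above can be sketched with plain dictionaries. The names (`coercion`, `functions`, `lookup`) are hypothetical; the point is only that a filled off-diagonal entry wins, and everything else falls back to the coercion table plus a diagonal entry.

```python
# Editorial sketch of the sparse coercion and function tables: the
# function table is populated mainly on its diagonal, and a filled
# off-diagonal entry takes precedence over coercion.

coercion = {("Int32", "Float64"): "Float64"}          # sparse
functions = {
    ("add", "Int32", "Int32"): "c_add_int32",
    ("add", "Float64", "Float64"): "c_add_float64",
}

def lookup(op, t1, t2):
    if (op, t1, t2) in functions:                     # off-diagonal hit
        return functions[(op, t1, t2)]
    t = coercion.get((t1, t2)) or coercion.get((t2, t1))
    return functions[(op, t, t)]                      # coerce, use diagonal

print(lookup("add", "Int32", "Float64"))   # c_add_float64
print(lookup("add", "Int32", "Int32"))     # c_add_int32
```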
PB> Attributes: .name: e.g. "Int32", "Float64", etc. .typecode: PB> e.g. 'i', 'f', etc. (for backward compatibility)
.typecode() is a method now.
PB> Yes, I propose that it become a settable attribute.
Then it is not backwards compatible anyway, and you could leave it out.
I'd like to, but others have strongly objected to leaving out typecodes.
PB> .size (in bytes): e.g. 4, 8, etc.
"element size?"
PB> Yes.
I think it should be called like that in that case. I dnt lk abbrvs. size could be misread as the size of the total object.
How about item_size?
add.register('add', (Int32, Int32, Int32), cfunc-add)
Typo: cfunc-add is an expression, not an identifier.
PB> No, it is a Python object that encompasses and describes a C PB> function that adds two Int32 arrays and returns an Int32 array.
I understand that, but in general a "-" in pseudo-code is the minus operator. I'd write cfunc_add instead.
Yes. I understand now.
PB> 4. ArrayView
PB> This class is similar to the Array class except that the reshape PB> and flat methods will raise exceptions, since non-contiguous PB> arrays cannot be reshaped or flattened using just pointer and PB> step-size information.
This was completely unclear to me until here. I must say I find this a strange way of handling things. I haven't looked into implementation details, but wouldn't it feel more natural if an Array would just be the "data", and an ArrayView would contain the dimensions and strides. Completely separated. One would always need a pair, but more than one ArrayView could use the same Array.
PB> In my definition, an Array that has no knowledge of its shape and PB> type is not an Array, it's a data or character buffer. An array PB> in my definition is a data buffer with information on how that PB> buffer is to be mapped, i.e. shape, type, etc. An ArrayView is PB> an Array that shares its data buffer with another Array, but may PB> contain a different mapping of that Array, ie. its shape and type PB> are different.
PB> If this is what you mean, then the answer is "Yes". This is how PB> we intend to implement Arrays and ArrayViews.
No, it is not what I meant. Reading your answer I'd say that I wouldn't see the need for an Array. We only need a data buffer and an ArrayView. If there are two parts of the functionality, it is much cleaner to make the cut in an orthogonal way.
I just don't see what you are getting at here! What attributes does your Array have, if it doesn't have a shape or type? If Arrays only have view behavior, then yes, there is no need for the ArrayView class. Whereas if Arrays have copy behavior, it might be a good idea to distinguish between an ordinary Array and an ArrayView. An alternative would be to have a view attribute.
PB> B = A.V[:10] or A.view[:10] are some possibilities. B is now an PB> ArrayView class.
I hate magic attributes like this. I do not like abbrevs at all. It is not at all obvious what A.T or A.V mean.
I'm not a fan of them either, but I'm looking for consensus on these issues.
PB> 2. Does item syntax default to copy or view behavior?
view.
PB> Yet, c[i] can be considered just a shorthand for c[i,:] which PB> would imply copy behavior assuming slicing syntax returns a copy.
If you reason that way, then c is just a shorthand for c[...] too.
PB> Yes, that is correct, but that is not how Python currently PB> behaves.
Current python also doesn't treat c[i] as a shorthand for c[i,:] or c[i,...]
Because there aren't any multi-dimensional lists in Python, only nested 1-dimensional lists. There is a structural difference.
"PB" == Paul Barrett <Barrett@stsci.edu> writes:
PB> .size (in bytes): e.g. 4, 8, etc.
"element size?"
PB> How about item_size?

OK.
No, it is not what I meant. Reading your answer I'd say that I wouldn't see the need for an Array. We only need a data buffer and an ArrayView. If there are two parts of the functionality, it is much cleaner to make the cut in an orthogonal way.
PB> I just don't see what you are getting at here! What attributes PB> does your Array have, if it doesn't have a shape or type?

A piece of memory. It needs nothing more. A buffer[1]. You'd always need an ArrayView. The ArrayView contains information like dimensions, strides, data type, endianness.

Making a new _view_ would consist of making a new ArrayView, and pointing its data object to the same data array.

Making a new _copy_ would consist of making a new ArrayView, and marking the "copy-on-write" features (however that needs to be implemented, I have never done that. Does it involve weak references?).

Different Views on the same data can even have different data types: e.g. character and byte, or even floating point and integer (I am a happy user of the fortran EQUIVALENCE statement that way too).

The speed-up by re-use of temporary arrays becomes very easy this way too: one can even re-use a floating point data array as integer result if the reference count of both the data array and its (only) view is one.

[1] Could the python buffer interface be used as a pre-existing implementation here? Would that make it possible to implement Array.append()? I don't always know beforehand how large my numeric arrays will become.

Rob
Rob W. W. Hooft writes:
No, it is not what I meant. Reading your answer I'd say that I wouldn't see the need for an Array. We only need a data buffer and an ArrayView. If there are two parts of the functionality, it is much cleaner to make the cut in an orthogonal way.
PB> I just don't see what you are getting at here! What attributes PB> does your Array have, if it doesn't have a shape or type?
A piece of memory. It needs nothing more. A buffer[1]. You'd always need an ArrayView. The Arrayview contains information like dimensions, strides, data type, endianness.
Making a new _view_ would consist of making a new ArrayView, and pointing its data object to the same data array.
Making a new _copy_ would consist of making a new ArrayView, and marking the "copy-on-write" features (however that needs to be implemented, I have never done that. Does it involve weak references?).
Different Views on the same data can even have different data types: e.g. character and byte, or even floating point and integer (I am a happy user of the fortran EQUIVALENCE statement that way too).
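The orthogonal split described above can be sketched with only the standard library: the data object is a bare byte buffer with no shape or type of its own, and each view carries its own mapping (shape, strides, element format) onto the shared buffer. The `ArrayView` class here is an editorial illustration, not either party's implementation.

```python
# Editorial sketch: a shapeless data buffer plus views that each carry
# their own shape, byte strides, and element format.
import struct

class ArrayView:
    def __init__(self, data, shape, strides, fmt):
        self.data, self.shape = data, shape
        self.strides, self.fmt = strides, fmt         # strides in bytes

    def __getitem__(self, index):
        offset = sum(i * s for i, s in zip(index, self.strides))
        return struct.unpack_from(self.fmt, self.data, offset)[0]

buf = bytearray(struct.pack('6i', 0, 1, 2, 3, 4, 5))  # shapeless data
a = ArrayView(buf, shape=(2, 3), strides=(12, 4), fmt='i')  # 2x3 view
b = ArrayView(buf, shape=(3, 2), strides=(8, 4), fmt='i')   # 3x2 view

print(a[1, 2])   # 5
print(b[2, 1])   # 5 -- the same underlying bytes, different mapping
```

Two views onto one buffer come for free, and a view with a different `fmt` gives the EQUIVALENCE-like reinterpretation mentioned above.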
I think our approaches are very similar. It's the meaning that we ascribe to Array and ArrayView that appears to be causing the confusion. Your Array object is our Data object and your ArrayView object is our Array attributes, i.e. the information to map/interpret the Data object. We view an Array as being composed of two entities, its attributes and a Data object. And we entirely agree with the above definitions of _view_ and _copy_. But you haven't told us what object associates your Array and ArrayView to make a usable array that can be sliced, diced, and julienne fried. My impression of your slice method would be: slice(Array, ArrayView, slice expression). I'm not too keen on this approach. :-)
The speed up by re-use of temporary arrays becomes very easy this way too: one can even re-use a floating point data array as integer result if the reference count of both the data array and its (only) view is one.
Yes! This is our intended implementation. But instead of re-using your Array object, we will be re-using a (data-) buffer object, or a memory-mapped object, or whatever else in which the data is stored.
[1] Could the python buffer interface be used as a pre-existing implementation here? Would that make it possible to implement Array.append()? I don't always know beforehand how large my numeric arrays will become.
In a way, yes. I've considered creating an in-memory object that has similar properties to the memory-mapped object (e.g. it might have a read-only property), so that the two data objects can be used interchangeably. The in-memory object would replace the string object as a data store, since the string object is meant to be read-only.

-- Paul
"PB" == Paul Barrett <Barrett@stsci.edu> writes:
PB> we entirely agree with the above definitions of _view_ and PB> _copy_. But you haven't told us what object associates your PB> Array and ArrayView to make a usable array that can be sliced, PB> diced, and Julian fried.

Hm. You know, I am not so deep into the Python internals. I am a fairly high-level programmer. Not a CS type, but a chemist.... There might be much implementation detail that escapes me. But I'm just trying to keep things beautiful (as in Erich Gamma et al.). I thought an ArrayView would have a pointer to the data array, either like the .data attribute in the Numeric 1 API, or as a Python object pointer.

PB> My impression of your slice method would be: PB> slice(Array, ArrayView, slice expression)

If ArrayView.HasA(Array), that would not be required.

Regards, Rob Hooft.
Being a scientist, I have learned that when you multiply a very accurate number with a very approximate number, your result is going to be very approximate, not very accurate! It would thus be more logical to have Float32*Float64 return a Float32!
Accuracy is not the right concept, but storage capacity. A Float64 can store any value that can be stored in a Float32, but the inverse is not true. Accuracy is not a property of a number, but of a value and its representation in the computer. The float value "1." can be perfectly accurate, even in 32 bits, or it can be an approximation for 1.-1.e-50, which cannot be represented precisely. BTW, Float64 also has a larger range of magnitudes than Float32, not just more significant digits.
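The storage-capacity point can be demonstrated with the standard library (an editorial illustration): squeezing a value into 32 bits can lose information, while every Float32 value is held exactly by a Float64 (an ordinary Python float).

```python
# Editorial illustration: a 64 -> 32 -> 64 bit round trip loses
# precision, while any Float32 value survives as a Float64 unchanged.
import struct

def to_float32(x):
    """Round x to the nearest 32-bit float, returned as a Python float."""
    return struct.unpack('f', struct.pack('f', x))[0]

x = 0.1                    # exactly representable in neither format, but
x32 = to_float32(x)        # the 32-bit approximation is coarser
print(x == x32)            # False: precision was lost

print(to_float32(x32) == x32)   # True: a Float32 value is exact as Float64
```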
Numeric 2 should be as compatible as reasonably possible with core python. But my question is: how would we do integer division of arrays? A ufunc for which no operator shortcut exists?
Sounds fine. On the other hand, if and when Python's integer division behaviour is changed, there will be some new syntax for integer division, which should then also work on arrays. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------
Design and Implementation
Some parts of this look a bit imprecise and I don't claim to understand them. For example:
Its relation to the other types is defined when the C-extension module for that type is imported. The corresponding Python code is:
> Int32.astype[Real64] = Real64
This says that the Real64 array-type has higher priority than the Int32 array-type.
I'd choose a clearer name than "astype" for this, but that's a minor detail. More important is how this is supposed to work. Suppose that in Int32 you say that Real64 has higher priority, and in Real64 you say that Int32 has higher priority. Would this raise an exception, and if so, when? Perhaps the coercion question should be treated in a separate PEP that also covers standard Python types and provides a mechanism that any type implementer can use. I could think of a number of cases where I have wished I could define coercions between my own and some other types properly.
3. Array:
This class contains information about the array, such as shape, type, endian-ness of the data, etc.. Its operators, '+', '-',
What about the data itself?
4. ArrayView
This class is similar to the Array class except that the reshape and flat methods will raise exceptions, since non-contiguous
There are no reshape and flat methods in this proposal...
1. Does slicing syntax default to copy or view behavior?
The default behavior of Python is to return a copy of a sub-list or tuple when slicing syntax is used, whereas Numeric 1 returns a view into the array. The choice made for Numeric 1 is apparently for reasons of performance: the developers wish to avoid the
Yes, performance was the main reason. But there is another one: if slicing returns a view, you can make a copy based on it, but if slicing returns a copy, there's no way to make a view. So if you change this, you must provide some other way to generate a view, and please keep the syntax simple (there are many practical cases where a view is required).
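The asymmetry noted above can be illustrated with standard-library types (editorial, not Numeric code): from a view you can always make a copy, but a copy gives no way back to a view of the original data.

```python
# Editorial illustration: memoryview slicing gives view semantics,
# list slicing gives copy semantics, and only the view keeps a link
# back to the original storage.

data = bytearray(range(10))

view = memoryview(data)[2:5]   # view semantics: shares storage
copy = bytes(view)             # ...and a copy is one call away

data[2] = 99                   # mutate the underlying data
print(view[0])                 # 99 -- the view sees the change
print(copy[0])                 # 2  -- the copy is frozen

sub = list(range(10))[2:5]     # list slicing copies: the link to the
                               # original is gone, no view is possible
```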
In this case the performance penalty associated with copy behavior can be minimized by implementing copy-on-write. This scheme has
Indeed, that's what most APL implementations do.
data buffer is made. View behavior would then be implemented by an ArrayView class, whose behavior be similar to Numeric 1 arrays,
So users would have to write something like ArrayView(array, indices). That looks a bit cumbersome, and any straightforward way to write the indices is illegal according to the current syntax rules.
2. Does item syntax default to copy or view behavior?
If compatibility with lists is a criterion at all, then I'd apply it consistently and use view semantics. Otherwise let's forget about lists and discuss 1. and 2. from a purely array-oriented point of view. And then I'd argue that view semantics is more frequent and should thus be the default for both slicing and item extraction.
3. How is scalar coercion implemented?
The old discussion again...
annoying, particularly for very large arrays. We prefer that the array type trumps the python type for the same type class, namely
That is a completely arbitrary rule from any but the "large array performance" point of view. And it's against the Principle of Least Surprise. Now that we have the PEP procedure for proposing any change whatsoever, why not lobby for the addition of a float scalar type to Python, with its own syntax for constants? That looks like the best solution from everybody's point of view.
4. How is integer division handled?
In a future version of Python, the behavior of integer division will change. The operands will be converted to floats, so the
Has that been decided already?
7. How are numerical errors handled (IEEE floating-point errors in particular)?
It is not clear to the proposers (Paul Barrett and Travis Oliphant) what is the best or preferred way of handling errors. Since most of the C functions that do the operation iterate over the inner-most (last) dimension of the array, this dimension could contain a thousand or more items having one or more errors of differing type, such as divide-by-zero, underflow, and overflow. Additionally, keeping track of these errors may come at the expense of performance. Therefore, we suggest several options:
I'd like to add another one:

e. Keep some statistics about the errors that occur during the operation, and if at the end the error count is > 0, raise an exception containing as much useful information as possible.

I would certainly not want any Python program to *print* anything unless I have explicitly told it to do so.

Konrad.
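Option (e) above can be sketched as follows. This is an editorial sketch with hypothetical names (`NumericError`, `divide`): the inner loop records IEEE-style errors instead of stopping, and one exception carrying the statistics is raised after the loop completes.

```python
# Editorial sketch of "keep statistics, raise at the end": the inner
# loop finishes its work, and the exception reports the error counts.

class NumericError(ArithmeticError):
    def __init__(self, stats):
        super().__init__("errors during array operation: %r" % (stats,))
        self.stats = stats

def divide(xs, ys):
    stats = {"divide-by-zero": 0}
    out = []
    for x, y in zip(xs, ys):
        if y == 0.0:
            stats["divide-by-zero"] += 1
            out.append(float("inf"))     # IEEE-style result, error recorded
        else:
            out.append(x / y)
    if any(stats.values()):
        raise NumericError(stats)        # raised once, at the end
    return out

try:
    divide([1.0, 2.0, 3.0], [2.0, 0.0, 0.0])
except NumericError as exc:
    print(exc.stats)                     # {'divide-by-zero': 2}
```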
Is there any plan to support sparse matrices in NumPy? Peter
participants (4)
- Daehyok Shin
- Konrad Hinsen
- Paul Barrett
- rob@hooft.net