We now turn to the behavior of Records. Many of the current proposals were considered in the past but not implemented: we took a 'wait and see' attitude toward what was really necessary, and we wanted to avoid providing too many ways of doing the same thing until there was a real call for them. This proposal deals with the behavior of record array 'items', i.e., what we now call Record objects. The primary issues that have been raised with regard to Record behavior are:

1) Items should be tuples instead of Records.
2) Items should be objects, but present tuple- and/or dictionary-consistent behavior.
3) Field (or column) names should be accessible as Record (and record array) attributes.

Issue 1: Should record array items be tuples instead of Records?

Francesc Alted made this suggestion recently. Essentially, the argument is that tuples are a natural way of representing records. Unfortunately, tuples provide no means of accessing the fields of a record by name, only by number. For this reason alone, tuples don't appear to be adequate. Francesc proposed allowing dictionary-like indexing on record arrays so that the fields of tuple entries could be accessed by name. However, if rarr is a record array, it seems that both rarr['column 1'][2] and rarr[2]['column 1'] should work, not just the former. So the short answer is "No".

Using tuples would also force another change in current behavior. The current Record objects are views into the record array: changing a value within a Record object changes the record array. Tuples cannot do this because they are immutable. If single elements of record arrays were set and returned as tuples, a whole record would have to be replaced just to change one of its fields.
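To make the mutability point concrete, here is a minimal pure-Python sketch. The RecordView class and the list-of-lists storage are hypothetical stand-ins for numarray's Record objects and record arrays; they only illustrate why a tuple item (a copy) cannot write back into the array while a view can.

```python
# Minimal sketch (hypothetical RecordView class, not numarray code):
# a tuple item is an immutable copy, while a view writes through.

class RecordView:
    """Writable view onto one row of a list-of-lists 'record array'."""

    def __init__(self, rows, row):
        self._rows = rows  # shared storage; nothing is copied
        self._row = row

    def __getitem__(self, i):
        return self._rows[self._row][i]

    def __setitem__(self, i, value):
        # Assignments go straight through to the shared storage.
        self._rows[self._row][i] = value


rows = [[1.1, 2, "abc", 3], [4.4, 5, "def", 6]]

# Tuple item: a snapshot that cannot be changed in place.
item = tuple(rows[0])
try:
    item[1] = 99
except TypeError:
    print("tuples are immutable")
# To change one field, the whole record must be rebuilt and replaced.
rows[0] = list(item[:1] + (99,) + item[2:])

# View item: changing one field changes the array itself.
rec = RecordView(rows, 1)
rec[1] = 42
print(rows)  # [[1.1, 99, 'abc', 3], [4.4, 42, 'def', 6]]
```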
His comments (as well as those of others) do, however, point out a number of problems with the current implementation that could be improved, and making the Record object support tuple behaviors is quite reasonable. Hence:

Issue 2: Should record array items present tuple- and/or dictionary-compatible behaviors?

The short answer is yes, we agree that they should. This covers many of the proposals made, including:

1) supporting all tuple capabilities, with the following differences:

a) fields are mutable (unlike tuple items) so long as the assigned value is coercible to the expected type. For example, the current way of doing so is:

cell = oneRec.field(1)
oneRec.setfield(1, newValue)
This proposal would allow:
cell = oneRec[1]
oneRec[1] = newValue

b) slice assignments are permitted so long as they don't change the size of the record (i.e., no insertion of extra items) and each item can be assigned as permitted in a). E.g.,

OneCell[2:4] = (3, 'abc')

c) __str__ will produce a display like that of a tuple, while __repr__ will show a Record constructor:

>>> print oneRec    # as is currently implemented
(1.1, 2, 'abc', 3)
>>> oneRec
Record((1.1, 2, 'abc', 3), formats=['1Float32', '1Int16', '1a3', '1Int32'], names=['abc', 'c2', 'xyz', 'c4'])

(Note that how best to handle formats is still being thought about.)

2) supporting all dictionary capabilities, with the following differences:

a) keys and items are ordered.
b) keys are restricted to integers or strings.
c) new keys cannot be dynamically added, nor existing keys deleted, as they can be for dictionaries.
d) no other dictionary capabilities that could change the number or names of items are supported.
e) __str__ will not produce a dictionary-like display (see 1c).
f) values must be of the type the Record object requires (or be coercible to it).

For example, the current way of accessing a field by name is:

cell = oneRec.field('c2')
oneRec.setfield('c2', newValue)

The proposed indexing would also allow:

cell = oneRec['c2']
oneRec['c2'] = newValue
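The tuple and dictionary behaviors listed above can be sketched in pure Python roughly as follows. This is an illustration under stated assumptions, not numarray code: the Record class and its constructor are hypothetical, the record array is modelled as a shared list of lists, and plain Python types (float, int, str) stand in for numarray formats such as '1Float32' or '1a3'.

```python
# Hypothetical sketch of the proposed Record behaviour. The class name,
# its constructor, and the use of Python types (float, int, str) in place
# of numarray formats are illustrative assumptions, not numarray's API.

class Record:
    def __init__(self, rows, row, names, types):
        self._rows = rows          # shared storage: a Record is a view
        self._row = row
        self._names = list(names)  # ordered field names (the "keys")
        self._types = list(types)  # coercion function for each field

    def __len__(self):
        return len(self._names)

    def _index(self, key):
        # Keys are restricted to integers and existing field names.
        if isinstance(key, str):
            if key not in self._names:
                raise KeyError(key)  # new keys cannot be added
            return self._names.index(key)
        if isinstance(key, int):
            if not -len(self) <= key < len(self):
                raise IndexError(key)
            return key % len(self)
        raise TypeError("record keys must be int or str")

    def __getitem__(self, key):
        if isinstance(key, slice):
            return tuple(self._rows[self._row][key])
        return self._rows[self._row][self._index(key)]

    def __setitem__(self, key, value):
        if isinstance(key, slice):
            positions = range(len(self))[key]
            values = tuple(value)
            # Slice assignment must not change the size of the record.
            if len(values) != len(positions):
                raise ValueError("slice assignment cannot resize a record")
            # Coerce everything first so a bad value leaves the record unchanged.
            coerced = [self._types[i](v) for i, v in zip(positions, values)]
            for i, v in zip(positions, coerced):
                self._rows[self._row][i] = v
            return
        i = self._index(key)
        # Values must be coercible to the field's type.
        self._rows[self._row][i] = self._types[i](value)

    def __delitem__(self, key):
        raise TypeError("fields cannot be deleted from a record")

    def keys(self):
        return list(self._names)

    def items(self):
        return list(zip(self._names, self[:]))

    def __str__(self):
        # Displayed like a tuple.
        return str(tuple(self[:]))

    def __repr__(self):
        # Displayed as a constructor call.
        return "Record(%r, names=%r)" % (tuple(self[:]), self._names)


rows = [[1.1, 2, "abc", 3]]
oneRec = Record(rows, 0, ["abc", "c2", "xyz", "c4"], [float, int, str, int])
oneRec[1] = "7"          # coerced with int("7")
oneRec["c4"] = 8.0       # coerced with int(8.0)
oneRec[2:4] = ("xy", 9)  # same-size slice assignment
print(oneRec)            # (1.1, 7, 'xy', 9)
print(rows[0])           # [1.1, 7, 'xy', 9]
```

Because the Record only indexes into shared storage, every assignment is visible in the underlying array, which is the view semantics that Issue 1 argues tuples cannot provide.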
Issue 3: Should field (or column) names be accessible as Record (and record array) attributes?

As much appeal as the attribute approach has for simple usage, the problems of name collisions and of mismatches between acceptable field names and legal attribute names strike us, as they do Russell Owen, as very problematic. Using a special attribute that holds the field-name attributes, as Francesc suggests (in his case, cols), solves the name-collision problem but not the legality issue: particularly with regard to illegal characters, it's hard to imagine easily remembered mappings between legal attribute representations and the actual field names. We are inclined to pass (for now, anyway) on mapping fields to attributes in any way. Indexing by name seems to us convenient enough, as well as flexible enough to satisfy all needs. It is needed in any case, since attributes are a clumsy way to access fields when a variable specifies the field (yes, one can use getattr(), but it's clumsy).

*******************************************

Record array behavior changes:

1) It will be possible to assign any sequence to a record array item so long as the sequence contains the right number of fields and each item of the sequence can be coerced to what the record array expects for the corresponding field of the record (addressing numarray feature request 928473 by Russell Owen). For example:
recArr[1] = (2, 3.2, 'xyz', 3)
2) One may assign a record to a record array so long as the record matches the record format of the record array (current behavior).

3) Easier construction and initialization of record arrays with default field values (as requested in numarray bug report 928479).

4) Support for lists of field names and formats, as detailed in numarray bug report 928488.

5) Field name indexing for record arrays. It will be possible to index a record array with a field name: if the index is a string, a numarray/chararray for that column will be returned. Note that it won't be possible to index record arrays by field number, since an integer index already selects a record. For example, the current

col = recArr.field('abc')

can also be written as
col = recArr['abc']
But the current
col = recArr.field(1)
cannot become
col = recArr[1]
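Items 1 and 5 above can be sketched together in pure Python. RecArray is a hypothetical stand-in, not numarray's class: Python types replace numarray formats, columns come back as list copies rather than numarray/chararray views, and an integer index returns a tuple copy instead of a Record view purely to keep the sketch short.

```python
# Hypothetical sketch of the proposed record-array indexing and item
# assignment. RecArray, its constructor, and Python types standing in for
# numarray formats are illustrative assumptions, not numarray's API.

class RecArray:
    def __init__(self, rows, names, types):
        self._names = list(names)
        self._types = list(types)
        self._rows = [self._coerce(r) for r in rows]

    def _coerce(self, seq):
        # Any sequence is accepted if it has the right number of fields
        # and each element can be coerced to its field's type.
        values = list(seq)
        if len(values) != len(self._names):
            raise ValueError("expected %d fields, got %d"
                             % (len(self._names), len(values)))
        return [t(v) for t, v in zip(self._types, values)]

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, key):
        if isinstance(key, str):
            # A string index selects a whole column by field name
            # (returned as a list copy here, not an array view).
            if key not in self._names:
                raise KeyError(key)
            j = self._names.index(key)
            return [row[j] for row in self._rows]
        # An integer index selects a record, never a field number.
        # A tuple copy keeps the sketch short; the proposal returns a Record view.
        return tuple(self._rows[key])

    def __setitem__(self, key, value):
        if not isinstance(key, int):
            raise TypeError("only integer item assignment is sketched here")
        self._rows[key] = self._coerce(value)


recArr = RecArray([(1, 1.5, "abc", 3), (4, 4.5, "def", 6)],
                  ["abc", "c2", "xyz", "c4"], [int, float, str, int])
recArr[1] = (2, 3.2, "xyz", 3)  # any sequence of the right length
print(recArr["abc"])            # [1, 2]
print(recArr[1])                # (2, 3.2, 'xyz', 3)
```

Dispatching on the index type is what makes recArr[1] and recArr['abc'] unambiguous, and it is also why an integer cannot double as a field number.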
On the other hand, it will not be permitted to mix a field index with an array index in the same brackets; e.g., rarr[10, 'column 2'] will not be supported. Allowing record array indexing to have two different interpretations (record number or field name) is a bit worrying, but since record array items may be indexed by name, it seems natural to permit the same indexing for the record array itself. Mixing the two kinds of index in one bracket seems of limited usefulness in the first place, and it makes inheriting the existing NDArray indexing machinery more complicated. Any efficiency gained by avoiding the intermediate object created by two separate index operations would likely be offset by the slowness of handling much more complicated mixed indices. Perhaps someone can argue for why mixing field indices with array indices is important, but for now we will prohibit this mode of indexing.

This does point to a possible enhancement for field indexing, namely providing the equivalent of index arrays (e.g., a list of field names) to generate a new record array with a subset of the fields.

Are there any other issues that should be addressed for improving record arrays?
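The indexing rules discussed in this last part, the refusal of mixed indices and the possible list-of-names selection, might look roughly like this. The RecArray class is hypothetical, rows are stored as dictionaries for brevity, and the list-of-names behavior is only the enhancement floated above, not a decided feature.

```python
# Hypothetical sketch: refusing mixed indices such as rarr[10, 'column 2']
# and the possible list-of-field-names enhancement. Rows are stored as
# dicts for brevity; none of these names come from numarray.

class RecArray:
    def __init__(self, names, rows):
        self.names = list(names)
        self.rows = [dict(zip(self.names, r)) for r in rows]

    def __getitem__(self, key):
        if isinstance(key, tuple):
            # One bracket may not mix a field name with an array index;
            # use rarr[10]['column 2'] or rarr['column 2'][10] instead.
            if any(isinstance(k, str) for k in key):
                raise IndexError("cannot mix field and array indices")
            raise NotImplementedError("multidimensional indexing not sketched")
        if isinstance(key, str):
            return [row[key] for row in self.rows]  # one column
        if isinstance(key, list) and all(isinstance(k, str) for k in key):
            # Possible enhancement: a list of names, analogous to an index
            # array, yields a new record array with just those fields.
            return RecArray(key, [[row[k] for k in key] for row in self.rows])
        return self.rows[key]  # one record


rarr = RecArray(["column 1", "column 2", "column 3"],
                [(1, "a", 1.5), (2, "b", 2.5), (3, "c", 3.5)])
print(rarr[1]["column 2"], rarr["column 2"][1])  # b b
sub = rarr[["column 3", "column 1"]]
print(sub.names, sub["column 3"])  # ['column 3', 'column 1'] [1.5, 2.5, 3.5]
```

Both orders of two separate index operations reach the same value, so refusing the single mixed bracket costs users only an intermediate object.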