Francesc Alted wrote:
Yeah. We just differ in the way to arrange this metadata to be passed to the recarray constructor. But I think this is secondary compared to the flexibility that a verbose approach offers compared with the actual string format.
Yes. So one question is: if we were to add type-repetition tuples to recarray as an alternative to the current character code strings, would that be any form of improvement to recarray from your perspective? As I see it, recarray currently has a clean seperation between format and naming which permits the latter to be optional. Before changing that, I'd need a clear argument why. (I didn't design and generally don't even maintain recarray).
In fact, more than one container might be supported to define the metadata; one can start with tuples as you suggest, but in the future other ways can be added (if considered convenient).
For example, I think I'll stick with the dictionary option for PyTables, but also a class declaration for the metadata would be supported, like in :
class Small(IsRecord): var1 = defineType(CharType, 2, "") var2 = defineType(Int32, 1) var3 = Float64
This would not be difficult to support because, by accessing to the Small().__dict__, you get also a dictionary. In addition, the latter will ensure (by construction) that you are not using a non-valid python identifier, which is mandatory in my current implementation. I find these containers (dictionaries and classes) both elegant and convenient.
I'm not trying to be Mr. Negative here, but one thing to keep in mind is this:
class C: ... pass ... c = C() dir(c.__dict__) ['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__repr__', '__setattr__', '__setitem__', '__str__', 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
Which is to say, the instance dictionary is a little cluttered, and it might not be that easy to determine which objects in it are there to define the data format.
Just like the type repetition tuple except also including field names and default values. I don't think you lost me. For what we do, the exact physical layout of the "struct" is important, so order matters. I see order as part of the meta-data, but I don't usually deal with meta-entities so maybe I've got that part wrong. :)
Well, if you need positional fields, you may add a (optional) parameter, called for example, "position" so that you can fix it.
I'm sure that's not the easiest way to capture struct layout, but I take your point. Since position matters to me, I'd prefer that capturing them was implicit. Since it doesn't to you, it seems OK for it to be explicit. Either default mode can support the other, but capturing order with tuples is free, while capturing order with a __dict__ will take some kind of extra work.
I was thinking that if the above was an issue, we could write an API function(s) to "compile" the type-repetition tuple into arrays of ints which describe the type of each field and corresponding repetition factor.
Yeah, I agree that this would be the best solution. That way, the charcodes will be factored out from the code, and by just providing such and API (both in Python and C), would be enough to reconstruct them, if needed. That will allow a more consistent numarray internal code.
I'm thinking the general format for this may be converting N-tuples of types and ints into N arrays of types and ints. And vice versa. It's obvious how this works with numarray types. I think the chararray types need work and need to be mapped into the same integer enumeration as the numeric types in a non-overlapping way.
See you Monday,
Right, how did you know that? :)
Insightful on weekends anyway, Todd