FW: [Numpy-discussion] typecodes in numarray
Todd Miller had some further comments that I thought were worth posting as well (and I think he makes some very good points).

************************************************************************

My [i.e. Todd's] thoughts about it:
Maybe I'm becoming a bit tedious with this, but if you look at:
No. It shows you're thinking about it carefully. Having looked at all of the examples below, I have some comments:

1. The sparseness and obscurity of the typecode "wordspace" are both demonstrated here. There are so few letters to choose from, they're often already used in some other context. Even given the large number of unused letters, it's often difficult to choose good ones and to remember what has been chosen. I think this is one of the reasons Perry chose to replace typecodes with true type objects which have rich, regular, and predictable symbolic names.

2. Typecodes were added as a backwards compatibility feature of numarray, and I think it's probable that numarray beat Numeric to supporting most of these types, because otherwise they'd have been copied directly and there would be no problem. I'm not really trying to play a blame-game here, but I am making an argument that perhaps numarray should only go so far in the support of what I regard as an obsolescent feature. If the Numeric developers choose to continue extending the use of typecodes in ways that are incompatible with numarray, one way of dealing with it is to "just say no". We are going beyond the scope of backwards compatibility to on-going compatibility. (Which we may still have to do, but it needs to be discussed and considered.)

3. STSCI has layered other software on top of numarray and recarray which astronomers use to do work. It is the friction of that interface which makes correcting these consistency problems more difficult than might be immediately apparent.
I think it's important to agree on a definitive set of charcodes and use them uniformly throughout numarray.
I wish this were possible, but I'm thinking we should try to find an alternative approach altogether, one which may be more verbose but implicitly free of conflict. A means for specifying a recarray format might be created from tuples, type objects, and integer repetition factors. The verbosity of this approach might be a little tedious, but it would also be transparent, maintainable, and conflict free. I think we should add an "obsolescent feature" warning to numarray and recarray which flags any use of character typecodes when the appropriate command line switches are set.
Suggestion: if recarray charcodes do not need to match the Numeric ones, I propose that using the Python convention may be a good idea. Look at the table in: http://www.python.org/doc/current/lib/module-struct.html.
This sounds good to me, except that it will break an existing interface that I don't have control over. Therefore, I suggest we correct the problem by coming up with something better.
I don't understand this remark: <snip> but I am making an argument that perhaps
numarray should only go so far in the support of what I regard as an obsolescent feature. If the Numeric developers choose to continue extending the use of typecodes in ways that are incompatible with numarray, one way of dealing with it is to "just say no". We are going beyond the scope of backwards compatibility to on-going compatibility. (Which we may still have to do, but it needs to be discussed and considered.)
There is no "on-going" Numeric development. It stops the minute numarray is ready. Period. We developers all agreed on that. The whole reason for numarray is that Numeric was pronounced unmaintainable and unextendable by those who frequently had to work on it. To do anything else will fragment the entire numerical python community and software set.
On Friday 24 January 2003 18:02, Todd Miller wrote:
My [i.e. Todd's] thoughts about it:
No. It shows you're thinking about it carefully. Having looked at all of the examples below, I have some comments:
I mostly agree with your comments, but let me point out some thoughts.
1. The sparseness and obscurity of the typecode "wordspace" are both demonstrated here. There are so few letters to choose from, they're often already used in some other context. Even given the large number of unused letters, it's often difficult to choose good ones and to remember what has been chosen. I think this is one of the reasons Perry chose to replace typecodes with true type objects which have rich, regular, and predictable symbolic names.
I completely agree that type objects are a brilliant idea.
3. STSCI has layered other software on top of numarray and recarray which astronomers use to do work. It is the friction of that interface which makes correcting these consistency problems more difficult than might be immediately apparent.
Yeah, I know...
I think it's important to agree on a definitive set of charcodes and use them uniformly throughout numarray.
I wish this were possible, but I'm thinking we should try to find an alternative approach altogether, one which may be more verbose but implicitly free of conflict.
A means for specifying a recarray format might be created from tuples, type objects, and integer repetition factors.
The verbosity of this approach might be a little tedious, but it would also be transparent, maintainable, and conflict free.
I think this is a very good idea. In fact, while working on PyTables I was lately pondering what would be the best way to define record arrays, and I also think that a verbose approach should be the best. After considering metaclasses and tuples, I ended up with a compromise between the two: dictionaries combined with some function or class to refine the definition. My current thinking is something like:

    recarrDescr = {
        "name"        : defineType(CharType, 16, ""),         # 16-character String
        "TDCcount"    : defineType(UInt8, 1, 0),              # unsigned byte
        "ADCcount"    : defineType(Int16, 1, 0),              # signed short integer
        "grid_i"      : defineType(Int32, 1, 9),              # integer
        "grid_j"      : defineType(Int32, 1, 9),              # integer
        "pressure"    : defineType(Float32, 1, 1.),           # float (single-precision)
        "temperature" : defineType(Float64, 32, arange(32)),  # double[32]
        "idnumber"    : defineType(Int64, 1, 0),              # signed long long
    }

where defineType is a class that accepts (type, shape, default) parameters. It can be extended safely in the future if more needs appear.

A dictionary has the advantage over a tuple in that you can map column names to their contents quite easily, and it is more flexible than defining the fields with a metaclass descendant (see http://pytables.sourceforge.net/html-doc/usersguide-html3.html#subsection3.1...) because dictionaries can be built up at run time (although that might also be possible with metaclass descendants, but in a more mysterious way that I think is not worth it). In addition, the dictionary object is available in all Python versions, whereas metaclasses are only available from 2.2 on. However, I regard metaclasses as the most elegant solution (but elegance is not always equivalent to convenience :(). Perhaps you may want to consider this for use in the recarray definition.
I think we should add an "obsolescent feature" warning to numarray and recarray which flags any use of character typecodes when the appropriate command line switches are set.
Well, I don't fully agree with that. I do believe that type classes are a more meaningful way of describing types, but charcodes can be quite advantageous in certain situations, like describing the contents of a record in a compact way, or passing this info to C routines that deal with the data. For example, consider the benefits of describing a recarray format as:

    "3s4i20d"

instead of

    ((Int16, 3), (Int32, 4), (Float64, 20), )

the former being more handy in lots of situations. I certainly believe that a coexistence of both can be very beneficial, especially for 3rd party extension makers (like me :).
Suggestion: if recarray charcodes do not need to match the Numeric ones, I propose that using the Python convention may be a good idea. Look at the table in: http://www.python.org/doc/current/lib/module-struct.html.
This sounds good to me, except that it will break an existing interface that I don't have control over. Therefore, I suggest we correct the problem by coming up with something better.
Well, if charcodes finally stay in, this has an additional advantage in that the Python crew has provided meaningful ways to express padding (character "x"), endianness ("=", "<", ">") and alignment ("@"). So having a compact expression like "@3sx4i20d", apart from resembling Chinese to occidentals, may give a lot of info in a handy way.

-- Francesc Alted
I think Todd was referring to the recent addition of unsigned types to Numeric, along with which came new typecodes. These types were already in numarray at the time.

Perry
-----Original Message-----
From: Paul F Dubois [mailto:paul@pfdubois.com]
Sent: Friday, January 24, 2003 12:42 PM
To: 'Perry Greenfield'; falted@openlc.org; numpy-discussion@lists.sourceforge.net
Subject: RE: [Numpy-discussion] typecodes in numarray
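For anyone who wants to try the struct conventions mentioned above, here is a minimal sketch using only the standard library. One caveat: in the struct module the code for a signed short is "h" ("s" means a char array there), so the "3s4i20d" example is written as "3h4i20d" below. The "<" prefix selects standard sizes with no implicit alignment, which keeps the first two results platform independent; the "@" result depends on the machine.

```python
import struct

# "<" = little-endian, standard sizes, no implicit alignment padding.
packed = struct.calcsize("<3h4i20d")    # 3*2 + 4*4 + 20*8 bytes

# "x" inserts one explicit pad byte after the three shorts.
padded = struct.calcsize("<3hx4i20d")

# "@" = native sizes and native alignment; the value is platform dependent.
native = struct.calcsize("@3h4i20d")

print(packed, padded, native)
```

The explicit "x" adds exactly one byte, whereas "@" lets the platform decide how much padding goes between the shorts and the ints.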
I don't understand this remark:
<snip> but I am making an argument that perhaps
numarray should only go so far in the support of what I regard as an obsolescent feature. If the Numeric developers choose to continue extending the use of typecodes in ways that are incompatible with numarray, one way of dealing with it is to "just say no". We are going beyond the scope of backwards compatibility to on-going compatibility. (Which we may still have to do, but it needs to be discussed and considered.)
There is no "on-going" Numeric development. It stops the minute numarray is ready. Period. We developers all agreed on that. The whole reason for numarray is that Numeric was pronounced unmaintainable and unextendable by those who frequently had to work on it. To do anything else will fragment the entire numerical python community and software set.
A means for specifying a recarray format might be created from tuples, type objects, and integer repetition factors.
The verbosity of this approach might be a little tedious, but it would also be transparent, maintainable, and conflict free.
I think this is a very good idea. In fact, while working on PyTables I was lately pondering what would be the best way to define record arrays, and I also think that a verbose approach should be the best.
After considering metaclasses and tuples, I ended up with a compromise between the two: dictionaries combined with some function or class to refine the definition.
My current thinking is something like:
    recarrDescr = {
        "name"        : defineType(CharType, 16, ""),         # 16-character String
        "TDCcount"    : defineType(UInt8, 1, 0),              # unsigned byte
        "ADCcount"    : defineType(Int16, 1, 0),              # signed short integer
        "grid_i"      : defineType(Int32, 1, 9),              # integer
        "grid_j"      : defineType(Int32, 1, 9),              # integer
        "pressure"    : defineType(Float32, 1, 1.),           # float (single-precision)
        "temperature" : defineType(Float64, 32, arange(32)),  # double[32]
        "idnumber"    : defineType(Int64, 1, 0),              # signed long long
    }
where defineType is a class that accepts (type, shape, default) parameters. It can be extended safely in the future if more needs appear.
You're way ahead of me here. The only thing I don't like about this is the additional relative complexity because of the addition of field names and default values. It would be nice to layer this more.
Perhaps you may want to consider this for using in recarray definition.
We'll definitely consider it as we hash this out.
I think we should add an "obsolescent feature" warning to numarray and recarray which flags any use of character typecodes when the appropriate command line switches are set.
Well, I don't fully agree with that. I do believe that type classes are a more meaningful way of describing types, but charcodes can be quite advantageous in certain situations, like describing the contents of a record in a compact way, or passing this info to C routines that deal with the data.
Yeah, I know.
For example, consider the benefits of describing a recarray format as:
"3s4i20d"
I know.
instead of
((Int16, 3), (Int32, 4), (Float64, 20), )
This is pretty much exactly what I was thinking. It is straightforward to imagine and difficult to forget.
the former being more handy in lots of situations.
Would you please name some of these so we can explore handling them both ways?
I certainly believe that a coexistence of both can be very beneficial, especially for 3rd party extension makers (like me :).
If there's a reasonable way to avoid supporting both, we should.
Suggestion: if recarray charcodes do not need to match the Numeric ones, I propose that using the Python convention may be a good idea. Look at the table in: http://www.python.org/doc/current/lib/module-struct.html.
This sounds good to me, except that it will break an existing interface that I don't have control over. Therefore, I suggest we correct the problem by coming up with something better.
Well, if charcodes finally stay in, this has an additional advantage in that the Python crew has provided meaningful ways to express padding (character "x"), endianness ("=", "<", ">") and alignment ("@").
We might also add these to the type-repetition tuple. Regards, Todd
My current thinking is something like:
    recarrDescr = {
        "name"        : defineType(CharType, 16, ""),         # 16-character String
        "TDCcount"    : defineType(UInt8, 1, 0),              # unsigned byte
        "ADCcount"    : defineType(Int16, 1, 0),              # signed short integer
        "grid_i"      : defineType(Int32, 1, 9),              # integer
        "grid_j"      : defineType(Int32, 1, 9),              # integer
        "pressure"    : defineType(Float32, 1, 1.),           # float (single-precision)
        "temperature" : defineType(Float64, 32, arange(32)),  # double[32]
        "idnumber"    : defineType(Int64, 1, 0),              # signed long long
    }
where defineType is a class that accepts (type, shape, default) parameters. It can be extended safely in the future if more needs appear.
You're way ahead of me here. The only thing I don't like about this is the additional relative complexity because of the addition of field names and default values. It would be nice to layer this more.
One more thing I don't understand looking at this: a dictionary is unordered. Todd
Paul F Dubois wrote:
I don't understand this remark:
<snip> but I am making an argument that perhaps
numarray should only go so far in the support of what I regard as an obsolescent feature. If the Numeric developers choose to continue extending the use of typecodes in ways that are incompatible with numarray, one way of dealing with it is to "just say no". We are going beyond the scope of backwards compatibility to on-going compatibility. (Which we may still have to do, but it needs to be discussed and considered.)
There is no "on-going" Numeric development. It stops the minute numarray is ready. Period. We developers all agreed on that. The whole reason for numarray is that Numeric was pronounced unmaintainable and unextendable by those who frequently had to work on it. To do anything else will fragment the entire numerical python community and software set.
Thanks for clarifying Paul. My point didn't quite come out right. A better way to put it might have been:

1. Numarray and Numeric are subject to accidental divergence. As long as they both continue to change concurrently, they will probably differ even in interface. Because numarray isn't quite ready yet, they are both still changing.

2. Typecodes in particular are something numarray is superseding with something better. Because of this, providing on-going compatibility with Numeric typecodes may not make sense.

3. Numeric compatibility is not the only driver for the choice of recarray typecodes, so I can't make arbitrary changes without affecting other software and people.

4. I think there's a clearer, numarray type object based approach to describing recarray formats which does not use typecodes at all. Thus, instead of attempting to weed through and unify layers of conflicting type codes, we might be able to end-run the whole problem with an alternative approach.

Todd
On Friday 24 January 2003 21:15, Todd Miller wrote:
My current thinking is something like:
    recarrDescr = {
        "name"        : defineType(CharType, 16, ""),         # 16-character String
        "TDCcount"    : defineType(UInt8, 1, 0),              # unsigned byte
        "ADCcount"    : defineType(Int16, 1, 0),              # signed short integer
        "grid_i"      : defineType(Int32, 1, 9),              # integer
        "grid_j"      : defineType(Int32, 1, 9),              # integer
        "pressure"    : defineType(Float32, 1, 1.),           # float (single-precision)
        "temperature" : defineType(Float64, 32, arange(32)),  # double[32]
        "idnumber"    : defineType(Int64, 1, 0),              # signed long long
    }
where defineType is a class that accepts (type, shape, default) parameters. It can be extended safely in the future if more needs appear.
You're way ahead of me here. The only thing I don't like about this is the additional relative complexity because of the addition of field names and default values. It would be nice to layer this more.
Well, I think a map between field names and values is valuable from the user's point of view. It may help him to label the different information on the recarray. Moreover, if __getattr__ and __setattr__ methods (or __getitem__ and __setitem__) were implemented on recarray (as they are in my recarray2 version, for example), the field name could become a very convenient way to access a specific field by name (this introduces the limitation that a field name must be a valid Python identifier, but I think this is not a big restriction). By looking at the description dictionary, the user can get a quick idea of what he can find in every field (with no need for counting, which can be a big advantage, especially for long records).

With regard to default values, you can make this parameter (even the shape) a keyword parameter in order to make it optional. In that way, the definition can be as simple as "defineType(CharType)" (or even just "CharType", if you add a bit of code) or as complete as "defineType(CharType, shape, default, whatever_you_want)". I think this is quite a flexible approach.
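To make the keyword-parameter idea concrete, here is a rough sketch. FieldSpec is a hypothetical stand-in for defineType, normalize() is the "bit of code" that would accept a bare type, and type names are plain strings so that the example runs without numarray; none of this is actual PyTables or recarray code.

```python
class FieldSpec:
    """Hypothetical stand-in for the defineType class discussed above."""

    def __init__(self, type_name, shape=1, default=None):
        # shape and default are optional keyword parameters.
        self.type_name = type_name
        self.shape = shape
        self.default = default


def normalize(spec):
    """Accept either a bare type name or a full FieldSpec."""
    return spec if isinstance(spec, FieldSpec) else FieldSpec(spec)


descr = {
    "name": FieldSpec("CharType", shape=16, default=""),
    "TDCcount": "UInt8",                       # bare type, defaults applied
    "pressure": FieldSpec("Float32", default=1.0),
}
# Wrap any bare type so every value is a FieldSpec afterwards.
descr = {key: normalize(value) for key, value in descr.items()}

print(descr["TDCcount"].shape, descr["name"].shape)
```

With the optional arguments given defaults, the short and the complete spelling go through the same code path.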
One more thing I don't understand looking at this: a dictionary is unordered.
Yeah, but this can be regarded as an advantage rather than a drawback in the sense that you can choose the order you (the developer) prefer. For example, I was first using an alphanumerical order to arrange the data fields, but now I'm considering that an arrangement that optimizes the alignment of the fields could be far better. For instance, say that you have an (Int8, Int32, Float64) record; in principle it could be easy to create a routine that arranges this record in the form (Float64, Int32, Int8), which optimizes access to the different fields (it may even be possible to introduce automatic padding later on if recarrays support it in the future).

Maybe you are getting confused in thinking that recarrDescr will create the recarray. Not at all, this is a *metadata* definition that can be passed to the actual recarray function for recarray creation. Its function would be similar to the formats parameter (with typical values like "3a,4i,3w") in recarray.array, but with more verbosity and all the reported advantages.
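The reordering routine mentioned above could look roughly like the sketch below. It assumes that each item's natural alignment equals its size and it ignores trailing padding at the end of the record; the SIZES table and both function names are made up for illustration.

```python
# Item sizes in bytes; natural alignment is assumed equal to the item size.
SIZES = {"Int8": 1, "Int32": 4, "Float64": 8}


def layout(fields):
    """Return (offsets, padding_bytes) for fields placed in the given order."""
    offset = 0
    padding = 0
    offsets = []
    for name in fields:
        size = SIZES[name]
        gap = (-offset) % size      # bytes needed to reach the next aligned offset
        padding += gap
        offset += gap
        offsets.append(offset)
        offset += size
    return offsets, padding


def by_alignment(fields):
    """Order fields from largest to smallest item size."""
    return sorted(fields, key=lambda name: SIZES[name], reverse=True)


original = ["Int8", "Int32", "Float64"]
reordered = by_alignment(original)

print(layout(original))     # ([0, 4, 8], 3)
print(layout(reordered))    # ([0, 8, 12], 0)
```

In the original order three pad bytes are needed before the Int32; sorted by size, no internal padding is needed at all.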
instead of
((Int16, 3), (Int32, 4), (Float64, 20), )
This is pretty much exactly what I was thinking. It is straightforward to imagine and difficult to forget.
the former being more handy in lots of situations.
Would you please name some of these so we can explore handling them both ways?
Well, I'm afraid that the biggest advantage would be when dealing with recarrays in C extension modules. In this kind of situation it would be far better to deal with a "3a4i3w" array than a tuple of Python objects. But maybe I'm wrong and the latter is not so complicated to manage; however, I used to work a lot with records (even before meeting recarray) and I was quite comfortable with formats in string mode. Or perhaps it would be enough to provide a method for converting from the standard metadata layout (dictionary or tuple or whatever) to a string format. This should not be very difficult.
Well, if charcodes finally stay in, this has an additional advantage in that the Python crew has provided meaningful ways to express padding (character "x"), endianness ("=", "<", ">") and alignment ("@").
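A conversion from the tuple layout to a string format might be sketched as follows. The STRUCT_CODES table is an assumption that borrows the struct-module letters ("h", "i", "d") rather than the recarray ones, and type names are plain strings so the example runs on its own.

```python
# Hypothetical mapping from type names to struct-module format codes.
STRUCT_CODES = {"Int16": "h", "Int32": "i", "Float64": "d"}


def to_format_string(spec):
    """Convert ((type_name, repeat), ...) into a compact format string."""
    parts = []
    for item in spec:
        # A bare type name is treated as a repetition factor of 1.
        type_name, repeat = item if isinstance(item, tuple) else (item, 1)
        code = STRUCT_CODES[type_name]
        parts.append(code if repeat == 1 else "%d%s" % (repeat, code))
    return "".join(parts)


spec = (("Int16", 3), ("Int32", 4), ("Float64", 20))
print(to_format_string(spec))   # 3h4i20d
```

A C extension could then be handed the resulting string, while Python code keeps working with the tuple.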
We might also add these to the type-repetition tuple.
It would be nice, of course. -- Francesc Alted
Francesc Alted wrote:
On Friday 24 January 2003 21:15, Todd Miller wrote:
My current thinking is something like:
    recarrDescr = {
        "name"        : defineType(CharType, 16, ""),         # 16-character String
        "TDCcount"    : defineType(UInt8, 1, 0),              # unsigned byte
        "ADCcount"    : defineType(Int16, 1, 0),              # signed short integer
        "grid_i"      : defineType(Int32, 1, 9),              # integer
        "grid_j"      : defineType(Int32, 1, 9),              # integer
        "pressure"    : defineType(Float32, 1, 1.),           # float (single-precision)
        "temperature" : defineType(Float64, 32, arange(32)),  # double[32]
        "idnumber"    : defineType(Int64, 1, 0),              # signed long long
    }
Still think I'd prefer something separable:

    recarrStruct = ( (CharType, 16), UInt8, Int16, Int32, Int32, Float32, (Float64, 32), Int64 )
    recarrFields = ["name", "TDCcount", "ADCcount", "grid_i", "grid_j", "pressure", "temperature", "idnumber"]

I guess it might not be quite as good for large structs.
where defineType is a class that accepts (type, shape, default) parameters. It can be extended safely in the future if more needs appear.
You're way ahead of me here. The only thing I don't like about this is the additional relative complexity because of the addition of field names and default values. It would be nice to layer this more.
Well, I think a map between field names and values is valuable from the user's point of view. It may help him to label the different information on the recarray. Moreover, if __getattr__ and __setattr__ methods (or __getitem__ and __setitem__) were implemented on recarray (as they are in my recarray2 version, for example), the field name could become a very convenient way to access a specific field by name (this introduces the limitation that a field name must be a valid Python identifier, but I think this is not a big restriction). By looking at the description dictionary, the user can get a quick idea of what he can find in every field (with no need for counting, which can be a big advantage, especially for long records).
That's true and sounds nice. I'm just thinking records with named fields should be derived from records with positional fields. If the functionality is layered, you can use as much complexity as you need. It's a good sign that both you and I thought of an identical tuple format; it's the obvious minimal one.
With regard to default values, you can make this parameter (even the shape) a keyword parameter in order to make it optional.
OK. That's a good point.
One more thing I don't understand looking at this: a dictionary is unordered.
Yeah, but this can be regarded as an advantage rather than a drawback in the sense that you can choose the order you (the developer) prefer. For example, I was first using an alphanumerical order to arrange the data fields, but now I'm considering that an arrangement that optimizes the alignment of the fields could be far better. For instance, say that you have an (Int8, Int32, Float64) record; in principle it could be easy to create a routine that arranges this record in the form (Float64, Int32, Int8), which optimizes access to the different fields (it may even be possible to introduce automatic padding later on if recarrays support it in the future).
Maybe you are getting confused
Yes and no. :)
in thinking that recarrDescr will create the recarray. Not at all, this is a *metadata* definition that can be passed to the actual recarray function for recarray creation.
Just like the type repetition tuple except also including field names and default values. I don't think you lost me. For what we do, the exact physical layout of the "struct" is important, so order matters. I see order as part of the meta-data, but I don't usually deal with meta-entities so maybe I've got that part wrong. :)
Its function would be similar to the formats parameter (with typical values like "3a,4i,3w") in recarray.array, but with more verbosity and all the reported advantages.
instead of
((Int16, 3), (Int32, 4), (Float64, 20), )
This is pretty much exactly what I was thinking. It is straightforward to imagine and difficult to forget.
the former being more handy in lots of situations.
Would you please name some of these so we can explore handling them both ways?
Well, I'm afraid that the biggest advantage would be when dealing with recarrays in C extension modules. In this kind of situation it would be far better to deal with a "3a4i3w" array than a tuple of Python objects. But maybe I'm wrong and the latter is not so complicated to manage; however, I used to work a lot with records (even before meeting recarray) and I was quite comfortable with formats in string mode.
I was thinking that if the above was an issue, we could write an API function(s) to "compile" the type-repetition tuple into arrays of ints which describe the type of each field and corresponding repetition factor.
Or perhaps it would be enough to provide a method for converting from the standard metadata layout (dictionary or tuple or whatever) to a string format. This should not be very difficult.
Almost exactly what I suggested above. See you Monday, Todd
On Saturday 25 January 2003 20:30, Todd Miller wrote:
Still think I'd prefer something separable:
recarrStruct = ( (CharType, 16), UInt8, Int16, Int32, Int32, Float32, (Float64, 32), Int64 )
recarrFields = ["name", "TDCcount", "ADCcount", "grid_i", "grid_j", "pressure", "temperature", "idnumber"]
I guess it might not be quite as good for large structs.
Me too...
It's a good sign that both you and I thought of an identical tuple format; it's the obvious minimal one.
Yeah. We just differ in the way to arrange this metadata to be passed to the recarray constructor. But I think this is secondary compared to the flexibility that a verbose approach offers compared with the actual string format. In fact, more than one container might be supported to define the metadata; one can start with tuples as you suggest, but in the future other ways can be added (if considered convenient). For example, I think I'll stick with the dictionary option for PyTables, but a class declaration for the metadata would also be supported, as in:

    class Small(IsRecord):
        var1 = defineType(CharType, 2, "")
        var2 = defineType(Int32, 1)
        var3 = Float64

This would not be difficult to support because, by accessing Small().__dict__, you also get a dictionary. In addition, the latter will ensure (by construction) that you are not using an invalid Python identifier, which is mandatory in my current implementation. I find these containers (dictionaries and classes) both elegant and convenient.
Just like the type repetition tuple except also including field names and default values. I don't think you lost me. For what we do, the exact physical layout of the "struct" is important, so order matters. I see order as part of the meta-data, but I don't usually deal with meta-entities so maybe I've got that part wrong. :)
Well, if you need positional fields, you may add an (optional) parameter called, for example, "position" so that you can fix it.
I was thinking that if the above was an issue, we could write an API function(s) to "compile" the type-repetition tuple into arrays of ints which describe the type of each field and corresponding repetition factor.
Yeah, I agree that this would be the best solution. That way, the charcodes will be factored out of the code, and just providing such an API (both in Python and C) would be enough to reconstruct them, if needed. That will allow more consistent numarray internal code.
See you Monday,
Right, how did you know that? :) -- Francesc Alted
Francesc Alted wrote:
Yeah. We just differ in the way to arrange this metadata to be passed to the recarray constructor. But I think this is secondary compared to the flexibility that a verbose approach offers compared with the actual string format.
Yes. So one question is: if we were to add type-repetition tuples to recarray as an alternative to the current character code strings, would that be any form of improvement to recarray from your perspective? As I see it, recarray currently has a clean separation between format and naming which permits the latter to be optional. Before changing that, I'd need a clear argument why. (I didn't design and generally don't even maintain recarray).
In fact, more than one container might be supported to define the metadata; one can start with tuples as you suggest, but in the future other ways can be added (if considered convenient).
For example, I think I'll stick with the dictionary option for PyTables, but also a class declaration for the metadata would be supported, like in :
    class Small(IsRecord):
        var1 = defineType(CharType, 2, "")
        var2 = defineType(Int32, 1)
        var3 = Float64

This would not be difficult to support because, by accessing Small().__dict__, you also get a dictionary. In addition, the latter will ensure (by construction) that you are not using an invalid Python identifier, which is mandatory in my current implementation. I find these containers (dictionaries and classes) both elegant and convenient.
I'm not trying to be Mr. Negative here, but one thing to keep in mind is this:
    >>> class C:
    ...     pass
    ...
    >>> c = C()
    >>> dir(c.__dict__)
    ['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__',
     '__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__', '__gt__',
     '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__',
     '__new__', '__reduce__', '__repr__', '__setattr__', '__setitem__', '__str__',
     'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 'iteritems',
     'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update',
     'values']
Which is to say, the instance dictionary is a little cluttered, and it might not be that easy to determine which objects in it are there to define the data format.
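One possible way around the clutter, sketched under the assumption that every field is declared with a marker class (FieldSpec here is a hypothetical stand-in for defineType): look at the class namespace rather than the instance dictionary, and keep only the entries that are field definitions. In current Python, vars(cls) also holds generated entries such as __module__ and __doc__, so filtering by type is simpler than filtering by name.

```python
class FieldSpec:
    """Hypothetical marker class standing in for defineType."""

    def __init__(self, type_name, shape=1, default=None):
        self.type_name = type_name
        self.shape = shape
        self.default = default


class Small:
    var1 = FieldSpec("CharType", 2, "")
    var2 = FieldSpec("Int32", 1)
    helper = "not a field"          # ordinary attribute, should be ignored


def fields_of(cls):
    """Collect only the class attributes that are field definitions."""
    return {name: value for name, value in vars(cls).items()
            if isinstance(value, FieldSpec)}


print(sorted(fields_of(Small)))     # ['var1', 'var2']
print(vars(Small()))                # {} : class attributes are not in the instance dict
```

Note that a bare type such as "var3 = Float64" would not be picked up by this filter, so it would need to be wrapped or recognized separately.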
Just like the type repetition tuple except also including field names and default values. I don't think you lost me. For what we do, the exact physical layout of the "struct" is important, so order matters. I see order as part of the meta-data, but I don't usually deal with meta-entities so maybe I've got that part wrong. :)
Well, if you need positional fields, you may add an (optional) parameter called, for example, "position" so that you can fix it.
I'm sure that's not the easiest way to capture struct layout, but I take your point. Since position matters to me, I'd prefer that capturing them was implicit. Since it doesn't to you, it seems OK for it to be explicit. Either default mode can support the other, but capturing order with tuples is free, while capturing order with a __dict__ will take some kind of extra work.
I was thinking that if the above was an issue, we could write an API function(s) to "compile" the type-repetition tuple into arrays of ints which describe the type of each field and corresponding repetition factor.
Yeah, I agree that this would be the best solution. That way, the charcodes will be factored out of the code, and just providing such an API (both in Python and C) would be enough to reconstruct them, if needed. That will allow more consistent numarray internal code.
I'm thinking the general format for this may be converting N-tuples of types and ints into N arrays of types and ints. And vice versa. It's obvious how this works with numarray types. I think the chararray types need work and need to be mapped into the same integer enumeration as the numeric types in a non-overlapping way.
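The conversion in both directions might be sketched like this. The integer enumeration in TYPE_CODES is invented for illustration (with the character type given a code well away from the numeric ones, so the two families cannot overlap), and type names are plain strings rather than numarray type objects.

```python
# Hypothetical integer enumeration; numeric and character types share one
# non-overlapping code space.
TYPE_CODES = {"Int8": 0, "Int16": 1, "Int32": 2, "Float64": 3, "CharType": 100}
TYPE_NAMES = {code: name for name, code in TYPE_CODES.items()}


def compile_format(spec):
    """Split ((type_name, repeat), ...) into parallel lists of ints."""
    codes = [TYPE_CODES[type_name] for type_name, _ in spec]
    repeats = [repeat for _, repeat in spec]
    return codes, repeats


def decompile_format(codes, repeats):
    """Rebuild the type-repetition tuple from the parallel lists."""
    return tuple((TYPE_NAMES[code], repeat)
                 for code, repeat in zip(codes, repeats))


spec = (("CharType", 16), ("Int32", 4), ("Float64", 20))
codes, repeats = compile_format(spec)

print(codes, repeats)                           # [100, 2, 3] [16, 4, 20]
print(decompile_format(codes, repeats) == spec) # True
```

Two flat integer arrays like these are easy to pass to C, and the round trip shows that nothing from the tuple form is lost.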
See you Monday,
Right, how did you know that? :)
Insightful on weekends anyway, Todd
Francesc Alted wrote:
On Monday 27 January 2003 15:42, Todd Miller wrote:
Yes. So one question is: if we were to add type-repetition tuples to recarray as an alternative to the current character code strings, would that be any form of improvement to recarray from your perspective?
Well, at least, charcodes can be avoided. I think it's a big win... or maybe not as big?
I think that avoiding the charcodes would be an improvement. Type-repetition tuples provide a clear, well-defined way to define data formats. It's not so clear that it eliminates the requirement for on-going Numeric compatibility, but it might.
As I see it, recarray currently has a clean separation between format and naming which permits the latter to be optional. Before changing that, I'd need a clear argument why. (I didn't design and generally don't even maintain recarray).
One argument is that a map is very clear to the user, although such a map can also be built *after* the names and format are passed to the recarray constructor and made accessible as an attribute. However, the latter solution is worse IMO, because the user has to supply two separate pieces of information when these should really be regarded as a unit. Anyway, this may be a subjective perception.
Well, I think there's truth to the danger of separating names from data declarations, but it is easy to map keys(), values() to the separate pieces in a different layer if necessary.
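The keys()/values() mapping mentioned here could look roughly like this in modern Python 3, where dictionaries keep insertion order. The dictionary contents and the per-field format strings are illustrative assumptions, not the exact syntax recarray expects.

```python
# A single user-facing map from field name to (assumed) format string.
record_map = {"field1": "2a", "field2": "i", "field3": "d"}

# A thin layer splits it into the two separate pieces.
names = tuple(record_map.keys())
formats = tuple(record_map.values())

# And the map can be rebuilt from the separate pieces when needed.
rebuilt = dict(zip(names, formats))

print(names)
print(formats)
```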
This would not be difficult to support because, by accessing Small().__dict__, you also get a dictionary. In addition, the latter will ensure (by construction) that you are not using an invalid Python identifier, which is mandatory in my current implementation. I find these containers (dictionaries and classes) both elegant and convenient.
I'm not trying to be Mr. Negative here, but one thing to keep in mind
Oh dear, you are right!
For a few seconds there, I thought I was on a roll!
In fact, I forgot that to make this work, you need to use the metaclasses introduced in Python 2.2 (see Alex Martelli's post: http://mail.python.org/pipermail/python-list/2002-July/112007.html). I was following this recipe, but I forgot that I was using Python 2.2.
So, as numarray has to work with previous Python versions, there is no point in worrying about that.
In truth, numarray-0.4 and up already require Python-2.2 and up.
I'm sure that's not the easiest way to capture struct layout, but I take your point. Since position matters to me, I'd prefer that capturing them was implicit. Since it doesn't to you, it seems OK for it to be explicit. Either default mode can support the other, but capturing order with tuples is free, while capturing order with a __dict__ will take some kind of extra work.
That's right. We have some different needs and priorities, and we should take the approach better suited to each other. But exchanging points of view is always a great thing.
I'm thinking the general format for this may be converting N-tuples of types and ints into N arrays of types and ints. And vice versa. It's obvious how this works with numarray types. I think the chararray types need work and need to be mapped into the same integer enumeration as the numeric types in a non-overlapping way.
I don't follow your point here. Why should there be a problem with chararrays?
What I was trying to say is that chararray types are not as well designed as the numarray types, nor are they reflected in the C-API.
On Monday 27 January 2003 15:42, Todd Miller wrote:
Yes. So one question is: if we were to add type-repetition tuples to recarray as an alternative to the current character code strings, would that be any form of improvement to recarray from your perspective?
Well, at least, charcodes can be avoided. I think it's a big win... or maybe not as big?
As I see it, recarray currently has a clean separation between format and naming which permits the latter to be optional. Before changing that, I'd need a clear argument why. (I didn't design and generally don't even maintain recarray).
One argument is that a map is very clear to the user, although such a map can also be built *after* the names and format are passed to the recarray constructor and made accessible as an attribute. However, the latter solution is worse IMO, because the user has to supply two separate pieces of information when these should really be regarded as a unit. Anyway, this may be a subjective perception.
This would not be difficult to support because, by accessing Small().__dict__, you also get a dictionary. In addition, the latter will ensure (by construction) that you are not using an invalid Python identifier, which is mandatory in my current implementation. I find these containers (dictionaries and classes) both elegant and convenient.
I'm not trying to be Mr. Negative here, but one thing to keep in mind
Oh dear, you are right! In fact, I forgot that to make this work, you need to use the metaclasses introduced in Python 2.2 (see Alex Martelli's post: http://mail.python.org/pipermail/python-list/2002-July/112007.html). I was following this recipe, but I forgot that I was using Python 2.2. So, as numarray has to work with previous Python versions, there is no point in worrying about that.
I'm sure that's not the easiest way to capture struct layout, but I take your point. Since position matters to me, I'd prefer that capturing them was implicit. Since it doesn't to you, it seems OK for it to be explicit. Either default mode can support the other, but capturing order with tuples is free, while capturing order with a __dict__ will take some kind of extra work.
That's right. We have some different needs and priorities, and we should take the approach better suited to each other. But exchanging points of view is always a great thing.
I'm thinking the general format for this may be converting N-tuples of types and ints into N arrays of types and ints. And vice versa. It's obvious how this works with numarray types. I think the chararray types need work and need to be mapped into the same integer enumeration as the numeric types in a non-overlapping way.
I don't follow your point here. Why should there be a problem with chararrays? -- Francesc Alted
Francesc Alted wrote:
So, as numarray has to work with previous python versions,
Why? Anyone using NumArray is either starting from scratch or porting from Numeric, so having to port to a newer version of Python is a very small deal. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
Chris Barker wrote:
Francesc Alted wrote:
So, as numarray has to work with previous python versions,
Why? Anyone using NumArray is either starting from scratch or porting from Numeric, so having to port to a newer version of Python is a very small deal.
Just to make it very clear: numarray-0.4 and up require Python-2.2 or higher. Up until numarray-0.4 (released in November), that was not the case, and numarray ran (and was tested!) on Python-2.0 and higher. The desire to increase C-level Numeric compatibility and to improve simple indexing speed led us to a C baseclass, which is only supported in Python-2.2 and up. Todd
On Monday 27 January 2003 17:28, Todd Miller wrote:
So, as numarray has to work with previous Python versions, there is no point in worrying about that.
In truth, numarray-0.4 and up already require Python-2.2 and up.
Oh! I didn't know that. In that case, I think it's worth considering the possibility of defining records as classes derived from metaclasses. But, of course, the ultimate decision is yours.
I'm thinking the general format for this may be converting N-tuples of types and ints into N arrays of types and ints. And vice versa. It's obvious how this works with numarray types. I think the chararray types need work and need to be mapped into the same integer enumeration as the numeric types in a non-overlapping way.
I don't follow your point here. Why should there be a problem with chararrays?
What I was trying to say is that chararray types are not as well designed as the numarray types, nor are they reflected in the C-API.
I see. Well, is such a unification really desirable? CharArray entities come from one module and NumArray entities from another, and that should be OK. Why bother creating a unified API or integer enumeration? I don't think this would be a big drawback for C-extension writers (although, to tell the truth, it would be very elegant if you managed to do it; maybe it is not worth the effort, I don't know). -- Francesc Alted
Francesc Alted wrote:
On Monday 27 January 2003 17:28, Todd Miller wrote:
So, as numarray has to work with previous Python versions, there is no point in worrying about that.
In truth, numarray-0.4 and up already require Python-2.2 and up.
Oh! I didn't know that. In that case, I think it's worth considering the possibility of defining records as classes derived from metaclasses. But, of course, the ultimate decision is yours.
I don't know what you mean here. Please spell it out a little more.
I'm thinking the general format for this may be converting N-tuples of types and ints into N arrays of types and ints. And vice versa. It's obvious how this works with numarray types. I think the chararray types need work and need to be mapped into the same integer enumeration as the numeric types in a non-overlapping way.
I don't follow your point here. Why should there be a problem with chararrays?
What I was trying to say is that chararray types are not as well designed as the numarray types, nor are they reflected in the C-API.
I see. Well, is such a unification really desirable? CharArray entities come from one module and NumArray entities from another, and that should be OK. Why bother creating a unified API or integer enumeration?
It may not be necessary. Int8 with repetition factors may work about the same.
On Monday 27 January 2003 20:52, Todd Miller wrote:
Oh! I didn't know that. In that case, I think it's worth considering the possibility of defining records as classes derived from metaclasses. But, of course, the ultimate decision is yours.
I don't know what you mean here. Please spell it out a little more.
I meant that using something like:

class Small(IsRecord):
    field1 = defineType(CharType, 2, default="", position=1)
    field2 = defineType(Int32, 1, position=2)
    field3 = Float64

as a container for recarray metadata is definitely possible instead of the tuple (formats="2aid",names=("field1","field2", "field3")), if using Python 2.2. IsRecord is a metaclass (introduced in Python 2.2) that allows you to effectively separate the declared attributes from the implicit ones in normal classes. Of course, you can tailor IsRecord to fulfill your needs. I hope that I have expressed myself more clearly now, -- Francesc Alted
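The metaclass idea described above can be sketched in modern Python 3 syntax (the thread itself concerns Python 2.2, whose metaclass syntax differs). Here `IsRecord` is a plain base class whose metaclass does the collecting; `RecordMeta` and `Field` are hypothetical names, with `Field` standing in for the `defineType(...)` call, and types are given as strings so the example runs without numarray. This is not the PyTables implementation.

```python
class Field:
    """Hypothetical stand-in for the thread's defineType(...) call."""

    def __init__(self, type_name, repeat=1, default=None, position=None):
        self.type_name = type_name
        self.repeat = repeat
        self.default = default
        self.position = position


class RecordMeta(type):
    """Collect Field attributes into `fields` when a record class is created."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # Only explicitly declared Field objects count; implicit entries
        # such as __module__ or __qualname__ are ignored.
        cls.fields = {
            key: value
            for key, value in namespace.items()
            if isinstance(value, Field)
        }
        return cls


class IsRecord(metaclass=RecordMeta):
    """Base class; subclasses declare their fields as class attributes."""


class Small(IsRecord):
    field1 = Field("CharType", 2, default="", position=1)
    field2 = Field("Int32", 1, position=2)
    field3 = Field("Float64")


for name, field in Small.fields.items():
    print(name, field.type_name, field.repeat)
```

Because the metaclass sees the class body's namespace at creation time, the declared fields are separated from the implicit attributes once, up front, instead of being filtered out of `__dict__` on every access.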
Francesc Alted wrote:
On Monday 27 January 2003 20:52, Todd Miller wrote:
Oh! I didn't know that. In that case, I think it's worth considering the possibility of defining records as classes derived from metaclasses. But, of course, the ultimate decision is yours.
I don't know what you mean here. Please spell it out a little more.
I meant that using something like:
class Small(IsRecord):
    field1 = defineType(CharType, 2, default="", position=1)
    field2 = defineType(Int32, 1, position=2)
    field3 = Float64
as a container for recarray metadata is definitely possible instead of the tuple (formats="2aid",names=("field1","field2", "field3")), if using Python 2.2. IsRecord is a metaclass (introduced in Python 2.2) that allows you to effectively separate the declared attributes from the implicit ones in normal classes.
Of course, you can tailor IsRecord to fulfill your needs.
I hope that I have expressed myself more clearly now,
I looked at your docs here: http://pytables.sourceforge.net/html-doc/usersguide-html4.html#section4.2 and what you said above clicked. Thanks. Todd
participants (5)
- Chris Barker
- Francesc Alted
- Paul F Dubois
- Perry Greenfield
- Todd Miller