A proposal for dtype/dtypedescr in numpy objects

Hi,

In my struggle to get consistent behaviour from data types, I've ended up with a new proposal for treating them. The basic idea is to deprecate .dtype as a first-class attribute and replace it with the descriptor type container, which I find considerably more useful for end users. The current .dtype type will still be accessible (mainly for developers), but buried in .dtype.type.

Briefly stated:

current              proposed
=======              ========
.dtypedescr      --> moved into .dtype
.dtype           --> moved into .dtype.type
.dtype.dtypestr  --> moved into .dtype.str
                     new .dtype.name

What is achieved with that? Well, not much, except ease of use and type comparison correctness. For example, with the next setup:
>>> import numpy
>>> a=numpy.arange(10,dtype='i')
>>> b=numpy.arange(10,dtype='l')

we have currently:

>>> a.dtype
<type 'int32_arrtype'>
>>> a.dtypedescr
dtypedescr('<i4')
>>> a.dtypedescr.dtypestr
'<i4'
>>> a.dtype.__name__[:-8]
'int32'
>>> a.dtype == b.dtype
False

With the new proposal, we would have:

>>> a.dtype.type
<type 'int32_arrtype'>
>>> a.dtype
dtype('<i4')
>>> a.dtype.str
'<i4'
>>> a.dtype.name
'int32'
>>> a.dtype == b.dtype
True
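To illustrate why comparing descriptors can succeed where comparing type objects fails, here is a minimal pure-Python sketch. It is not numpy's implementation: DTypeSketch, IntCScalar and LongScalar are hypothetical names, and it assumes that the False result above comes from 'i' and 'l' mapping to two distinct type objects that happen to share the same 32-bit layout on this platform.

```python
# Hypothetical stand-ins for two scalar type objects with the same layout.
class IntCScalar(int):
    pass


class LongScalar(int):
    pass


# Maps a kind character to the prefix used in the readable type name.
_KIND_NAMES = {"i": "int", "u": "uint", "f": "float"}


class DTypeSketch:
    """Hypothetical descriptor: a scalar type plus byte order, kind and size."""

    def __init__(self, type_, byteorder, kind, itemsize):
        self.type = type_          # what the proposal calls .dtype.type
        self.byteorder = byteorder
        self.kind = kind
        self.itemsize = itemsize   # size in bytes

    @property
    def str(self):
        # Array-protocol style string, e.g. '<i4'.
        return f"{self.byteorder}{self.kind}{self.itemsize}"

    @property
    def name(self):
        # Readable name with the width in bits, e.g. 'int32'.
        return f"{_KIND_NAMES[self.kind]}{8 * self.itemsize}"

    def __eq__(self, other):
        # Compare by described layout, not by identity of the type object.
        if not isinstance(other, DTypeSketch):
            return NotImplemented
        return self.str == other.str

    def __hash__(self):
        return hash(self.str)

    def __repr__(self):
        return f"dtype({self.str!r})"


a = DTypeSketch(IntCScalar, "<", "i", 4)
b = DTypeSketch(LongScalar, "<", "i", 4)

print(a.type is b.type)  # the type objects themselves are distinct
print(a == b)            # the descriptors describe the same layout
print(a, a.str, a.name)
```

Because equality lives on the descriptor, no special metaclass is needed to make two type objects compare equal; the type objects stay ordinary classes.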
The advantages of the new proposal are:

- No more .dtype and .dtypedescr lying side by side, just the current .dtypedescr renamed to .dtype. I think that the current .dtype does not provide more useful information than the current .dtypedescr, and giving it a shorter name than .dtypedescr seems to indicate that it is more useful to users (and in my opinion, it isn't).
- The current .dtype is still accessible, but requires an extra name in the path: .dtype.type (this can be changed to .dtype.type_ or whatever). This should be useful mainly for developers.
- Added a useful dtype(descr).name so that one can quickly access the type name.
- Comparison between data types now works as it should (without having to create a metaclass for PyType_Type).

Drawbacks:

- Backward-incompatible change. However, provided the advantages are desirable, I think it is better to change now than later.
- I don't especially like the string representation of the new .dtype class. For example, I'd find dtype('Int32') much better than dtype('<i4'). However, this would mean more changes in the code, but they can be made later on (and would be much less disruptive than the proposed change).
- Some other issues that I'm not aware of.

I'm attaching the patch against the latest SVN. Once applied (please pay attention to the "XXX" marks in the patched code), it passes all tests. However, some gotchas may remain (especially in cases that the current tests do not check). If you are considering checking this change in, please tell me and I will revise the patch much more carefully. If not, never mind; it has been a good learning experience anyway.

Uh, sorry for proposing this sort of thing in the hours before a public release of numpy.

--
0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"

Oops, re-reading my last message, I discovered a small erratum:

On Thursday 05 January 2006 22:40, Francesc Altet wrote:
> - I don't specially like the string representation for the new .dtype class.
>   For example, I'd find dtype('Int32') much better than
                                 ^^^^^ should read 'int32' (to follow numpy conventions)
>   dtype('<i4'). However, this would represent more changes in the code, but
>   they can be made later on (much less disruptive than the proposed change).

--
0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"

Hi,

Following up: there was never any response to Francesc's proposal! I thought it sounded pretty good. As he argued, it is still a good (late but acceptable) time to clean things up! (I especially like the fact that it removes the "ugly" doubling of having both arr.dtype and arr.dtypecode.)

Is this still on the table!?

- Sebastian Haase

On Thursday 05 January 2006 13:40, Francesc Altet wrote:
Hi,

In my struggle to get consistent behaviour from data types, I've ended up with a new proposal for treating them. The basic idea is to deprecate .dtype as a first-class attribute and replace it with the descriptor type container, which I find considerably more useful for end users. The current .dtype type will still be accessible (mainly for developers), but buried in .dtype.type.

Briefly stated:

current              proposed
=======              ========
.dtypedescr      --> moved into .dtype
.dtype           --> moved into .dtype.type
.dtype.dtypestr  --> moved into .dtype.str
                     new .dtype.name

What is achieved with that? Well, not much, except ease of use and type comparison correctness. For example, with the next setup:

>>> import numpy
>>> a=numpy.arange(10,dtype='i')
>>> b=numpy.arange(10,dtype='l')

we have currently:

>>> a.dtype
<type 'int32_arrtype'>
>>> a.dtypedescr
dtypedescr('<i4')
>>> a.dtypedescr.dtypestr
'<i4'
>>> a.dtype.__name__[:-8]
'int32'
>>> a.dtype == b.dtype
False

With the new proposal, we would have:

>>> a.dtype.type
<type 'int32_arrtype'>
>>> a.dtype
dtype('<i4')
>>> a.dtype.str
'<i4'
>>> a.dtype.name
'int32'
>>> a.dtype == b.dtype
True

The advantages of the new proposal are:

- No more .dtype and .dtypedescr lying side by side, just the current .dtypedescr renamed to .dtype. I think that the current .dtype does not provide more useful information than the current .dtypedescr, and giving it a shorter name than .dtypedescr seems to indicate that it is more useful to users (and in my opinion, it isn't).
- The current .dtype is still accessible, but requires an extra name in the path: .dtype.type (this can be changed to .dtype.type_ or whatever). This should be useful mainly for developers.
- Added a useful dtype(descr).name so that one can quickly access the type name.
- Comparison between data types now works as it should (without having to create a metaclass for PyType_Type).

Drawbacks:

- Backward-incompatible change. However, provided the advantages are desirable, I think it is better to change now than later.
- I don't especially like the string representation of the new .dtype class. For example, I'd find dtype('Int32') much better than dtype('<i4'). However, this would mean more changes in the code, but they can be made later on (and would be much less disruptive than the proposed change).
- Some other issues that I'm not aware of.

I'm attaching the patch against the latest SVN. Once applied (please pay attention to the "XXX" marks in the patched code), it passes all tests. However, some gotchas may remain (especially in cases that the current tests do not check). If you are considering checking this change in, please tell me and I will revise the patch much more carefully. If not, never mind; it has been a good learning experience anyway.

Uh, sorry for proposing this sort of thing in the hours before a public release of numpy.

Sebastian Haase wrote:
> Hi,
>
> Following up: there was never any response to Francesc's proposal! I thought it sounded pretty good. As he argued, it is still a good (late but acceptable) time to clean things up! (I especially like the fact that it removes the "ugly" doubling of having both arr.dtype and arr.dtypecode.)

I think this proposal came during busy times and could not be looked at seriously. Times are still busy, so it is difficult to know what to do with it. I think there is validity to what he is saying. The dtypedescr was only added in December while the dtype was there in March, so the reason for the current arrangement is historical. I would not mind changing it so that .dtype actually returned the type-descriptor object. This would actually make things easier. It's only historical that it's not that way.

One issue is that .dtypechar is a simple replacement for .typecode(), but .dtype.char would involve two attribute lookups, which may not be a good thing. But this might not be a big deal, because they should probably be using .dtype anyway.

> Is this still on the table!?

I'm willing to look at it, especially since I like the concept of the dtypedescr much better.

>> In my struggle for getting consistent behaviours with data types, I've ended with a new proposal for treating them. The basic thing is that I suggest to deprecate .dtype as being a first-class attribute and replace it instead by the descriptor type container, which I find quite more useful for end users.

I think this is true... I was just nervous to change it. But, prior to a 1.0 release, I think we still could, if we do it quickly...

>> The current .dtype type will be still accessible (mainly for developers) but buried in .dtype.type.
>>
>> Briefly stated:
>>
>> current              proposed
>> =======              ========
>> .dtypedescr      --> moved into .dtype
>> .dtype           --> moved into .dtype.type
>> .dtype.dtypestr  --> moved into .dtype.str
>>                      new .dtype.name

I actually like this proposal a lot, as I think it gives proper place to the data-type descriptors.

I say we do it, very soon, and put out another release quickly.

-Travis
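
The concern about .dtype.char needing two attribute lookups instead of one can be gauged roughly with a small timing sketch. This measures plain Python attribute access on hypothetical stand-in classes (ArraySketch and DescrSketch are made up for illustration), not numpy's C-level attribute getters, so the numbers only hint at the relative cost.

```python
import timeit


class DescrSketch:
    """Hypothetical descriptor object holding the type character."""

    def __init__(self, char):
        self.char = char


class ArraySketch:
    """Hypothetical array exposing both spellings of the type character."""

    def __init__(self, char):
        self.dtype = DescrSketch(char)  # proposed spelling: arr.dtype.char
        self.dtypechar = char           # current spelling: arr.dtypechar


def time_lookup(expr, arr, number=200_000):
    """Return the total seconds taken to evaluate `expr` `number` times."""
    return timeit.timeit(expr, globals={"arr": arr}, number=number)


arr = ArraySketch("i")
one_lookup = time_lookup("arr.dtypechar", arr)
two_lookups = time_lookup("arr.dtype.char", arr)

print(f"arr.dtypechar : {one_lookup:.4f} s")
print(f"arr.dtype.char: {two_lookups:.4f} s")
```

Code that reads the character inside a tight loop could bind arr.dtype to a local name once, which brings the per-iteration cost back to a single lookup.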

Participants (3): Francesc Altet, Sebastian Haase, Travis Oliphant