[Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant
oliphant.travis at ieee.org
Sun Oct 29 02:18:04 CEST 2006
Martin v. Löwis wrote:
> Travis E. Oliphant wrote:
>> In this case, the 'kind' does not specify how large the data-type is.
>> You can have 'u1', 'u2', 'u4', etc.
>> The same is true with Unicode. You can have 10-character unicode
>> elements, 20-character, etc. But, we have to be clear about what a
>> "character" is in the data-format.
> That is certainly confusing. In u1, u2, u4, the digit seems to indicate
> the size of a single value (1 byte, 2 bytes, 4 bytes). Right? Yet,
> in U20, it does *not* indicate the size of a single value but of an
> array? And then, it's not the size, but the number of elements?
Good point. In NumPy, unicode support was added "in parallel" with
string arrays, where the ambiguity does not arise because each character
is one byte. So, yes, it's true that the unicode case is a special case.
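To make the difference concrete, here is a small sketch against a current
NumPy. It assumes a modern release, which always stores unicode as UCS-4;
older builds could use 2 bytes per character depending on how Python was
compiled, so treat the 4-bytes-per-character figure as an assumption about
the installed version.

```python
import numpy as np

# Integer codes: the digit is the size in bytes of one value.
int_sizes = {code: np.dtype(code).itemsize for code in ("u1", "u2", "u4")}

# String and unicode codes: the digit is a count of characters, not bytes.
s20 = np.dtype("S20")  # 20 one-byte characters
u20 = np.dtype("U20")  # 20 unicode characters

# Bytes used per unicode character (assumed to be 4, i.e. UCS-4 storage).
bytes_per_char = u20.itemsize // 20

print(int_sizes)
print(s20.kind, s20.itemsize)
print(u20.kind, u20.itemsize, bytes_per_char)
```

For 'S20' the character count and the byte count coincide, which is why
the string case never looked ambiguous; for 'U20' they do not.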
The other way to handle it would be to describe the code-point size
(i.e. 'U1', 'U2', 'U4' for UCS-1, UCS-2, UCS-4) and then encode the
length as an "array" of that type.
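A rough sketch of what that alternative could look like, in plain Python.
Everything here is hypothetical and for illustration only: the class names
CodeUnitFormat and ArrayFormat and the helper parse_code_unit are not part
of NumPy or of the PEP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeUnitFormat:
    """Hypothetical descriptor for a single code unit of fixed width."""
    kind: str
    itemsize: int  # bytes per code unit: 1, 2 or 4


@dataclass(frozen=True)
class ArrayFormat:
    """Hypothetical descriptor for a fixed-length array of one base format."""
    base: CodeUnitFormat
    length: int  # number of elements, not bytes

    @property
    def itemsize(self) -> int:
        # Total bytes = bytes per element * number of elements.
        return self.base.itemsize * self.length


def parse_code_unit(code: str) -> CodeUnitFormat:
    """Parse 'U1', 'U2' or 'U4'; the digit is bytes per code unit."""
    kind, width = code[0], int(code[1:])
    if kind != "U" or width not in (1, 2, 4):
        raise ValueError(f"unsupported code-unit format: {code!r}")
    return CodeUnitFormat(kind, width)


# A 20-character field is then an array of 20 'U4' elements.
text20 = ArrayFormat(parse_code_unit("U4"), 20)
print(text20.base.itemsize, text20.length, text20.itemsize)
```

With this layout the digit after 'U' means the same thing as the digit in
'u1', 'u2', 'u4', and the element count lives in the array part.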
This was not the direction we took with NumPy (which is what I'm using
as a reference) because I wanted Unicode and string arrays to look the
same and thought of strings differently.
How to handle unicode data-formats could definitely be improved.
Suggestions are welcome.