[Python-Dev] PEP: Adding data-type objects to Python

Travis E. Oliphant oliphant.travis at ieee.org
Sun Oct 29 02:18:04 CEST 2006

Martin v. Löwis wrote:
> Travis E. Oliphant schrieb:
>> In this case, the 'kind' does not specify how large the data-type is. 
>> You can have 'u1', 'u2', 'u4', etc.
>> The same is true with Unicode.  You can have 10-character unicode 
>> elements, 20-character, etc.  But, we have to be clear about what a 
>> "character" is in the data-format.
> That is certainly confusing. In u1, u2, u4, the digit seems to indicate
> the size of a single value (1 byte, 2 bytes, 4 bytes). Right? Yet,
> in U20, it does *not* indicate the size of a single value but of an
> array? And then, it's not the size, but the number of elements?

Good point.  In NumPy, unicode support was added "in parallel" with 
string arrays, where this ambiguity does not arise (one character is 
one byte).  So, yes, the unicode case is a special case.
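
To make the asymmetry concrete, here is a minimal sketch of how a NumPy-style type code is read. `describe` is a hypothetical helper, not a NumPy or PEP function. For the numeric kinds the digit is the size of one value in bytes. For 'S' and 'U' it is a character count. The sketch assumes 4-byte (UCS-4) characters for 'U', which is what current NumPy releases store.

```python
import re

# Kinds whose digit is the size of ONE value in bytes (u4 = one 4-byte int).
NUMERIC_KINDS = {"i", "u", "f", "c"}
# Kinds whose digit is a CHARACTER COUNT; the value is bytes per character.
# 'U' assumes UCS-4 storage, as in current NumPy releases.
CHAR_WIDTH = {"S": 1, "U": 4}


def describe(code: str) -> tuple[str, int]:
    """Return (meaning of the digit, itemsize in bytes) for e.g. 'u4' or 'U20'.

    Hypothetical helper for illustration; not a NumPy or PEP function.
    """
    match = re.fullmatch(r"([A-Za-z])(\d+)", code)
    if match is None:
        raise ValueError(f"unrecognized type code: {code!r}")
    kind, n = match.group(1), int(match.group(2))
    if kind in NUMERIC_KINDS:
        return (f"one {n}-byte value", n)
    if kind in CHAR_WIDTH:
        return (f"{n} characters", n * CHAR_WIDTH[kind])
    raise ValueError(f"unknown kind: {kind!r}")


for code in ("u1", "u2", "u4", "S20", "U20"):
    meaning, size = describe(code)
    print(f"{code}: {meaning}, itemsize={size}")
```

The loop prints `u4: one 4-byte value, itemsize=4` alongside `U20: 20 characters, itemsize=80`. That is exactly the mismatch Martin describes: the same position in the code means bytes for one kind and characters for the other.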

The other way to handle it would be to describe the code-point size 
(i.e., 'U1', 'U2', 'U4' for UCS-1, UCS-2, UCS-4) and then encode the 
length as an "array" of those types.
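
Under that alternative, the digit in 'U1', 'U2', 'U4' would mean the same thing it does in 'u1', 'u2', 'u4': the size in bytes of one unit. The string length would be carried by the array shape instead. The sketch below makes several assumptions:

* storage is little-endian and NUL-padded;
* UCS-1 is treated as code points up to U+00FF (Python's latin-1 codec);
* UCS-2 is restricted to the BMP (utf-16-le without surrogate pairs);
* UCS-4 uses utf-32-le.

`UNIT_CODECS`, `pack_fixed`, and `unpack_fixed` are hypothetical names, not part of NumPy or the PEP.

```python
# Codec and byte width for each code-point size; little-endian is assumed.
UNIT_CODECS = {"U1": ("latin-1", 1), "U2": ("utf-16-le", 2), "U4": ("utf-32-le", 4)}


def pack_fixed(text: str, unit: str, length: int) -> bytes:
    """Encode `text` as an array of `length` code points of type `unit`.

    The element occupies width * length bytes and is NUL-padded, so a
    ('U2', 20) element is 40 bytes.  Hypothetical helper for illustration.
    """
    codec, width = UNIT_CODECS[unit]
    if len(text) > length:
        raise ValueError(f"{len(text)} characters do not fit in {length}")
    if unit == "U2" and any(ord(ch) > 0xFFFF for ch in text):
        # UCS-2 has no surrogate pairs: each slot holds one BMP code point.
        raise ValueError("UCS-2 cannot store code points above U+FFFF")
    data = text.encode(codec)  # latin-1 raises UnicodeEncodeError above U+00FF
    return data + b"\x00" * (width * length - len(data))


def unpack_fixed(data: bytes, unit: str) -> str:
    """Decode a NUL-padded field back to str (trailing NULs are dropped)."""
    codec, _ = UNIT_CODECS[unit]
    return data.decode(codec).rstrip("\x00")
```

For example, `len(pack_fixed("héllo", "U2", 20))` is 40, and `unpack_fixed` on that result gives back "héllo". The itemsize is always width × length, so a reader of the descriptor never has to special-case the unicode kind.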

This was not the direction we took with NumPy (which is what I'm using 
as a reference) because I wanted Unicode and string arrays to look the 
same and thought of strings differently.

How to handle unicode data-formats could definitely be improved. 
Suggestions are welcome.

