M.-A. Lemburg wrote:
Travis E. Oliphant wrote:
I understand, and that's why I'm asking why you made the range explicit in the definition.
In the case of NumPy, it was so that String and Unicode arrays would both look like arrays of multi-character strings, and not like arrays of arrays of single characters.
But this can change in the data-format object. I can see that the Unicode description needs to be improved.
The definition should talk about Unicode code points. The number of bytes then determines whether you can only represent the ASCII subset (1 byte), UCS2 (2 bytes, BMP only) or UCS4 (4 bytes, all currently assigned code points).
Yes, you are correct. A string of Unicode characters should really be represented in the same way that an array of integers is represented for a data-format object.
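To make the point above concrete, here is a minimal sketch of treating a Unicode string as an array of fixed-width integers, with the item size chosen from the highest code point. The function names and the little-endian byte order are assumptions made for illustration only; they are not part of the proposed data-format object or of NumPy. The 1-byte case is limited to ASCII, following the definition discussed above.

```python
def min_bytes_per_code_point(text):
    """Return the smallest item size (1, 2 or 4 bytes) that can hold every
    code point in `text`, using the ASCII / UCS2 / UCS4 split from the thread.
    Hypothetical helper, for illustration only."""
    highest = max((ord(ch) for ch in text), default=0)
    if highest < 0x80:        # ASCII subset fits in 1 byte
        return 1
    if highest <= 0xFFFF:     # BMP only: 2 bytes (UCS2)
        return 2
    return 4                  # everything else: 4 bytes (UCS4)


def as_code_point_array(text):
    """Describe `text` the way an integer array would be described:
    an item size, the integer values, and the raw packed bytes.
    Byte order is assumed to be little-endian here."""
    width = min_bytes_per_code_point(text)
    code_points = [ord(ch) for ch in text]
    raw = b"".join(cp.to_bytes(width, "little") for cp in code_points)
    return width, code_points, raw


width, code_points, raw = as_code_point_array("a\u20ac")
print(width, code_points, len(raw))
```

With this view, the length of the string is simply the number of items, and the item size says which range of code points can be stored, exactly as it would for an array of 1-, 2- or 4-byte unsigned integers.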