[Python-Dev] PEP: Adding data-type objects to Python

Sat Oct 28 21:31:34 CEST 2006

Travis E. Oliphant wrote:
> M.-A. Lemburg wrote:
>> Travis E. Oliphant wrote:
>>> M.-A. Lemburg wrote:
>>>> Travis E. Oliphant wrote:
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> PEP: <unassigned>
>>>>> Title: Adding data-type objects to the standard library
>>>>>   Attributes
>>>>>
>>>>>      kind      --  returns the basic "kind" of the data-type. The basic kinds
>>>>>                      are:
>>>>>                        't' - bit, 
>>>>>                        'b' - bool, 
>>>>>                        'i' - signed integer, 
>>>>>                        'u' - unsigned integer,
>>>>>                        'f' - floating point,                  
>>>>>                        'c' - complex floating point, 
>>>>>                        'S' - string (fixed-length sequence of char),
>>>>>                        'U' - fixed length sequence of UCS4,
>>>> Shouldn't this read "fixed length sequence of Unicode" ?!
>>>> The underlying code unit format (UCS2 and UCS4) depends on the
>>>> Python version.
>>> Well, in NumPy 'U' always means UCS4.  So, I just copied that over.  See 
>>> my questions at the bottom which talk about how to handle this.  A 
>>> data-format does not necessarily have to correspond to something Python 
>>> represents with an Object.
>> Ok, but why are you being specific about UCS4 (which is an internal
>> storage format), while you are not specific about e.g. the
>> internal bit size of the integers (which could be 32 or 64 bit) ?
>>
> 
> The 'kind' does not specify how "big" the data-type (data-format) is.  A 
> number is needed to give the size in bytes.  For unsigned integers, for 
> example, you can have 'u1', 'u2', 'u4', etc.
> 
> The same is true with Unicode.  You can have 10-character unicode 
> elements, 20-character, etc.  But, we have to be clear about what a 
> "character" is in the data-format.

I understand, and that's why I'm asking why you made the range
(UCS4) explicit in the definition of the 'U' kind.

The definition should talk about Unicode code points.
The number of bytes then determines whether you can only
represent the ASCII subset (1 byte), UCS2 (2 bytes, BMP only)
or UCS4 (4 bytes, all currently assigned code points).

This is similar to the range for integers (i.e. ZZ_0), where
the number of bytes determines the range of numbers that can
be represented.
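
As a rough illustration of the point above, here is a minimal Python
sketch of how a per-character byte width bounds the representable code
points, in the same way a byte count bounds an unsigned integer. The
helper names (max_code_point, unsigned_range, fits) are hypothetical
and are not part of the PEP or of NumPy. Note that a 1-byte unit can
technically hold code points up to 0xFF, which includes the ASCII
subset.

```python
# Hypothetical helpers for illustration only; not part of the PEP or NumPy.

MAX_UNICODE = 0x10FFFF  # highest code point defined by the Unicode standard


def unsigned_range(nbytes):
    """Return (min, max) for an unsigned integer stored in nbytes bytes."""
    return 0, 2 ** (8 * nbytes) - 1


def max_code_point(bytes_per_char):
    """Largest code point that fits in a fixed-width unit of this size."""
    return min(unsigned_range(bytes_per_char)[1], MAX_UNICODE)


def fits(text, bytes_per_char):
    """True if every code point in text fits in the given unit width."""
    limit = max_code_point(bytes_per_char)
    return all(ord(ch) <= limit for ch in text)


if __name__ == "__main__":
    for width in (1, 2, 4):
        print(width, hex(max_code_point(width)))
    print(fits("abc", 1))         # plain ASCII text
    print(fits("\u20ac", 1))      # EURO SIGN, needs at least 2 bytes
    print(fits("\U0001d11e", 2))  # outside the BMP, needs 4 bytes
```

With this view, the kind only has to say "Unicode code points"; the
size suffix then selects the range, just as it does for 'u1', 'u2'
and 'u4'.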

-- 
Marc-Andre Lemburg
eGenix.com
