[Python-3000] PEP Draft: Enhancing the buffer protocol

Wed Feb 28 18:56:16 CET 2007

Thomas Heller wrote:
> 
>>Additions to the struct string-syntax
>>
>>   The struct string-syntax is missing some characters to fully
>>   implement data-format descriptions already available elsewhere (in
>>   ctypes and NumPy for example).  Here are the proposed additions:
>>
>>   Character         Description
>>   ==================================
>>   '1'               bit (number before states how many bits)
>>   '?'               platform _Bool type 
> 
> 
> In SVN trunk (2.6), the struct module already supports _Bool, but the
> format character used is 't'.  Not a big issue, though, and I like '?'
> better.
> 

I think 't' should be used for the bit type also (because '1' is 
confusing when you have something like '71b', which looks like 71 signed 
chars but is actually 7 bits + 1 signed char).

I've changed this in the current PEP.
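
As an illustration of the ambiguity, here is a minimal sketch using only 
the struct module as it exists today.  The proposed bit code is not 
implemented in struct, so the sketch only shows how a leading number is 
currently read as a repeat count.

```python
import struct

# In the existing struct syntax a leading number is a repeat count,
# so '71b' means 71 signed chars, not "7 bits followed by 1 signed char".
size_71b = struct.calcsize('71b')

# The same rule on a smaller example: '3b' unpacks three signed chars.
values = struct.unpack('3b', b'\x01\x02\xff')

print(size_71b)
print(values)
```

With a letter such as 't' for bits, a count followed by 't' could not be 
mistaken for part of the count of the next item.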

> 
>>   'g'               long double  
>>   'F'               complex float  
>>   'D'               complex double 
>>   'G'               complex long double 
> 
> 
> IIUC, in the latest PEP draft you have apparently changed to two-letter codes
> for complex types, which is inconsistent with previous conventions in struct.

Yeah, I've introduced two-letter codes for pointers as well. But there 
is a certain logic to it, because 'Zd' would be similar to 'dd' except 
that you would know the two are supposed to be treated as a complex number.
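
The 'dd' analogy can be sketched with the existing struct module.  This 
is only an illustration under stated assumptions: struct does not accept 
'Zd', and the two helper names below are hypothetical, not part of any 
proposal.

```python
import struct

def pack_complex_as_dd(z):
    """Hypothetical helper: store a complex number as two native doubles."""
    return struct.pack('dd', z.real, z.imag)

def unpack_dd_as_complex(data):
    """Hypothetical helper: read two native doubles back as one complex."""
    real, imag = struct.unpack('dd', data)
    return complex(real, imag)

z = complex(1.5, -2.0)
raw = pack_complex_as_dd(z)
roundtrip = unpack_dd_as_complex(raw)
print(roundtrip)
```

With plain 'dd' the consumer has to know out of band that the pair forms 
one complex value; a dedicated code would carry that information in the 
format string itself.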

> 
> 
>>   'c'               ucs-1 (latin-1) encoding 
>>   'u'               ucs-2 
>>   'w'               ucs-4 
>>   'O'               pointer to Python Object 
>>   'T{}'             structure (detailed layout inside {}) 
>>   '(k1,k2,...,kn)'  multi-dimensional array of whatever follows 
>>   ':name:'          optional name of the preceding element 
>>   '&'               specific pointer (prefix before another character) 
>>   'X{}'             pointer to a function (optional function 
>>                                             signature inside {})
>>
>>   The struct module will be changed to understand these as well and
>>   return appropriate Python objects on unpacking.  Unpacking a
>>   long double will return a ctypes long_double.
> 
> 
> This is probably because there is no way for current Python to support
> the long double datatype. 

Right.   On some platforms there is no difference between double and 
long double.  I guess returning a decimal object might actually be the 
easiest solution.
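
A quick way to check the platform point is sketched below.  It assumes a 
ctypes version that provides c_longdouble (later releases do), and it 
only shows that Decimal can hold a binary double exactly; how a real 
long double would be converted is left open here.

```python
import ctypes
from decimal import Decimal

# Compare the storage size of double and long double on this platform.
double_size = ctypes.sizeof(ctypes.c_double)
longdouble_size = ctypes.sizeof(ctypes.c_longdouble)
same_width = (double_size == longdouble_size)

# Decimal built from a float keeps the exact binary value,
# which is why it differs from the decimal literal '0.1'.
exact = Decimal(0.1)

print(double_size, longdouble_size, same_width)
print(exact)
```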

> The question for ctypes is: How should ctypes
> support that?  Should the .value attribute of a c_longdouble have two
> components, should it expose the value as decimal, should Python itself
> switch to using long double internally, or are there other possibilities?
> 

I think I like the decimal object solution better.

> 
>> Unpacking 'u' or
>>   'w' will return Python unicode.  Unpacking a multi-dimensional
>>   array will return a list of lists.  Unpacking a pointer will
>>   return a ctypes pointer object.
> 
> 
> ctypes does not support pointer objects of non-native byte order;
> should they be forbidden?

Yes, I'm fine with them being forbidden.

> 
> 
>>
>>   Functions should be added to ctypes to create a ctypes object from
>>   a struct description, and add long-double, and ucs-2 to ctypes.
> 
> 
> Well, ucs-4 should probably be added to ctypes as well.  The current ctypes.c_wchar
> type corresponds to the C WCHAR type; its size is configuration dependent.

I think you are right.  In the discussions for unifying string/unicode I 
really like the proposals that are leaning toward having a unicode 
object be an immutable string of either ucs-1, ucs-2, or ucs-4 depending 
on what is in the string.

This does create some conversion issues that must be handled, but I 
think it is the best option.  In the Python 3.0 version of NumPy, I 
think that's what we are going to have (three different string types: 
ucs-1, ucs-2, and ucs-4).
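
The "depending on what is in the string" idea can be sketched as picking 
the narrowest width that fits the highest code point.  The function name 
is hypothetical, and the sketch assumes a Python where ord() returns the 
full code point for characters outside the BMP.

```python
def narrowest_ucs_width(text):
    """Hypothetical helper: bytes per character (1, 2 or 4) needed for text."""
    # The widest character decides the storage width for the whole string.
    highest = max(map(ord, text)) if text else 0
    if highest < 0x100:
        return 1   # fits in ucs-1 (latin-1)
    if highest < 0x10000:
        return 2   # fits in ucs-2
    return 4       # needs ucs-4

for sample in ('abc', '\u20ac', '\U0001F600'):
    print(repr(sample), narrowest_ucs_width(sample))
```

The conversion issues mentioned above show up as soon as two strings of 
different widths are combined, since the result has to use the wider of 
the two.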

-Travis