[python-win32] python3 and extended mapi

Paul_Koning at Dell.com
Tue Jun 10 17:22:03 CEST 2014


On Jun 9, 2014, at 9:40 PM, Christian K. <ckkart at hoc.net> wrote:

> Am 09.06.14 16:00, schrieb Paul_Koning at Dell.com:
>> 
>> On Jun 9, 2014, at 2:53 PM, Christian K. <ckkart at hoc.net> wrote:
>> 
>>> <Paul_Koning <at> Dell.com> writes:
>>> 
>>>> 
>>>> 
>>>> On Jun 9, 2014, at 9:07 AM, Christian K. <ckkart <at> hoc.net> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I was very pleased to see that retrieving properties of a MAPI object yields
>>>>> either a <str> or <bytes> type depending on whether the _A or _W property
>>>>> was queried …
>>>> 
>>>> Really?  That seems strange.  As I recall, the *_W APIs are “wide character” ones.
>>>> So in Python 3, they should both map to <str> type.  <bytes> applies only to non-text data.
>>> 
>>> At least for text properties such as PR_SUBJECT_A / _W, the former returns
>>> an MBCS-encoded "string", i.e. of bytes type, and the latter a 2-byte Unicode
>>> string. Binary properties are always returned as bytes, in contrast to
>>> the earlier behaviour under Python 2.
>> 
>> Yes, “bytes” for binary values is clearly correct.  But MBCS and “2 byte Unicode” (more accurately called either UCS-2 or UCS-2 BMP subset, not sure which) are both text strings.  The different encoding in the API doesn’t mean they should be different datatypes in Python 3; both cases are properly mapped to “str”.
> 
> No, this is not what I am seeing. MBCS encoded properties, i.e. those terminating with _A are mapped to 'bytes' and the _W ones to 'str' which is consistent with the handling of unicode and encoded information in python3. And this is great indeed because having to distinguish between strings which can be encoded or not while having the same type is really painful.

Perhaps I’m missing something.

I’m used to Windows API calls that come in a foo_A and foo_W flavor, the only difference being that the _A flavor has ASCII arguments and the _W flavor has Unicode arguments (for those arguments that are, abstractly, strings).

In Python 3, the “str” type is an abstract string; its character repertoire is Unicode but it doesn’t have an encoding.  Instead, encoding and decoding are done when it is converted to/from external interfaces: files, external API calls, etc.

So... I would expect foo_A and foo_W to have “str” arguments, and the interface machinery between Python 3 and those functions would run the appropriate encoding to generate the string representation each function expects.

For example, if a given API wants strings in ASCII form, it would be str.encode(“ascii”) or perhaps str.encode(“latin1”).  If it wants MBCS data, it would encode to that encoding.  If 2-byte Unicode, it would encode to UCS-2.  And so on.  Ditto in the reverse direction, when strings are delivered by an external function.
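
To make that concrete, here is a minimal sketch of encoding at the edge.  The helper names to_api and from_api are made up for illustration; they are not part of pywin32.  I use “utf-16-le” for the 2-byte case on the assumption that this is what a _W interface expects (Python has no codec named “ucs-2”), and “latin-1” as a stand-in for a single-byte code page, since the “mbcs” codec only exists on Windows.

```python
# Hypothetical helpers: text is "str" inside the program and is only
# encoded or decoded where it crosses an external interface.

def to_api(text, encoding):
    """Encode a str into the byte representation an external call expects."""
    return text.encode(encoding)


def from_api(data, encoding):
    """Decode bytes delivered by an external call back into a str."""
    return data.decode(encoding)


subject = "Grüße"

narrow = to_api(subject, "latin-1")    # one byte per character here
wide = to_api(subject, "utf-16-le")    # two bytes per character here

# Both representations decode to the same abstract string.
assert from_api(narrow, "latin-1") == subject
assert from_api(wide, "utf-16-le") == subject

print(len(narrow), len(wide))  # prints: 5 10
```

The point is that the caller only ever handles “str”; which encoding is used is a property of the interface, not of the data type the Python code sees.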

I would only want/expect to see “bytes” types when the values in question are binary data streams, or unknown format.  But anytime we’re dealing with text strings, the Python 3 approach is that the Python code sees “str” type, and questions of encoding have been handled at the edge.  This is where Python 3 gets it right and Python 2 was a big muddle.
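
As a rough sketch of that policy applied to the behaviour Christian describes (an _A text property arriving as bytes, a _W one as str), something like the following would hide the difference from the calling code.  The function normalize_property and its is_text flag are hypothetical, not anything pywin32 provides, and the default “cp1252” is only an assumed code page standing in for “mbcs”.

```python
def normalize_property(value, is_text, encoding="cp1252"):
    """Return str for text properties and bytes for binary ones.

    `is_text` must be supplied by the caller (for example from the
    property type); the encoding default is an assumption.
    """
    if isinstance(value, str):
        return value                    # already decoded (the _W case)
    if is_text:
        return value.decode(encoding)   # encoded text (the _A case)
    return bytes(value)                 # binary data stays binary


print(normalize_property(b"Gr\xfc\xdfe", is_text=True))   # prints: Grüße
print(normalize_property(b"\x00\x01", is_text=False))     # prints: b'\x00\x01'
```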

Mark, could you clarify how you would expect this to work?

	paul


