[python-win32] python3 and extended mapi

Christian K. ckkart at hoc.net
Tue Jun 10 19:08:41 CEST 2014


 <Paul_Koning <at> Dell.com> writes:

> 
> 
> On Jun 9, 2014, at 9:40 PM, Christian K. <ckkart <at> hoc.net> wrote:
> 
> > Am 09.06.14 16:00, schrieb Paul_Koning <at> Dell.com:
> >> 
> >> On Jun 9, 2014, at 2:53 PM, Christian K. <ckkart <at> hoc.net> wrote:
> >> 
> >>> <Paul_Koning <at> Dell.com> writes:
> >>> 
> >>>> 
> >>>> 
> >>>> On Jun 9, 2014, at 9:07 AM, Christian K. <ckkart <at> hoc.net> wrote:
> >>>> 
> >>>>> Hi,
> >>>>> 
> >>>>> I was very pleased to see that retrieving properties of a MAPI object
> >>>>> yields either a <str> or <bytes> type depending on whether the _A or
> >>>>> _W property was queried …
> >>>> 
> >>>> Really?  That seems strange.  As I recall, the *_W APIs are “wide
> >>>> character” ones.  So in Python 3, they should both map to <str> type.
> >>>> <bytes> applies only to non-text data.
> >>> 
> >>> At least for text properties like e.g. PR_SUBJECT_A / _W the former
> >>> returns an mbcs encoded "string", i.e. of bytes type and the latter a
> >>> 2-byte unicode string. Binary properties are always returned as bytes
> >>> in contrast to earlier when using python2.
> >> 
> >> Yes, “bytes” for binary values is clearly correct.  But MBCS and “2 byte
> >> Unicode” (more accurately called either UCS-2 or UCS-2 BMP subset, not
> >> sure which) are both text strings.  The different encoding in the API
> >> doesn’t mean they should be different datatypes in Python 3; both cases
> >> are properly mapped to “str”.
> > 
> > No, this is not what I am seeing. MBCS encoded properties, i.e. those
> > terminating with _A are mapped to 'bytes' and the _W ones to 'str' which
> > is consistent with the handling of unicode and encoded information in
> > python3. And this is great indeed because having to distinguish between
> > strings which can be encoded or not while having the same type is really
> > painful.
> 
> Perhaps I’m missing something.
> 
> I’m used to Windows API calls that come in a foo_A and foo_W flavor, the
> only difference being that the _A flavor has ASCII arguments and the _W
> flavor has Unicode arguments (for those arguments that are, abstractly,
> strings).
> 
> In Python 3, the “str” type is an abstract string; its character
> repertoire is Unicode but it doesn’t have an encoding.  Instead, encoding
> and decoding is done when it is converted to/from external interfaces:
> files, external API calls, etc.

True, and the type which handles that data is called "str". In contrast to
what I said before, more than two bytes have to be used internally, since
unicode defines more than 100000 characters.
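
To illustrate why two bytes per character are not enough, here is a minimal
sketch in plain python3. Nothing in it is MAPI specific, and the two sample
characters are arbitrary choices of mine:

```python
import sys

# U+00E4 lies inside the Basic Multilingual Plane (BMP),
# U+1F600 lies outside of it.
bmp_char = "\u00e4"
astral_char = "\U0001F600"

# Both are a single character as far as "str" is concerned.
assert len(bmp_char) == 1
assert len(astral_char) == 1

# Encoded as UTF-16, the BMP character fits into one 2-byte unit,
# while the other one needs a surrogate pair, i.e. 4 bytes.
bmp_utf16 = bmp_char.encode("utf-16-le")
astral_utf16 = astral_char.encode("utf-16-le")
print(len(bmp_utf16), len(astral_utf16))

# The highest code point is beyond what 16 bits can hold.
print(hex(sys.maxunicode))
```

So a fixed 2-byte representation can only cover part of what a "str" may
contain.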

> So... I would expect foo_A and foo_W to have “str” arguments, and the
> interface machinery between Python3 and those functions would run the
> appropriate encoding to generate the string representation expected.
> 
> For example, if a given API wants strings in ASCII form, it would be
> str.encode (“ascii”) or perhaps str.encode (“latin1”).  If it wants MBCS
> data, it would be encode to that encoding.  If 2-byte Unicode, it would be
> encode to ucs-2.  And so on.  Ditto in the reverse direction, when strings
> are delivered by an external function.

Whenever you encode a "str" object in python3, i.e. call its encode() method,
you end up with a "bytes" object. Conversely, only a "bytes" object has a
decode() method. So the distinction between the unicode character pool and
its encoded representation is reflected by two different types in python:
unicode is "str", any encoded string is "bytes".
For that reason, if a function returns an encoded string, the return type
has to be "bytes", and this is what happens when retrieving the _A
properties.
Try it yourself: type('a') yields "str", type('a'.encode('ascii'))
yields "bytes".
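
Spelled out as a small script (a minimal sketch, plain python3 without any
MAPI involved; the variable names are mine):

```python
text = "a"
encoded = text.encode("ascii")

print(type(text))     # <class 'str'>
print(type(encoded))  # <class 'bytes'>

# decode() turns the bytes back into the original str.
decoded = encoded.decode("ascii")
assert decoded == text

# Each method exists in one direction only:
# str has encode() but no decode(), bytes has decode() but no encode().
str_has_decode = hasattr(text, "decode")
bytes_has_encode = hasattr(encoded, "encode")
print(str_has_decode, bytes_has_encode)  # False False
```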

> I would only want/expect to see “bytes” types when the values in question
> are binary data streams, or unknown format.  But anytime we’re dealing with
> text strings, the Python 3 approach is that the Python code sees “str”
> type, and questions of encoding have been handled at the edge.  This is
> where Python 3

This is not true, see above.
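
If you prefer to see only "str" in your own code, you have to decode the _A
values yourself. A minimal sketch of how that could look, assuming a
hypothetical helper as_text() (it is not part of pywin32) and simulated
property values instead of a real MAPI call. "cp1252" is only a stand-in
here so that the example runs anywhere; on Windows "mbcs" would probably be
the natural choice:

```python
def as_text(value, encoding="cp1252"):
    """Return value as str, decoding it first if it is bytes.

    Hypothetical helper, not part of pywin32. The default encoding is an
    assumption; pick the one that matches your codepage.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Simulated values: what an _A and a _W query of the same text property
# might hand back (bytes and str respectively).
subject_a = "Grüße".encode("cp1252")
subject_w = "Grüße"

assert isinstance(subject_a, bytes)
assert as_text(subject_a) == as_text(subject_w) == "Grüße"
```

Note that binary properties are returned as "bytes" as well, so such a
helper should only be applied to properties known to hold text.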

Christian




