Unicode style in win32/PythonWin

Fri Jan 13 11:14:59 EST 2006

"Robert" <kxroberto at googlemail.com> writes:

> Thomas Heller wrote:

>> So after these assignments:
>>
>>   ctypes.windll.user32.MessageBoxW.argtypes = (c_int, c_wchar_p,
>>                                                c_wchar_p, c_int)
>>   ctypes.windll.user32.MessageBoxA.argtypes = (c_int, c_char_p,
>>                                                c_char_p, c_int)
>>
>> both MessageBoxA and MessageBoxW can be called with either ANSI or
>> Unicode strings, and should work correctly.  By default the conversion
>> is done with ('mbcs', 'ignore'), but this can also be changed,
>> ctypes-wide, with a call to ctypes.set_conversion_mode(encoding, errors).
>
> That is the right kind of functionality: consistent, with a default
> execution flow that just works, which Python and PythonWin have lacked
> so far.  They have no prominent mode-setting function, no mode _tuple_,
> etc., and/or their defaults are set so that simple apps doing common
> tasks break.
>
> My only question: is there a reason to have 'ignore' rather than
> 'replace' as the default?  Wouldn't 'replace' give a better indication
> of lost characters, as every web browser does for unknown Unicode
> characters (and as even mbcs_encode does in 'strict' mode)?  I cannot
> see any advantage of 'ignore' over 'replace' once strict equality has
> been given up anyway.
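
The auto-conversion Thomas describes can be pictured with a small pure-Python model. This is a hypothetical sketch, not ctypes code: set_conversion_mode, as_char_p and as_wchar_p below are stand-ins that only mimic what a c_char_p or c_wchar_p argtype does when handed the "wrong" string type, and "cp1252" replaces "mbcs" (which exists only on Windows) so it runs anywhere. It is written for Python 3, so ANSI strings are bytes and Unicode strings are str.

```python
# Hypothetical model of a ctypes-wide conversion mode (not the ctypes API).
# "cp1252" stands in for "mbcs", which only exists on Windows.
_conversion_mode = ("cp1252", "ignore")


def set_conversion_mode(encoding, errors):
    """Set the module-wide (encoding, errors) pair and return the old one."""
    global _conversion_mode
    previous = _conversion_mode
    _conversion_mode = (encoding, errors)
    return previous


def as_char_p(arg):
    """Coerce an argument for an ANSI ("A") parameter: text gets encoded."""
    if isinstance(arg, bytes):
        return arg
    if isinstance(arg, str):
        encoding, errors = _conversion_mode
        return arg.encode(encoding, errors)
    raise TypeError(f"expected bytes or str, got {type(arg).__name__}")


def as_wchar_p(arg):
    """Coerce an argument for a wide ("W") parameter: bytes get decoded."""
    if isinstance(arg, str):
        return arg
    if isinstance(arg, bytes):
        encoding, errors = _conversion_mode
        return arg.decode(encoding, errors)
    raise TypeError(f"expected bytes or str, got {type(arg).__name__}")


print(as_char_p("abc\u034adef"))              # b'abcdef'  ('ignore' drops it)
old = set_conversion_mode("cp1252", "replace")
print(as_char_p("abc\u034adef"))              # b'abc?def' ('replace' marks it)
print(as_wchar_p(b"caf\xe9") == "caf\u00e9")  # True
set_conversion_mode(*old)                     # restore the previous mode
```

The only difference between the two default choices under discussion is the errors half of the tuple: 'ignore' drops the character without a trace, 'replace' leaves a '?' behind.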

Hm, I don't know.  I try to avoid converting questionable characters at
all, if possible.  Also, the error mode doesn't seem to change anything
with the "mbcs" encoding.  WinXP, Python 2.4.2 on the console:

>>> u"abc\u034adef".encode("mbcs", "ignore")
'abc?def'
>>> u"abc\u034adef".encode("mbcs", "strict")
'abc?def'
>>> u"abc\u034adef".encode("mbcs", "error")
'abc?def'
>>>
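
Since, in the Python 2.4.2 session above, "mbcs" substitutes '?' whatever error mode is requested, the error mode alone does not tell you whether a conversion was lossy. One way to follow the "avoid converting questionable characters" approach is to check first and keep the Unicode string when the ANSI form would lose something. A minimal Python 3 sketch: encodes_losslessly and to_api_arg are hypothetical helpers, and "cp1252" stands in for "mbcs" so the code runs on any platform.

```python
def encodes_losslessly(text, encoding):
    """Return True if `text` survives an encode/decode round trip unchanged.

    A 'strict' encode catches codecs that honour the error mode; comparing
    the decoded result also catches codecs that silently substitute
    characters, as "mbcs" does in the session above.
    """
    try:
        encoded = text.encode(encoding, "strict")
        return encoded.decode(encoding, "strict") == text
    except UnicodeError:  # base class of UnicodeEncodeError/UnicodeDecodeError
        return False


def to_api_arg(text, encoding="cp1252"):
    """Hypothetical policy: return ANSI bytes only when nothing would be lost.

    Otherwise the Unicode string is returned unchanged, for a wide ("W") API.
    """
    if encodes_losslessly(text, encoding):
        return text.encode(encoding)
    return text


print(to_api_arg("caf\u00e9"))                        # b'caf\xe9'
print(to_api_arg("abc\u034adef") == "abc\u034adef")  # True
```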

With "latin-1", it is different:

>>> u"abc\u034adef".encode("latin-1", "ignore")
'abcdef'
>>> u"abc\u034adef".encode("latin-1", "replace")
'abc?def'
>>> u"abc\u034adef".encode("latin-1", "strict")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u034a' in position 3: ordinal not in range(256)
>>>
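
Robert's argument is that 'replace' leaves a visible trace where 'ignore' silently drops characters, as the latin-1 session shows. When a '?' is still too little information, the codecs module lets you register your own error handler. A minimal sketch in Python 3 (the sessions above are Python 2.4, hence the b'' prefixes below); the handler name "replace_and_log" and the replaced_chars list are hypothetical.

```python
import codecs

# Characters that the hypothetical "replace_and_log" handler had to replace.
replaced_chars = []


def replace_and_log(exc):
    """Encode error handler: substitute '?' like 'replace', but record what was lost."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    lost = exc.object[exc.start:exc.end]
    replaced_chars.extend(lost)
    # Return the replacement text and the position at which to resume encoding.
    return "?" * len(lost), exc.end


codecs.register_error("replace_and_log", replace_and_log)

text = "abc\u034adef"
print(text.encode("latin-1", "ignore"))             # b'abcdef'
print(text.encode("latin-1", "replace"))            # b'abc?def'
print(text.encode("latin-1", "replace_and_log"))    # b'abc?def'
print([f"U+{ord(c):04X}" for c in replaced_chars])  # ['U+034A']
```

The output is byte-for-byte the same as 'replace'; the difference is that the caller can afterwards inspect replaced_chars and decide whether the loss matters.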

Thomas