[Tutor] sys.getfilesystemencoding()

Albert-Jan Roskam fomcl at yahoo.com
Thu Dec 20 19:01:54 CET 2012


> On Wed, Dec 19, 2012 at 5:43 AM, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
>> 
>> So MBCS is just a collective noun for whatever happens to be the
>> installed/available codepage of the host computer (at least with
>> CP_ACP)?
> 
> To be clear, the "mbcs" encoding in Python uses CP_ACP. MBCS means
> multibyte character set. The term ANSI gets thrown around, too, but
> Windows legacy code pages aren't ANSI standards.
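To see what "mbcs" resolves to on a given machine, one can ask kernel32 for the ANSI code page directly. This is a minimal Python 3 sketch. It assumes that GetACP (a real kernel32 function) reports the page CP_ACP stands for, and that Python ships a cpNNNN codec for that page. The helper names ansi_code_page and encode_like_mbcs are made up for illustration. Off Windows it simply reports None.

```python
import ctypes
import sys


def ansi_code_page():
    """Return the numeric ANSI code page that CP_ACP stands for.

    Returns None when not running on Windows, where there is no CP_ACP.
    """
    if sys.platform != "win32":
        return None
    # GetACP takes no arguments and returns a UINT such as 1252.
    return ctypes.windll.kernel32.GetACP()


def encode_like_mbcs(text):
    """Encode text with the cpNNNN codec named after the ANSI code page.

    Assumption: Python ships a codec for that page. For characters the page
    can represent, the result should match text.encode("mbcs").
    """
    acp = ansi_code_page()
    if acp is None:
        raise OSError("CP_ACP only exists on Windows")
    return text.encode("cp%d" % acp)
```

On Windows, comparing encode_like_mbcs(u"abc") with u"abc".encode("mbcs") is a quick way to confirm that the two name the same code page.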
> 
>> I didn't know anything about wintypes and I find it quite hard to
>> understand! I am trying to write a ctypes wrapper for
>> WideCharToMultiByte.
> 
> Just for the fun of it?

Yes, I am afraid so. ;-) 


>> http://pastebin.com/SEr4Wec9
>> The code is kinda verbose, but I hope this makes it easier to read.
>> Does this make sense at all? As of now, the program returns an
>> error code (oddly, zero is an error code here).
> 
> Use None for NULL.

Aahh... I was already thinking the prototype didn't match the zeroes.
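One way to check what None does once argtypes are set, without touching kernel32, is to call a ctypes callback that stands in for the DLL function. The prototype mirrors the last two WideCharToMultiByte parameters (a char pointer and a BOOL pointer). PROTO, record_args and fake_api are hypothetical names, and the code assumes Python 3.

```python
import ctypes

# Stand-in prototype: a char pointer plus a BOOL pointer, like the last two
# parameters of WideCharToMultiByte (BOOL is a C long on Windows).
PROTO = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_char_p,
                         ctypes.POINTER(ctypes.c_long))

seen = []


def record_args(text, flag_ptr):
    # A NULL char* arrives as None; a NULL pointer object is falsy.
    seen.append((text, bool(flag_ptr)))
    return 0


fake_api = PROTO(record_args)

fake_api(None, None)                  # None becomes NULL for both pointer types
flag = ctypes.c_long(0)
fake_api(b"x", ctypes.byref(flag))    # a real string and a real BOOL
```

Because the argtypes are pointer types, None is the value ctypes translates to a NULL pointer; with argtypes declared like this, a plain 0 is not accepted as a NULL char pointer.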

> You shouldn't encode a string argument you've declared as c_wchar_p
> (i.e. wintypes.LPCWSTR, i.e. type 'Z'). If you initialize to a byte
> string, the setter Z_set calls PyUnicode_FromEncodedObject using the
> "mbcs" encoding (this is the default on Windows, set by
> set_conversion_mode("mbcs", "ignore")). This hands off to decode_mbcs,
> which produces nonsense for a UTF-16LE encoded string.

Ok, yes, that was plain stupid of me.
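The difference is easy to reproduce. This Python 3 sketch uses cp1252 as a stand-in for whatever the ANSI code page happens to be (an assumption; CP_ACP differs between systems), so it runs on any platform.

```python
import ctypes

text = u"caf\u00e9"

# Right: give a LPCWSTR / c_wchar_p parameter the text string itself and let
# ctypes build the wide-character buffer.
good = ctypes.c_wchar_p(text)

# Wrong: pre-encoding to UTF-16LE gives bytes; if those bytes are later read
# as an ANSI code page (cp1252 here, standing in for CP_ACP), then for this
# Latin-1 text every other character becomes a NUL.
utf16_bytes = text.encode("utf-16-le")
garbled = utf16_bytes.decode("cp1252")

print(good.value)    # café
print(repr(garbled))
```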
 
> GetLastError should be defined already, along with WinError, a
> convenience function that returns an instance of WindowsError. 2.6.4
> source:

Convenient indeed. No need to reinvent the wheel.

> http://hg.python.org/cpython/file/8803c3d61da2/Lib/ctypes/__init__.py#l448
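WinError pairs naturally with the errcheck attribute of ctypes foreign functions, which ctypes calls as errcheck(result, func, args) after every call. A sketch, assuming Python 3. zero_is_failure is a hypothetical helper. The CFUNCTYPE-wrapped lambda only stands in for a real kernel32 function so the hook can be exercised anywhere. Off Windows the hook raises a plain OSError instead of WinError().

```python
import ctypes
import sys


def zero_is_failure(result, func, args):
    """errcheck hook for APIs such as WideCharToMultiByte that return 0 on failure."""
    if result == 0:
        if sys.platform == "win32":
            # With no argument, WinError() fetches the code via GetLastError().
            raise ctypes.WinError()
        raise OSError("foreign call returned 0")
    return result


# Stand-in for a real DLL function so the hook can be tried on any platform.
PROTO = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)
echo = PROTO(lambda n: n)
echo.errcheck = zero_is_failure

ok = echo(3)  # a non-zero result passes straight through

failed = False
try:
    echo(0)
except OSError:
    failed = True
```

On Windows the same hook could be attached to the real function (_wideCharToMultiByte.errcheck = zero_is_failure), so each call site no longer needs its own return-value check.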
> 
> Here's a quick hack that should help you along:
> 
>     import ctypes
>     from ctypes import wintypes

PEP 8 discourages wildcard imports, so the only module I ever use "from x import *" with is ctypes. Is that why you don't do it here, to avoid name clashes with wintypes? In general, the module-dot-function notation is nicer. (That is what I dislike about R, where leaving out the namespace is almost the rule, although one can write things like reshape::melt.)

>     _CP_UTF8 = 65001
>     _CP_ACP = 0  # ANSI
>     _LPBOOL = ctypes.POINTER(ctypes.c_long)
> 
>     _wideCharToMultiByte = ctypes.windll.kernel32.WideCharToMultiByte
>     _wideCharToMultiByte.restype = ctypes.c_int
>     _wideCharToMultiByte.argtypes = [
>       wintypes.UINT, wintypes.DWORD, wintypes.LPCWSTR, ctypes.c_int,
>       wintypes.LPSTR, ctypes.c_int, wintypes.LPCSTR, _LPBOOL]
> 
>     def wide2utf8(fn):
>         codePage = _CP_UTF8
>         dwFlags = 0
>         lpWideCharStr = fn
>         cchWideChar = len(fn)
>         lpMultiByteStr = None
>         cbMultiByte = 0  # zero requests size
>         lpDefaultChar = None
>         lpUsedDefaultChar = None
>         # get size
>         mbcssize = _wideCharToMultiByte(
>           codePage, dwFlags, lpWideCharStr, cchWideChar, lpMultiByteStr,
>           cbMultiByte, lpDefaultChar, lpUsedDefaultChar)
>         if mbcssize <= 0:
>             # 0 means failure; WinError() with no argument reads GetLastError()
>             raise ctypes.WinError()
>         lpMultiByteStr = ctypes.create_string_buffer(mbcssize)
>         # convert
>         retcode = _wideCharToMultiByte(
>           codePage, dwFlags, lpWideCharStr, cchWideChar, lpMultiByteStr,
>           mbcssize, lpDefaultChar, lpUsedDefaultChar)
>         if retcode <= 0:
>             raise ctypes.WinError()
>         return lpMultiByteStr.value
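For comparison, here is a tidied Python 3 sketch of the same two-call pattern (ask for the size, then convert). These are assumptions, none of them from the thread. The character count is taken in UTF-16 code units so surrogate pairs are not cut off. An empty string short-circuits because the API fails when cchWideChar is 0. WinError() is called without an argument so it reads GetLastError(). On non-Windows platforms wide2utf8 falls back to Python's own UTF-8 codec so the example still runs. _make_win_converter is a hypothetical helper name.

```python
import ctypes
import sys

_CP_UTF8 = 65001


def _make_win_converter():
    """Build the WideCharToMultiByte-based converter (Windows only)."""
    from ctypes import wintypes  # imported lazily: only needed on Windows

    func = ctypes.windll.kernel32.WideCharToMultiByte
    func.restype = ctypes.c_int
    func.argtypes = [
        wintypes.UINT, wintypes.DWORD, wintypes.LPCWSTR, ctypes.c_int,
        wintypes.LPSTR, ctypes.c_int, wintypes.LPCSTR,
        ctypes.POINTER(wintypes.BOOL)]

    def convert(text):
        # Count UTF-16 code units, not code points, so characters outside
        # the BMP (surrogate pairs) are not truncated.
        cch = len(text.encode("utf-16-le")) // 2
        if cch == 0:
            return b""
        # First call: NULL output buffer and size 0 ask for the required size.
        size = func(_CP_UTF8, 0, text, cch, None, 0, None, None)
        if size == 0:
            raise ctypes.WinError()  # no argument: uses GetLastError()
        buf = ctypes.create_string_buffer(size)
        written = func(_CP_UTF8, 0, text, cch, buf, size, None, None)
        if written == 0:
            raise ctypes.WinError()
        return buf.raw[:written]

    return convert


if sys.platform == "win32":
    wide2utf8 = _make_win_converter()
else:
    def wide2utf8(text):
        # Non-Windows fallback (an assumption, not part of the original
        # hack): Python's own UTF-8 codec produces the same bytes.
        return text.encode("utf-8")
```

Returning buf.raw[:written] rather than .value keeps the result exact even if the input contains embedded NUL characters.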

Awesome, thank you so much! I'm glad to see that my code was heading in the right direction, even though I made some silly mistakes and some more fundamental ones.

