[issue9291] mimetypes initialization fails on Windows because of non-Latin characters in registry

adamhj report at bugs.python.org
Thu Nov 14 14:26:38 CET 2013


adamhj added the comment:

> The encoding is wrong. We should read the registry using Unicode, or at least use the correct encoding. The correct encoding is the ANSI code page: sys.getfilesystemencoding().

> Can you please try with: default_encoding = sys.getfilesystemencoding() ?

This does not work. In fact, it doesn't matter what default_encoding is. The variable ctype, which is returned by _winreg.EnumKey(), is a byte string (b'blahblah'), at least on my computer (win2k3sp2, Python 2.7.6). Because the interpreter is asked to encode a byte string, it first tries to convert the byte string to a unicode string by implicitly calling decode with the 'ascii' encoding, which raises the UnicodeDecodeError.
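To illustrate why the choice of default_encoding makes no difference, here is a rough sketch that emulates the implicit step. The helper py2_style_encode and the sample key name raw_key are made up for illustration (they are not part of mimetypes or _winreg), and the explicit decode only imitates what Python 2 does behind the scenes when encode() is called on a byte string.

```python
def py2_style_encode(value, encoding):
    """Hypothetical helper: imitate Python 2's encode() on a byte string.

    Python 2 first decodes the byte string with the 'ascii' codec and only
    then encodes the result with the requested encoding.
    """
    if isinstance(value, bytes):
        value = value.decode('ascii')  # the implicit step in Python 2
    return value.encode(encoding)


# Hypothetical registry key name containing non-ASCII bytes.
raw_key = b'.\xd6\xd0\xce\xc4'

# Record which codec failed for each requested target encoding.
failures = {}
for encoding in ('ascii', 'utf-8', 'latin-1'):
    try:
        py2_style_encode(raw_key, encoding)
    except UnicodeDecodeError as exc:
        failures[encoding] = exc.encoding
```

In this sketch every entry in failures names the 'ascii' codec, regardless of the target encoding, because the failure happens in the decode step before the requested encoding is ever used.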

The variable ctype, which is read from a registry key name, can be decoded correctly with sys.getfilesystemencoding() (which returns 'mbcs'), but what we actually need is a byte string, so there should be neither encoding nor decoding here.

If there is a case where _winreg.EnumKey() returns a unicode string, then a type check should be added before the encode. Or maybe the return type of _winreg.EnumKey() is different in 2.x and 3.x?
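A minimal sketch of the type check I have in mind follows. The function name to_byte_string is hypothetical and this is not the actual mimetypes code, just the idea: encode only when the value is a unicode string, and pass a byte string through untouched.

```python
# type(u'') is unicode on Python 2 and str on Python 3.
text_type = type(u'')


def to_byte_string(ctype, encoding):
    """Hypothetical helper: return ctype as a byte string.

    Unicode strings are encoded with the given encoding; byte strings are
    returned unchanged, so no implicit ASCII decode can be triggered.
    """
    if isinstance(ctype, text_type):
        return ctype.encode(encoding)
    return ctype


# A byte string with non-ASCII bytes passes through as-is.
from_bytes = to_byte_string(b'.\xd6\xd0\xce\xc4', 'ascii')

# A unicode string is encoded as before.
from_text = to_byte_string(u'text/plain', 'ascii')
```

With a check like this, the value of default_encoding would only matter for unicode input, which is the only case where encoding makes sense.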

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9291>
_______________________________________

