[issue5604] imp.find_module() mixes UTF8 and MBCS

Mon Mar 30 16:26:41 CEST 2009

New submission from Guido van Rossum <guido at python.org>:

There's a code path in imp.find_module() that mixes encodings.  The
module name is encoded to char* using UTF-8 by the 's' format passed to
PyArg_ParseTuple().  But the path name is converted (in the loop over
the path in find_module()) to char* using the filesystem encoding.  On
Windows this ends up constructing a single char* that mixes MBCS and
UTF-8.
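
To illustrate why the mixed buffer is a problem, here is a minimal
pure-Python sketch of the same concatenation.  It is not the C code in
import.c; cp1252 is assumed as a stand-in for the Windows "mbcs" codec
(which only exists on Windows), and the directory and module names are
made up for the example.

```python
# Illustration only: cp1252 stands in for the Windows "mbcs" codec,
# and the directory and module names below are hypothetical.
FS_ENCODING = "cp1252"

directory = "C:\\café"   # path entry, converted with the filesystem encoding
module = "naïve"         # module name, converted to UTF-8 by the 's' format

# Mimic the buffer built in the loop: <directory> + separator + <module name>
mixed = directory.encode(FS_ENCODING) + b"\\" + module.encode("utf-8")

# Read back with the filesystem encoding, the module name is garbled.
as_fs = mixed.decode(FS_ENCODING)

# Read back as UTF-8, the directory part is not valid at all.
try:
    mixed.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(as_fs), utf8_ok)
```

Neither encoding recovers the intended file name, so a lookup with
this buffer can only succeed when both parts are pure ASCII.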

(We discovered this when researching bug 4352, but this is not the cause
of the problem reported there -- the module name in that bug is ASCII.)

Andrew Svetlov is looking into producing a patch.

----------
components: Interpreter Core
messages: 84548
nosy: asvetlov, gvanrossum
priority: normal
severity: normal
stage: needs patch
status: open
title: imp.find_module() mixes UTF8 and MBCS
type: behavior
versions: Python 3.0, Python 3.1

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5604>
_______________________________________