[Python-bugs-list] [ python-Bugs-476326 ] Unicode and imp.find_module

Mon, 07 Jan 2002 02:43:17 -0800

Bugs item #476326, was opened at 2001-10-30 03:25
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=476326&group_id=5470

Category: Python Interpreter Core
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Paul Boddie (pboddie)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Unicode and imp.find_module

Initial Comment:
When a Unicode string is passed as the module name to 
imp.find_module, the function fails to import the 
named module even when it exists in the specified 
path, returning the error message "No module 
named ..." as a result.

The problem in Python 2.0 can be traced to line 922 of 
Python/import.c, which ensures that any strings 
involved in the find_module function must be standard 
Python strings and not Unicode strings, since it tests 
the type of path components against &PyString_Type 
explicitly.

Interestingly, the __import__ built-in function seems 
to work with Unicode strings. Either way, it would be 
great if this could be documented or even fixed, but I 
don't know what the policy is on Unicode module names 
(even when they only contain ASCII-compatible 
characters).

----------------------------------------------------------------------

>Comment By: Paul Boddie (pboddie)
Date: 2002-01-07 02:43

Message:
Logged In: YES 
user_id=226443

It must have been fixed between Python 2.0 and Python 2.1, 
then, but I can't find any obvious indication of this in 
Python/import.c. The platform probably shouldn't matter in 
this case, but I was using Red Hat Linux 6.1 on Intel.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-01-05 00:04

Message:
Logged In: YES 
user_id=21627

I cannot reproduce the problem in Python 2.1:

>>> import imp
>>> imp.find_module(u"string")
(<open file '/usr/local/lib/python2.2/string.py', mode 'r'
at 0x816e070>, '/usr/local/lib/python2.2/string.py', ('.py',
'r', 1))

I don't think __import__ should accept non-ASCII names. It
may be reasonable to restrict import further, verifying that
the argument is a NAME in the sense of the Python lexical
grammar, but doing so is not important either.

I cannot see any further problem in this report, so I
suggest closing it as fixed. The test in line 922 only
checks the path, not the module name.
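A minimal sketch of the NAME check suggested above. It assumes the ASCII-only identifier rule of the Python 2 lexical grammar (a letter or underscore followed by letters, digits or underscores), applied to each dotted component of the name, with keywords rejected. The helper name `is_valid_module_name` is hypothetical and not part of the import machinery:

```python
import keyword
import re

# Python 2 NAME token: ASCII letter or underscore, then ASCII letters,
# digits or underscores; \Z anchors the match at the very end of the string.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def is_valid_module_name(name):
    """Hypothetical helper: True if every dotted part of name is a NAME.

    Empty parts (as in "a..b"), non-ASCII characters and keywords
    all make the name invalid.
    """
    parts = name.split(".")
    return all(_NAME_RE.match(part) and not keyword.iskeyword(part)
               for part in parts)
```

For example, `is_valid_module_name("os.path")` is true, while `is_valid_module_name(u"m\u00fcnchen")` and `is_valid_module_name("import")` are false.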

----------------------------------------------------------------------

Comment By: Paul Boddie (pboddie)
Date: 2001-12-03 02:59

Message:
Logged In: YES 
user_id=226443

For my purposes, I just wrapped the module name in a 'str' 
function call. I had Unicode strings because I was using 
text from an XML document and then attempting to use such 
text with the import mechanism.

One issue is whether Python would ever support importing 
from files which have non-ASCII filenames. I can imagine 
that certain operating systems support Unicode filenames, 
for example, but then the Python language probably doesn't 
support such filenames as the basis for module names when 
used with the 'import' statement and other related 
statements.

So, there's a wider issue of text encodings in (C)Python 
scripts as part of the "comprehensive" solution to this 
problem; the easy solution is just to enforce ASCII-only 
module names.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-12-01 15:01

Message:
Logged In: YES 
user_id=38388

I guess Python should not accept non-ASCII module names, so conversion of Unicode to ASCII should be 
appropriate.

Would it suffice to test this only in find_module(), or do you think that I need to dig deeper into the import 
mechanism?
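A minimal sketch of the conversion proposed here, assuming it is applied to the module name before the search begins. `ascii_module_name` is a hypothetical helper, not an existing function in `imp` or `Python/import.c`. Encoding to ASCII yields a byte string, which is the native `str` under Python 2; under Python 3 it is decoded back to text:

```python
def ascii_module_name(name):
    """Hypothetical helper: return name as a native str, rejecting non-ASCII.

    Any non-ASCII character raises a UnicodeError subclass
    (UnicodeEncodeError for Unicode input).
    """
    encoded = name.encode("ascii")   # bytes on Python 3, str on Python 2
    if isinstance(encoded, str):     # Python 2: the byte string is the native str
        return encoded
    return encoded.decode("ascii")   # Python 3: decode back to a text str
```

On the Python 2 releases discussed in this thread it could wrap the argument, as in `imp.find_module(ascii_module_name(u"string"))`, so that a non-ASCII name fails loudly with a `UnicodeError` instead of producing "No module named ...".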

----------------------------------------------------------------------
