[Python-bugs-list] [ python-Bugs-476326 ] Unicode in sys.path not supported

noreply@sourceforge.net
Thu, 05 Sep 2002 00:26:47 -0700


Bugs item #476326, was opened at 2001-10-30 11:25
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=476326&group_id=5470

Category: Python Interpreter Core
Group: None
Status: Closed
Resolution: Fixed
Priority: 3
Submitted By: Paul Boddie (pboddie)
>Assigned to: Walter Dörwald (doerwalter)
Summary: Unicode in sys.path not supported

Initial Comment:
When a Unicode string is passed as the module name to 
imp.find_module, the function fails to import the 
named module even when it exists in the specified 
path, returning the error message "No module 
named ..." as a result.

The problem in Python 2.0 can be traced to line 922 of 
Python/import.c, which ensures that any strings
involved in the find_module function must be standard 
Python strings and not Unicode strings, since it tests 
the type of path components against &PyString_Type 
explicitly.
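To make this failure mode concrete, here is a rough pure-Python model of the type check described above, written for Python 3. It is a hypothetical sketch, not the C code in Python/import.c. `bytes` stands in for Python 2's `str` (PyString_Type) and `str` stands in for `unicode`. Path entries of the "wrong" string type are skipped without any error, so the search ends in "No module named ..." even though the file exists. The function name `find_module_model` is made up for illustration.

```python
import os
import tempfile


def find_module_model(name, path):
    """Hypothetical model of the Python 2.0 path search (not real import.c code).

    bytes entries play the role of Python 2 str objects; any other entry
    type is skipped, mirroring the explicit PyString_Type test on the path.
    """
    # Per the thread, the type test applies to path entries, not to the
    # module name, so a Unicode (str) name is simply encoded here.
    if isinstance(name, str):
        name = name.encode("ascii")
    for entry in path:
        if not isinstance(entry, bytes):
            continue  # "Unicode" path entry: silently ignored
        candidate = os.path.join(entry, name + b".py")
        if os.path.isfile(candidate):
            return candidate
    raise ImportError("No module named %s" % name.decode("ascii"))


# Usage: the same directory is found as bytes but skipped as str.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "VaultsSearch.py"), "w").close()
    print(find_module_model(u"VaultsSearch", [os.fsencode(tmp)]))
    try:
        find_module_model("VaultsSearch", [tmp])
    except ImportError as exc:
        print(exc)
```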

Interestingly, the __import__ built-in function seems 
to work with Unicode strings. Either way, it would be 
great if this could be documented or even fixed, but I 
don't know what the policy is on Unicode module names 
(even when they only contain ASCII-compatible 
characters).

----------------------------------------------------------------------

>Comment By: M.-A. Lemburg (lemburg)
Date: 2002-09-05 07:26

Message:
Logged In: YES 
user_id=38388

No time to check; can you do this, Walter ?
Thanks.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-09-04 19:03

Message:
Logged In: YES 
user_id=89016

import.c 2.207 should have fixed this problem, so I hope we 
can close this bug now.

----------------------------------------------------------------------

Comment By: Paul Boddie (pboddie)
Date: 2002-01-07 13:09

Message:
Logged In: YES 
user_id=226443

My apologies: I should have been clearer in my description. 
Here's a test case for Python 2.1 on Windows which 
demonstrates the problem:

import sys, imp

ascii_dir = "D:\Private\Vaults"
unicode_dir = u"D:\Private\Vaults"

# First test: Unicode sys.path value.

sys.path.append(unicode_dir)
imp.find_module(u"VaultsSearch") # fails
imp.find_module("VaultsSearch") # fails
sys.path.remove(unicode_dir)

# Second test: ASCII sys.path value.

sys.path.append(ascii_dir)
imp.find_module(u"VaultsSearch") # succeeds
imp.find_module("VaultsSearch") # succeeds
sys.path.remove(ascii_dir)
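For readers on Python 3, where every `str` is Unicode and the `imp` module was removed in Python 3.12, a rough analogue of the test case above can use `importlib.util.find_spec`. This is a sketch under those assumptions; the helper `find_in_sys_path` and the module name `vaults_search_demo` are hypothetical. The `try`/`finally` makes sure the directory is removed from `sys.path` even if the lookup raises, which the original script would not do.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def find_in_sys_path(module_name, directory):
    """Temporarily append `directory` to sys.path and look up `module_name`.

    Returns the module's origin (its file path) or None if it is not found.
    """
    sys.path.append(directory)
    try:
        importlib.invalidate_caches()  # drop any cached directory listings
        spec = importlib.util.find_spec(module_name)
        return spec.origin if spec is not None else None
    finally:
        sys.path.remove(directory)  # undo the change even on error


# Usage: a str (Unicode) sys.path entry works as expected on Python 3.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "vaults_search_demo.py"), "w") as f:
        f.write("VALUE = 1\n")
    print(find_in_sys_path("vaults_search_demo", tmp))
```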

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-01-07 10:55

Message:
Logged In: YES 
user_id=38388

The find_module() code doesn't seem to have changed between 
the releases, so it should work in Python 2.0 as well.

The only parts I see in the source code which require strings 
are the sys.path handling APIs. The optional second argument
to find_module() will also only accept strings. Perhaps that's where
your problem originated?

Python 2.0 (#1, Jan 19 2001, 17:54:27)
[GCC 2.95.2 19991024 (release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import imp
>>> imp.find_module(u'platform')
(<open file '/home/lemburg/bin/platform.py', mode 'r' at 0x8191a78>, '/home/lemburg/bin/platform.py', ('.py', 'r', 1))
>>>

Can you give an example which demonstrates the problem?
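The optional second argument mentioned above restricts the search to an explicit list of directories. On Python 3, `importlib.machinery.PathFinder.find_spec(name, path)` plays a broadly similar role without touching `sys.path`; this is a loose analogue, not a drop-in replacement for `find_module()`, and the helper name below is hypothetical.

```python
import importlib.machinery
import os
import tempfile


def find_in_directory(module_name, directory):
    """Search only `directory` for a top-level module, leaving sys.path alone."""
    spec = importlib.machinery.PathFinder.find_spec(module_name, [directory])
    return spec.origin if spec is not None else None


# Usage: the directory is passed explicitly as a str (Unicode) entry.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "platform_demo.py"), "w").close()
    print(find_in_directory("platform_demo", tmp))
```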


----------------------------------------------------------------------

Comment By: Paul Boddie (pboddie)
Date: 2002-01-07 10:43

Message:
Logged In: YES 
user_id=226443

It must have been fixed between Python 2.0 and Python 2.1, 
then, but I can't find any obvious indication of this in 
Python/import.c. The platform probably shouldn't matter in 
this case, but I was using Red Hat Linux 6.1 on Intel.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-01-05 08:04

Message:
Logged In: YES 
user_id=21627

I cannot reproduce the problem in Python 2.1:

>>> import imp
>>> imp.find_module(u"string")
(<open file '/usr/local/lib/python2.2/string.py', mode 'r' at 0x816e070>, '/usr/local/lib/python2.2/string.py', ('.py', 'r', 1))

I don't think __import__ should accept non-ASCII names. It
may be reasonable to restrict import further by verifying that
the argument is a NAME in the sense of the Python lexis, though
doing so is not important.
I cannot see any further problem in this report, so I
suggest closing it as fixed. The test in line 922 only
checks the path, not the module name.
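The stricter check suggested above could look roughly like the following sketch; the function name is hypothetical. It treats a dotted module name as valid only if every component is an identifier and not a reserved keyword. One caveat: Python 2's NAME token was ASCII-only, but on Python 3 `str.isidentifier()` also accepts non-ASCII identifiers (PEP 3131), so this check alone does not enforce ASCII.

```python
import keyword


def is_valid_module_name(name):
    """Return True if every dotted component of `name` is a Python NAME."""
    if not isinstance(name, str) or not name:
        return False
    return all(
        part.isidentifier() and not keyword.iskeyword(part)
        for part in name.split(".")
    )


# Usage
print(is_valid_module_name("os.path"))    # dotted name of identifiers
print(is_valid_module_name("my-module"))  # hyphen is not allowed in a NAME
```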

----------------------------------------------------------------------

Comment By: Paul Boddie (pboddie)
Date: 2001-12-03 10:59

Message:
Logged In: YES 
user_id=226443

For my purposes, I just wrapped the module name in a 'str' 
function call. I had Unicode strings because I was using 
text from an XML document and then attempting to use such 
text with the import mechanism.
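The same workaround, applied to a module name read from XML, might be sketched as follows on Python 3. The XML layout (a `<module>` element) and the helper name are assumptions for illustration. Text from a parser can carry surrounding whitespace, so it is stripped first; the `str()` call mirrors the workaround described above and is a no-op on Python 3, where `importlib.import_module` accepts the Unicode text directly.

```python
import importlib
import xml.etree.ElementTree as ET


def import_from_xml(xml_text, tag="module"):
    """Import the module named by the first <tag> child of the XML root."""
    element = ET.fromstring(xml_text).find(tag)
    if element is None or not element.text:
        raise ValueError("no <%s> element with text" % tag)
    name = str(element.text).strip()  # str() mirrors the Python 2 workaround
    return importlib.import_module(name)


# Usage: the module name comes from document text, not from source code.
mod = import_from_xml("<config><module> json </module></config>")
print(mod.__name__)
```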

One issue is whether Python would ever support importing 
from files which have non-ASCII filenames. I can imagine 
that certain operating systems support Unicode filenames, 
for example, but then the Python language probably doesn't 
support such filenames as the basis for module names when 
used with the 'import' statement and other related 
statements.

So, there's a wider issue of text encodings in (C)Python 
scripts as part of the "comprehensive" solution to this 
problem; the easy solution is just to enforce ASCII-only 
module names.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-12-01 23:01

Message:
Logged In: YES 
user_id=38388

I guess Python should not accept non-ASCII module names, so converting
Unicode names to ASCII should be appropriate.

Would it suffice to test this only in find_module(), or do you think I
need to dig deeper into the import mechanism?
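The ASCII-only policy discussed in this thread could be sketched as a small guard applied before any path search. This is a hypothetical helper, not code from import.c: names that cannot be encoded as ASCII are rejected with ImportError, and ASCII-compatible Unicode names pass through unchanged (on Python 3 they are already `str`, so no conversion is needed).

```python
def to_ascii_module_name(name):
    """Return `name` if it is ASCII-only, otherwise raise ImportError."""
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        raise ImportError("non-ASCII module name: %r" % name) from None
    return name


# Usage
print(to_ascii_module_name(u"VaultsSearch"))
```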

----------------------------------------------------------------------
