[Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
Neil Hodgson
nyamatongwe at gmail.com
Mon Jul 4 02:18:15 CEST 2005
Guido van Rossum:
> Then maybe the code that handles Unicode paths in arguments should be
> fixed rather than adding a module that encapsulates a work-around...
It isn't clear whether you are saying this should be fixed by the
user or in the library. For a quick example, say someone wrote some
code for counting the lines in the files of a directory:
import os
root = "docs"
lines = 0
for p in os.listdir(root):
    lines += len(file(os.path.join(root,p)).readlines())
print lines, "document lines"
Quite common code. Running it now with one file "abc" in the
directory yields correct behaviour:
>pythonw -u "xlines.py"
1 document lines
Now copy the file "Здравствуйте" into the directory and run it again:
>pythonw -u "xlines.py"
Traceback (most recent call last):
  File "xlines.py", line 5, in ?
    lines += len(file(os.path.join(root,p)).readlines())
IOError: [Errno 2] No such file or directory: 'docs\\????????????'
Changing line 2 to [root = u"docs"] will make the code work. If
this is the correct fix, then all file handling code should be written
using unicode names.
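The question marks in the traceback are what the byte-string form of os.listdir hands back on Windows: each wide file name is converted to the ANSI code page, and characters with no mapping in that code page become '?'. The sketch below imitates that conversion in Python 3 (where str is the unicode type) using the 'replace' error handler. The code page cp1252 and the helper narrow_name are assumptions for illustration, and the real Windows conversion may also apply best-fit mappings that this model ignores.

```python
# Hypothetical model of the byte-string os.listdir on Windows: each wide
# (Unicode) file name is converted to the ANSI code page, and characters
# with no mapping become '?'. ANSI_CODE_PAGE is an assumption; the real
# value depends on the system locale.
ANSI_CODE_PAGE = "cp1252"


def narrow_name(wide_name):
    """Lossy conversion, approximated by Python's 'replace' error handler."""
    return wide_name.encode(ANSI_CODE_PAGE, errors="replace")


names = ["abc", "Здравствуйте"]
narrowed = [narrow_name(n) for n in names]
print(narrowed)  # [b'abc', b'????????????']

# The conversion cannot be undone: decoding gives back question marks, so
# the path handed to open() names a file that does not exist.
print(narrowed[1].decode(ANSI_CODE_PAGE) == names[1])  # False
```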
Contrast this to using path:
import path
root = "docs"
lines = 0
for p in path.path(root).files():
    lines += len(file(p).readlines())
print lines, "document lines"
The obvious code works with only "abc" in the directory and also
when "Здравствуйте" is added.
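One plausible reason the path version behaves differently (an assumption, not verified here against path.py's source) is that path objects are built on unicode on platforms where os.path.supports_unicode_filenames is true, so os.listdir receives a unicode argument and returns unicode names. The Python 3 sketch below mimics that idea with a hypothetical files() helper: it converts the root to text with os.fsdecode before listing, so every name comes back as text whether the caller passed str or bytes. The helper names and the demo directory contents are illustrative only.

```python
import os
import tempfile


def files(root):
    """Hypothetical stand-in for path(root).files(): the root is coerced to
    text first, so os.listdir returns text names instead of byte strings."""
    root = os.fsdecode(root)  # bytes -> str via the file system encoding
    return [os.path.join(root, name)
            for name in sorted(os.listdir(root))
            if os.path.isfile(os.path.join(root, name))]


def count_lines(root):
    """Count lines in every regular file directly inside root."""
    total = 0
    for p in files(root):
        with open(p, "rb") as f:  # binary mode: file contents are not decoded
            total += sum(1 for _ in f)
    return total


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as demo:
    for name, text in [("abc", b"one\ntwo\n"), ("def", b"three\n")]:
        with open(os.path.join(demo, name), "wb") as f:
            f.write(text)
    os.mkdir(os.path.join(demo, "subdir"))  # directories are skipped
    demo_total = count_lines(os.fsencode(demo))  # a byte-string root works too
    demo_types = {type(p) for p in files(os.fsencode(demo))}
    print(demo_total, "document lines")  # 3 document lines
```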
Now, if you are saying it is a library failure, then there are
multiple ways to fix it.
1) os.listdir should always return unicode. The problem with this
is that existing scripts will break because of promotion issues.
Much existing code assumes a fixed locale, often ISO 8859-1, and
combining unicode strings with byte strings that contain accented
characters will raise UnicodeDecodeError.
2) os.listdir should not return "???????" garbage, but should instead
promote a name to unicode whenever it cannot be represented as a byte
string in the current code page. This may also lead to
UnicodeDecodeError, as in (1).
3) This is an exceptional situation, but the exception should be
more explicit and should be raised earlier, when os.listdir first
encounters a name it cannot represent.
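The three options can be compared with a small model. The sketch below is Python 3 (str is unicode, bytes stands in for the 2.x byte string); a list of names replaces the real directory, and CODE_PAGE, the listdir_option* functions and py2_mixed_concat are all hypothetical. py2_mixed_concat imitates the implicit ASCII decoding that Python 2.x performs when byte strings and unicode are mixed, which is where the UnicodeDecodeError in options 1 and 2 comes from.

```python
# Hypothetical Python 3 model of the three library-side options.
# A list of str stands in for the Unicode names the file system holds;
# CODE_PAGE is an assumed legacy ("ANSI") code page.
CODE_PAGE = "cp1252"


def listdir_option1(names):
    """Option 1: always return text (unicode) names."""
    return list(names)


def listdir_option2(names):
    """Option 2: byte strings where the code page can represent the name,
    promotion to text where it cannot, instead of '?' garbage."""
    result = []
    for name in names:
        try:
            result.append(name.encode(CODE_PAGE))
        except UnicodeEncodeError:
            result.append(name)  # unrepresentable: return text instead
    return result


def listdir_option3(names):
    """Option 3: raise an explicit error as soon as a name cannot be
    represented, rather than failing later in open()."""
    result = []
    for name in names:
        try:
            result.append(name.encode(CODE_PAGE))
        except UnicodeEncodeError:
            raise ValueError("file name %r cannot be represented in code page %s"
                             % (name, CODE_PAGE)) from None
    return result


def py2_mixed_concat(left, right):
    """Mimic Python 2.x, where mixing a byte string with unicode implicitly
    decodes the byte string using the default ASCII codec."""
    if isinstance(left, bytes) and isinstance(right, str):
        left = left.decode("ascii")
    elif isinstance(left, str) and isinstance(right, bytes):
        right = right.decode("ascii")
    return left + right


names = ["abc", "Здравствуйте"]
print([type(n).__name__ for n in listdir_option2(names)])  # ['bytes', 'str']

# Existing code holding an ISO 8859-1 byte string meets a text name from
# option 1 or 2: the implicit ASCII decode fails.
legacy_root = "Données\\".encode("iso-8859-1")
try:
    py2_mixed_concat(legacy_root, listdir_option1(names)[0])
except UnicodeDecodeError as exc:
    print("options 1 and 2:", type(exc).__name__)

try:
    listdir_option3(names)
except ValueError as exc:
    print("option 3:", type(exc).__name__)
```

Under option 1 even a plain ASCII name such as "abc" triggers the error, because it is the legacy byte string, not the listed name, that fails to decode.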
Neil