LC_ALL and os.listdir()
"Martin v. Löwis"
martin at v.loewis.de
Thu Feb 24 10:12:24 EST 2005
Duncan Booth wrote:
> Windows (when using NTFS) stores all the filenames in unicode, and Python
> uses the unicode api to implement listdir (when given a unicode path). This
> means that the filename never gets encoded to a byte string either by the
> OS or Python. If you use a byte string path then the filename gets encoded
> by Windows and Python just returns what it is given.
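The behaviour described in the quote can be checked with a short script. This is only a sketch, written for Python 3 (which postdates this thread), where byte strings are bytes and Unicode strings are str; the temporary directory and the file name example.txt are made up for illustration.

```python
import os
import tempfile

# Create a scratch directory holding one file with a known name.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "example.txt"), "w").close()

    # A text (Unicode) path yields text file names.
    text_names = os.listdir(d)

    # A byte string path yields byte string file names.
    byte_names = os.listdir(os.fsencode(d))

print(text_names)
print(byte_names)
```

The type of the names returned follows the type of the path passed in, which is why a Unicode path never triggers the mixed-type case in path.join.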
Serge's answer is good: you might only want to apply this algorithm to
posixpath. OTOH, in the specific case, it would not have caused problems
if it were applied to ntpath as well: the path was a Unicode string, so
listdir would have returned only Unicode strings (on Windows), and the
code in path.join dealing with mixed string types would not have been
triggered.
Again, I think the algorithm should be this:
- if both are the same kind of string, just concatenate them
- if not, try to coerce the byte string to a Unicode string, using
sys.getfilesystemencoding()
- if that fails, try the other way 'round
- if that fails, let join fail.
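The steps above could be sketched roughly as follows. This is not the actual os.path.join code, only an illustration written for Python 3, where the two string kinds are bytes and str. The helper names coerce_pair and join_mixed and the optional encoding parameter are hypothetical, added so the fallback path can be exercised with a chosen encoding.

```python
import os
import sys


def coerce_pair(a, b, encoding=None):
    """Return (a, b) as the same kind of string (hypothetical helper)."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()

    # Step 1: both are the same kind of string, nothing to do.
    if isinstance(a, bytes) == isinstance(b, bytes):
        return a, b

    # Step 2: try to coerce the byte string to a Unicode string.
    try:
        if isinstance(a, bytes):
            return a.decode(encoding), b
        return a, b.decode(encoding)
    except UnicodeDecodeError:
        pass

    # Step 3: try the other way round, text to bytes.
    # Step 4: if this raises UnicodeEncodeError, the join fails.
    if isinstance(a, str):
        return a.encode(encoding), b
    return a, b.encode(encoding)


def join_mixed(a, b, encoding=None):
    """Join two path components after coercing them (hypothetical helper)."""
    a, b = coerce_pair(a, b, encoding)
    return os.path.join(a, b)
```

For example, join_mixed("dir", b"file") returns a text path, while join_mixed("dir", b"\xff", encoding="ascii") falls back to a byte string path because the byte string cannot be decoded as ASCII.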
The only drawback I can see with that approach is that it would "break"
environments where the default encoding is "undefined", i.e. implicit
string/unicode coercions are turned off. In such an environment, it
is probably desirable that os.path.join performs no coercion as well,
so this might need to get special-cased.
Regards,
Martin