LC_ALL and os.listdir()

"Martin v. Löwis" martin at v.loewis.de
Thu Feb 24 10:12:24 EST 2005


Duncan Booth wrote:
> Windows (when using NTFS) stores all the filenames in unicode, and Python 
> uses the unicode api to implement listdir (when given a unicode path). This 
> means that the filename never gets encoded to a byte string either by the 
> OS or Python. If you use a byte string path then the filename gets encoded 
> by Windows and Python just returns what it is given.
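
A small illustration of the behaviour described in the quote, written as a sketch in Python 3 terms (where str is the text type and bytes is the byte string type, whereas the discussion here concerns Python 2's unicode and str). The temporary directory and the file name "example.txt" are made up for the example.

```python
import os
import tempfile

# Create a throwaway directory containing a single file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()

    # A text path yields text names.
    text_names = os.listdir(tmp)

    # A byte string path yields byte string names.
    byte_names = os.listdir(os.fsencode(tmp))

print(text_names)
print(byte_names)
```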

Serge's answer is good: you might only want to apply this algorithm to
posixpath. OTOH, in the specific case, it would not have caused problems
if it were applied to ntpath as well: the path was a Unicode string, so
listdir would have returned only Unicode strings (on Windows), and the
code in os.path.join dealing with mixed string types would not have been
triggered.

Again, I think the algorithm should be this:
- if both are the same kind of string, just concatenate them
- if not, try to coerce the byte string to a Unicode string, using
   sys.getfilesystemencoding()
- if that fails, try the other way 'round
- if that fails, let join fail.
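
The steps above could be sketched roughly as follows. This is only an illustration, not actual os.path code: the helper name join_mixed and its sep and encoding parameters are hypothetical, and it is written in Python 3 terms (str for text, bytes for byte strings), assuming each argument is one or the other.

```python
import sys


def join_mixed(a, b, sep="/", encoding=None):
    """Hypothetical helper: join two components that may be str or bytes."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()

    a_is_text = isinstance(a, str)
    b_is_text = isinstance(b, str)

    # Step 1: both are the same kind of string, so just concatenate.
    if a_is_text == b_is_text:
        s = sep if a_is_text else sep.encode("ascii")
        return a + s + b

    # Step 2: try to coerce the byte string to text.
    try:
        text_a = a if a_is_text else a.decode(encoding)
        text_b = b if b_is_text else b.decode(encoding)
        return text_a + sep + text_b
    except UnicodeDecodeError:
        pass

    # Step 3: try the other way round, encoding the text to bytes.
    # Step 4: if this fails too, the UnicodeEncodeError propagates.
    raw_a = a.encode(encoding) if a_is_text else a
    raw_b = b.encode(encoding) if b_is_text else b
    return raw_a + sep.encode("ascii") + raw_b
```

Passing encoding="ascii" makes the fallback easy to observe: join_mixed("dir", b"\xff", encoding="ascii") cannot decode the byte string, so it encodes "dir" instead and returns a byte string.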

The only drawback I can see with that approach is that it would "break"
environments where the system encoding is "undefined", i.e. implicit
string/unicode coercions are turned off. In such an environment, it
is probably desirable that os.path.join performs no coercion as well,
so this might need to get special-cased.

Regards,
Martin


