os.listdir, gets unicode, returns unicode... USUALLY?!?!?

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Fri Nov 17 13:11:58 CET 2006

In <4cefe$455d8f47$59ad1aca$3993 at news.flashnewsgroups.com>, gabor wrote:

> Marc 'BlackJack' Rintsch wrote:
>> In <mailman.294.1163721712.32031.python-list at python.org>, Jean-Paul
>> Calderone wrote:
>>>> How would you propose listdir should behave?
>>> Umm, just a wild guess, but how about raising an exception which includes
>>> the name of the file which could not be decoded?
>> Suppose you have a directory with just some files having a name that can't
>> be decoded with the file system encoding.  So `listdir()` fails at this
>> point and raises an exception.  How would you get the names then? Even the
>> ones that *can* be decoded?  This doesn't look very nice:
>> import os
>> import sys
>>
>> path = u'some path'
>> try:
>>     files = os.listdir(path)
>> except UnicodeError, e:
>>     files = os.listdir(path.encode(sys.getfilesystemencoding()))
>>     # Decode and filter the list "manually" here.
> i agree that it does not look very nice.
> but does this look nicer? :)
> import os
>
> path = u'some path'
> files = os.listdir(path)
> def check_and_fix_wrong_filename(file):
>     if isinstance(file, unicode):
>         return file
>     else:
>         # somehow convert it to unicode, and return it
>         raise NotImplementedError('conversion strategy left open')
> files = [check_and_fix_wrong_filename(f) for f in files]

I think this is very "special" code, as you can't use the fixed names to
open the files anymore unless you guess the encoding correctly.  I think
it's a bit fragile.  Wouldn't it be a better solution to convert the
`path` to the file system encoding before getting the file names?  That
way `listdir()` returns every name as a byte string, and you can use all
the names to process the files.
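
A minimal sketch of that approach, written for Python 3 (where `str` plays
the role of `unicode` and `bytes` the role of the byte string).  The helper
names `list_names_as_bytes` and `display_name` are made up for illustration:

```python
import os
import sys


def list_names_as_bytes(path):
    """Return the directory entries as byte strings.

    Passing a bytes path makes os.listdir() return bytes names, so no
    name has to be decoded and none can fail to decode.
    """
    if isinstance(path, str):
        # os.fsencode() applies the file system encoding to the path.
        path = os.fsencode(path)
    return os.listdir(path)


def display_name(raw_name):
    """Decode a raw name for display only.

    Undecodable bytes are replaced, so the result is not guaranteed to be
    usable for opening the file; keep the raw bytes name for that.
    """
    return raw_name.decode(sys.getfilesystemencoding(), "replace")
```

The raw names can be joined with the encoded path (for example with
`os.path.join(os.fsencode(path), name)`) and handed straight to `open()`,
while `display_name()` is only meant for showing names to a user.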

> in other words, your opinion is that the proposed solution is not 
> optimal, or that the current behavior is fine?

I think the current behavior is okay but should be documented.

Maybe I just haven't had enough use cases yet that needed the names as
unicode objects, and from my experience with Linux file systems, file names
are just byte strings with two limitations: no slashes and no zero bytes.  :-)
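
Those two limitations can be checked directly.  A small sketch, assuming
Python 3 `bytes` objects and a hypothetical helper name
`is_valid_linux_name`; it deliberately ignores other constraints such as
the reserved names `.` and `..` or per-file-system length limits:

```python
def is_valid_linux_name(name):
    """Check one path component against the two rules named above.

    Any non-empty byte string is accepted as long as it contains neither
    a slash nor a zero byte; no particular encoding is required.
    """
    if not isinstance(name, bytes):
        raise TypeError("expected a byte string")
    return len(name) > 0 and b"/" not in name and b"\x00" not in name
```

Note that a name such as `b"caf\xe9.txt"` passes the check even though it
is not valid UTF-8, which is exactly why decoding can fail.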

	Marc 'BlackJack' Rintsch
