[Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue

Tue Sep 30 02:08:41 CEST 2008

On Mon, Sep 29, 2008 at 5:29 PM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> On Monday 29 September 2008 19:06:01, Guido van Rossum wrote:
>> >>  - listdir(unicode) -> unicode and raise an error on invalid filename
>>
>> I know I keep flipflopping on this one, but the more I think about it
>> the more I believe it is better to drop those names than to raise an
>> exception. Otherwise a "naive" program that happens to use
>> os.listdir() can be rendered completely useless by a single non-UTF-8
>> filename. Consider the use of os.listdir() by the glob module. If I am
>> globbing for *.py, why should the presence of a file named b'\xff'
>> cause it to fail?
>
> It would be hard for a newbie programmer to understand why he's unable to find
> his very important file ("important r?port.doc") using os.listdir(). And yes,
> if your file system is broken, glob(<unicode>) will fail.

Imagine a program that lists all files in a dir, as well as their file
sizes.  If we return bytes we'll print the names wrong.  If we return
lossy unicode we'll be unable to get the size of some files.  If we
return malformed unicode we'll be unable to print at all (and what
if this is a GUI app?)

The common use cases need unicode, so the best options for them are to
fail outright or skip bad filenames.
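
To make the two options concrete, here is a rough sketch of what
"skip" versus "fail" could look like for a glob-style caller.  The
helper decode_names() and its skip_bad flag are made up for
illustration, UTF-8 is assumed as the filesystem encoding, and none of
this is the actual os.listdir() behaviour:

```python
import fnmatch

def decode_names(raw_names, encoding="utf-8", skip_bad=True):
    """Decode bytes filenames to str (hypothetical helper).

    skip_bad=True  -> silently drop names that do not decode
    skip_bad=False -> let the UnicodeDecodeError propagate
    """
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if not skip_bad:
                raise
    return names

# Pretend this is what the directory contains at the bytes level.
raw_listing = [b"a.py", b"\xff", b"b.txt"]

# "Skip" policy: globbing for *.py is not disturbed by b'\xff'.
matches = fnmatch.filter(decode_names(raw_listing), "*.py")
```

With skip_bad=False the same call raises on b'\xff', which is the
"fail outright" option.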

The uncommon use cases need bytes, and they could do an explicit lossy
decode for printing, while still keeping the internal file name as
bytes.
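
Something along these lines is what I have in mind for the bytes
case, as a minimal sketch only.  It is written against a current
Python 3 (os.fsencode() is assumed to be available), and
display_name() and list_sizes() are names I made up:

```python
import os

def display_name(raw_name):
    """Lossy, print-only rendering of a bytes filename.

    Undecodable bytes become U+FFFD; never use the result to open a file.
    """
    return raw_name.decode("utf-8", errors="replace")

def list_sizes(directory):
    """Return (raw_name, display, size) rows; raw_name stays bytes."""
    raw_dir = os.fsencode(directory)
    rows = []
    for raw_name in os.listdir(raw_dir):  # bytes in, bytes out
        # Always go back to the filesystem with the exact bytes name.
        size = os.lstat(os.path.join(raw_dir, raw_name)).st_size
        rows.append((raw_name, display_name(raw_name), size))
    return rows
```

The point is that the lossy string is only ever shown to the user,
while every filesystem call keeps using the original bytes.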

Failing outright does have the advantage that the resulting exception
should have a half-decent approximation of the bad filename.  (Thanks
to the recent choices on unicode repr() and having stderr do escapes.)
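
For what it's worth, the decode error already carries the offending
bytes, so an escaped approximation of the name is easy to get at.  A
small sketch, assuming UTF-8 and a hypothetical helper name
describe_bad_name():

```python
def describe_bad_name(raw_name, encoding="utf-8"):
    """Return (escaped_name, bad_offset) if raw_name fails to decode, else None."""
    try:
        raw_name.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.object is the original bytes; ascii() escapes the bad bytes
        # so the result is safe to print on any terminal.
        return ascii(exc.object), exc.start
    return None

info = describe_bad_name(b"r\xe9port.doc")
```

Here info holds the escaped form of the name plus the offset of the
first byte that could not be decoded.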

-- 
Adam Olsen, aka Rhamphoryncus