[Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

Tue Sep 30 19:45:55 CEST 2008

On Tue, Sep 30, 2008 at 10:28 AM, Georg Brandl <g.brandl at gmx.net> wrote:
>> How can it *regularly* drive you crazy when "the majority of file names
>> [...] encoded correctly" (as you assert above)?
>
> Because Office files are a) often named with long, seemingly descriptive
> filenames, which invariably means umlauts in German, and b) often sent around
> between systems, creating encoding problems.

Gotcha.

> Having seen how much controversy returning an invalid Unicode string sparks,
> and given that it really isn't obvious to the newbie either, I think I now agree
> that dropping filenames when calling a listdir() that returns Unicode filenames
> is the best solution. I'm a little uneasy with having one function for both
> bytes and Unicode return, because that kind of str/unicode mixing I thought we
> had left behind in 2.x, but of course can live with it.

Well, the *current* Py3k behavior, where it may return a mix of bytes
and str instances, is really messy, and likely to trip up most code
that doesn't expect it, in a way that is hard to debug. However,
the *proposed* behavior (returns bytes if the arg was bytes, and
returns str when the arg was str) is IMO sane, and no different from
the polymorphism found in len() or many builtin operations.
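
A minimal sketch of what the proposed behavior would look like from
user code, assuming an os.listdir() that follows the rule above (the
temporary directory and the file name "example.txt" are invented for
the illustration and are not part of the proposal):

```python
import os
import tempfile

# Hypothetical setup: a scratch directory holding a single file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()

    # str argument in, str names out.
    str_names = os.listdir(tmp)

    # bytes argument in, bytes names out; os.fsencode() converts the
    # path using the filesystem encoding.
    bytes_names = os.listdir(os.fsencode(tmp))

print(str_names)
print(bytes_names)
```

The return type follows the argument type, so a caller never sees a
mixed list and can pick whichever form it is prepared to handle.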

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)