[Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

Tue Sep 30 19:52:55 CEST 2008

Guido van Rossum wrote:
> On Tue, Sep 30, 2008 at 10:28 AM, Georg Brandl <g.brandl at gmx.net> wrote:
>>> How can it *regularly* drive you crazy when "the majority of file names
>>> [...] encoded correctly" (as you assert above)?
>>
>> Because Office files are a) often named with long, seemingly descriptive
>> filenames, which invariably means umlauts in German, and b) often sent around
>> between systems, creating encoding problems.
> 
> Gotcha.

Which means?

>> Having seen how much controversy returning an invalid Unicode string sparks,
>> and given that it really isn't obvious to the newbie either, I think I now agree
>> that dropping filenames when calling a listdir() that returns Unicode filenames
>> is the best solution. I'm a little uneasy with having one function for both
>> bytes and Unicode return, because that kind of str/unicode mixing I thought we
>> had left behind in 2.x, but of course can live with it.
> 
> Well, the *current* Py3k behavior where it may return a mix of bytes
> and str instances is really messy, and likely to trip up most code
> that doesn't expect it in a way that makes it hard to debug. However
> the *proposed* behavior (returns bytes if the arg was bytes, and
> returns str when the arg was str) is IMO sane, and no different than
> the polymorphism found in len() or many builtin operations.

I agree that anything is better than the current behavior.
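
To make the proposed behavior concrete, here is a minimal sketch, assuming
an os.listdir() that returns bytes for a bytes argument and str for a str
argument. The temporary directory and the file name "report.txt" are made up
for illustration, and os.fsencode() is a helper from later Python 3 releases
that is not part of this discussion.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file; the ASCII name decodes the same under any locale.
    open(os.path.join(tmp, "report.txt"), "w").close()

    # str argument: the entries come back as str.
    str_names = os.listdir(tmp)

    # bytes argument: the entries come back as bytes.
    bytes_names = os.listdir(os.fsencode(tmp))

print(str_names)
print(bytes_names)
```

The caller picks the return type through the argument type, so a single call
never yields a mix of the two.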

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.