On Tue, Sep 30, 2008 at 10:28 AM, Georg Brandl <g.brandl@gmx.net> wrote:
How can it *regularly* drive you crazy when "the majority of file names [...] encoded correctly" (as you assert above)?
Because Office files are a) often named with long, seemingly descriptive filenames, which invariably means umlauts in German, and b) often sent around between systems, creating encoding problems.
Gotcha.
Having seen how much controversy returning an invalid Unicode string sparks, and given that it really isn't obvious to the newbie either, I think I now agree that dropping filenames when calling a listdir() that returns Unicode filenames is the best solution. I'm a little uneasy with having one function for both bytes and Unicode return, because I thought we had left that kind of str/unicode mixing behind in 2.x, but of course I can live with it.
Well, the *current* Py3k behavior where it may return a mix of bytes and str instances is really messy, and likely to trip up most code that doesn't expect it in a way that makes it hard to debug. However the *proposed* behavior (returns bytes if the arg was bytes, and returns str when the arg was str) is IMO sane, and no different than the polymorphism found in len() or many builtin operations.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
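
For illustration, the proposed "bytes in, bytes out; str in, str out" behavior can be sketched roughly as below. This is only a minimal sketch, assuming an interpreter whose os.listdir() already follows the proposal; the helper entry_types() and the file name report.txt are hypothetical names made up for the example, and the question of what happens to undecodable names is not shown.

```python
import os
import tempfile


def entry_types(path):
    """Return the set of Python types found among os.listdir(path) results."""
    return {type(name) for name in os.listdir(path)}


with tempfile.TemporaryDirectory() as tmp:
    # Create a single file so the directory listing is not empty.
    with open(os.path.join(tmp, "report.txt"), "w"):
        pass

    # str argument: every returned name is expected to be a str.
    str_types = entry_types(tmp)

    # bytes argument: every returned name is expected to be bytes.
    bytes_types = entry_types(os.fsencode(tmp))

print(str_types, bytes_types)
```

The point of the sketch is that the caller picks the return type by picking the argument type, so a single call never yields a mixed list.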