On 02:39 pm, hrvoje.niksic@avl.com wrote:
For example, implementing os.listdir to return the file names as Unicode subclasses with ability to access the underlying bytes (automatically recognized by open and friends) sounds like a good compromise that allows the word processor to both have the cake and eat it.
It really seems like the strategy of the current patch (which I believe Guido proposed) makes the most sense. Programs pass different arguments for different things: listdir(text) -> I am thinking in unicode and I do not know about encodings, please give me only things that are proper unicode, because I don't want to deal with that. listdir(bytes) -> I am thinking about bytes, I know about encodings. Just give me filenames as bytes and I will decode them myself or do other fancy things. You can argue about whether this should really be 'listdiru' or 'globu' for explicitness, but when such a simple strategy with unambiguous types works, there's no reason to introduce some weird hybrid bytes/text type that will inevitably be a bug attractor. Python's path abstractions have never been particularly high level, nor do I think they necessarily should be - at least, not until there's some community consensus about what a "high level path abstraction" really looks like. We're still wrestling with it in Twisted, and I can think of at least three ways that ours is wrong. And ours is the one that's doing the best, as far as I can tell :). This proposal gives higher level software the information that it needs to construct appropriate paths. The one thing it doesn't do is expose the decoding rules for the higher- level applications to deal with. I am pretty sure I don't understand how the interaction between filesystem encoding and user locale works in that case, though, so I can't immediately recommend a way to do it.