On Tue, Sep 30, 2008 at 1:29 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Guido van Rossum wrote:
However the *proposed* behavior (returns bytes if the arg was bytes, and returns str when the arg was str) is IMO sane, and no different than the polymorphism found in len() or many builtin operations.
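For illustration, here is a minimal sketch of the proposed polymorphism, using os.listdir() as the example call. The helper name list_names and the file name example.txt are made up for this sketch. It assumes a Python 3 release that provides os.fsencode(), which is newer than this thread.

```python
import os
import tempfile

def list_names(directory):
    """Return the entries of `directory`; the result type follows the argument type."""
    return os.listdir(directory)

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the listing is not empty (hypothetical name).
    open(os.path.join(tmp, "example.txt"), "w").close()

    str_names = list_names(tmp)                 # str in, list of str out
    bytes_names = list_names(os.fsencode(tmp))  # bytes in, list of bytes out
```

The caller picks the representation by choosing the argument type, in the same way that len() accepts several kinds of containers.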
My concern still is that it brings the bytes type into the status of another character string type, which is really bad, and will require further modifications to Python for the lifetime of 3.x.
I'd like to understand why this is "really bad". I thought it was by design that the str and bytes types behave pretty similarly. You can use both as dict keys.
This is because applications will then regularly use byte strings for file names on Unix, and regular strings on Windows, and then expect the program to work the same without further modifications.
It seems that bytes arguments actually *do* work on Windows -- somehow they get decoded. (Unless Terry's report was from 2.x.)
The next question then will be environment variables and command line arguments, for which we then should provide two versions (e.g. sys.argv and sys.argvb; for os.environ, os.environ["PATH"] could mean something different from os.environ[b"PATH"]).
Actually something like that may not be a bad idea. Ian Bicking's webob supports similar double APIs for getting the request parameters out of a request object; I believe request.GET['x'] is a text object and request.GET_str['x'] is the corresponding uninterpreted bytes sequence. I would prefer to have os.environb over os.environ[b"PATH"] though.
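As a rough sketch of what such a double API could look like for the environment: later Python 3 releases do expose os.environb and os.supports_bytes_environ on POSIX systems, but neither existed when this message was written, so treat the snippet as an illustration and not as the design under discussion. The helper getenv_bytes and the variable name GR_DEMO_VAR are hypothetical.

```python
import os

def getenv_bytes(name, default=None):
    """Hypothetical helper: return an environment variable as uninterpreted bytes."""
    if os.supports_bytes_environ:
        # POSIX: os.environb maps bytes keys to bytes values.
        return os.environb.get(os.fsencode(name), default)
    # Elsewhere (e.g. Windows) the environment is natively text, so encode it.
    value = os.environ.get(name)
    return default if value is None else os.fsencode(value)

# Demo variable, set only for this sketch.
os.environ["GR_DEMO_VAR"] = "demo"
text_value = os.environ["GR_DEMO_VAR"]     # text view
bytes_value = getenv_bytes("GR_DEMO_VAR")  # bytes view of the same variable
```

Keeping the bytes view in a separate mapping means os.environ["PATH"] never changes meaning depending on the key type.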
And so on (passwd/group file, Tkinter, ...)
I assume at some point we can stop and have sufficiently low-level interfaces that everyone can agree are in bytes only. Bytes aren't going away. How does Java deal with this? Its File class doesn't seem to deal in bytes at all. What would its listFiles() method do with undecodable filenames?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)