[Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

"Martin v. Löwis" martin at v.loewis.de
Wed Oct 1 00:21:04 CEST 2008


>> My concern still is that it brings the bytes type into the status of
>> another character string type, which is really bad, and will require
>> further modifications to Python for the lifetime of 3.x.
> 
> I'd like to understand why this is "really bad". I thought it was by
> design that the str and bytes types behave pretty similarly. You can
> use both as dict keys.

If they have to behave pretty similarly, they have to be supported in
all APIs that deal with text. For example, people will demand that
printing bytes should just copy them onto the stream (rather than
invoking repr()), and writing them onto a text stream should work the
same way. GUI libraries would have to support them, as would the XML
libraries, and so on.
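
To make the current behaviour concrete, here is a minimal sketch in
Python 3 (the sample value and the variable names are made up for
illustration): print() shows the repr of a bytes object, and a text
stream rejects bytes outright.

```python
import io

data = b"caf\xc3\xa9"  # illustrative value: the UTF-8 encoding of "café"

# print() goes through str(), which for bytes yields the repr,
# not the raw bytes copied onto the stream.
printed = io.StringIO()
print(data, file=printed)
shown = printed.getvalue()

# A text stream refuses bytes instead of copying them through.
try:
    io.StringIO().write(data)
    text_write_error = None
except TypeError as exc:
    text_write_error = exc
```

Treating bytes as "just another string type" would mean changing both
of these behaviours, and every library that sits on top of them.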

Where will you stop, and tell people that bytes are just not supposed
to do this or that?

>> This is because applications will then regularly use byte strings for
>> file names on Unix, and regular strings on Windows, and then expect
>> the program to work the same without further modifications.
> 
> It seems that bytes arguments actually *do* work on Windows -- somehow
> they get decoded. (Unless Terry's report was from 2.x.)

To a limited degree - see my other message. Don't try to listdir a
directory with characters outside CP_ACP (it will give you invalid
file names).

> Actually something like that may not be a bad idea. Ian Bicking's
> webob supports similar double APIs for getting the request parameters
> out of a request object; I believe request.GET['x'] is a text object
> and request.GET_str['x'] is the corresponding uninterpreted bytes
> sequence. I would prefer to have os.environb over os.environ[b"PATH"]
> though.

And would you keep them synchronized?
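
As a sketch of what "synchronized" would have to mean: a change made
through one view must be visible through the other. The snippet below
assumes a later Python 3 release on a POSIX system, where os.environb
and os.supports_bytes_environ exist (they postdate this thread);
DEMO_NAME is a made-up variable name.

```python
import os

# os.environb only exists where the native environment is bytes (POSIX).
if os.supports_bytes_environ:
    # Write through the text view, read through the bytes view.
    os.environ["DEMO_NAME"] = "value"
    seen_as_bytes = os.environb[b"DEMO_NAME"]

    # Write through the bytes view, read through the text view.
    os.environb[b"DEMO_NAME"] = b"changed"
    seen_as_text = os.environ["DEMO_NAME"]

    # Clean up the illustrative variable.
    del os.environ["DEMO_NAME"]
```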

> I assume at some point we can stop and have sufficiently low-level
> interfaces that everyone can agree are in bytes only. Bytes aren't
> going away. How does Java deal with this? Its File class doesn't seem
> to deal in bytes at all. What would its listFiles() method do with
> undecodable filenames?

Apparently (JDK 1.5.0_16, on Linux), it decodes undecodable bytes/byte
sequences as U+FFFD (REPLACEMENT CHARACTER). Opening such a file will
fail with FileNotFoundException.
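
The effect can be reproduced in Python. This is only a sketch: it
assumes a UTF-8 locale, uses errors="replace" as a stand-in for what
the JDK does, and the file name is made up.

```python
# Hypothetical file name; the byte 0xFF can never occur in valid UTF-8.
raw_name = b"report-\xff.txt"

# Decoding with replacement turns the bad byte into U+FFFD ...
shown = raw_name.decode("utf-8", errors="replace")

# ... and encoding the result again does not give back the original bytes,
# so the name handed to open() no longer matches the file on disk.
round_trip = shown.encode("utf-8")
```

Since round_trip differs from raw_name, the decoded name refers to a
file that does not exist, which matches the FileNotFoundException.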

IOW, Java hasn't solved the problem in the last 10 years. Marcin
Kowalczyk did a more thorough analysis about a year ago in

http://mail.python.org/pipermail/python-3000/2007-September/010450.html

Regards,
Martin



