[Python-Dev] Filename as byte string in python 2.6 or 3.0?

Adam Olsen rhamph at gmail.com
Wed Oct 1 00:04:09 CEST 2008

On Tue, Sep 30, 2008 at 3:43 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Guido van Rossum wrote:
>> The callback would either be an extra argument to all
>> system calls (bad, ugly etc., and why not go with the existing unicode
>> encoding and error flags if we're adding extra args?) or would be
>> global, where I'd be worried that it might interfere with the proper
>> operation of library code that is several abstractions away from
>> whoever installed the callback, not under their control, and not
>> expecting the callback.
>> I suppose I may have totally misunderstood your proposal, but in
>> general I find callbacks unwieldy.
> Not really - later in the email, I actually pointed out that exposing
> the unicode errors flag for the implicit PyUnicode_Decode invocations
> would be enough to enable a callback mechanism.
> However, James's post pointing out that this is a problem that also
> affects environment variables and command line arguments, not just file
> paths, completely kills any hope of a purely callback-based approach - that
> processing needs to "just work" without any additional intervention from
> the application.
> Of the suggestions I've seen so far, I like Marcin's Mono-inspired
> NULL-escape codec idea the best. Since these strings all come from parts
> of the environment where NULLs are not permitted, a simple "'\0' in
> text" check will immediately identify any strings where decoding failed
> (for applications which care about the difference and want to try to do
> better), while applications which don't care will receive perfectly
> valid Python strings that can be passed around and manipulated as if the
> decoding error never happened.
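
For readers following the thread, here is a minimal sketch of how a
NULL-escape scheme could be built on the codec error-handler hook. The
handler name "nullescape", the helpers decode_name/encode_name, and the
escape format (U+0000 followed by the raw byte as a code point) are all
assumptions made for illustration; the thread does not specify the exact
format of Marcin's proposal or of Mono's implementation.

```python
import codecs


def _null_escape(exc):
    """Error handler: escape each undecodable byte as NUL + chr(byte)."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    escaped = "".join("\0" + chr(b) for b in bad)
    # Resume decoding right after the bytes that failed.
    return escaped, exc.end


# "nullescape" is a hypothetical handler name, not a built-in one.
codecs.register_error("nullescape", _null_escape)


def decode_name(raw: bytes) -> str:
    """Decode a file name as UTF-8, NULL-escaping anything invalid."""
    return raw.decode("utf-8", errors="nullescape")


def encode_name(text: str) -> bytes:
    """Reverse decode_name: turn NUL + char pairs back into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\0":
            out.append(ord(text[i + 1]))
            i += 2
        else:
            j = text.find("\0", i)
            if j == -1:
                j = len(text)
            out += text[i:j].encode("utf-8")
            i = j
    return bytes(out)


name = decode_name(b"caf\xe9.txt")   # Latin-1 bytes, invalid as UTF-8
decoding_failed = "\0" in name       # the simple check described above
```

An application that cares can test for "\0" and fall back to something
smarter; one that does not can pass the string around and hand it to
encode_name() unchanged to recover the original bytes.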

It avoids the technical problems, but it's still magical behaviour
that users have to learn, whereas bytes/unicode polymorphism uses the
distinctions you should already know about.
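
As an illustration of what bytes/unicode polymorphism looks like in
practice, the sketch below uses os.listdir() as it behaves in current
Python 3, where the type of the argument selects the type of the result.
The helper list_names and the temporary directory are only there to make
the example self-contained.

```python
import os
import tempfile


def list_names(directory):
    """Return sorted directory entries, in the same type as the argument."""
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "plain.txt"), "w").close()
    # str path in, str names out (decoded with the filesystem encoding).
    str_names = list_names(tmp)
    # bytes path in, bytes names out (no decoding, so nothing can fail).
    bytes_names = list_names(os.fsencode(tmp))
```

No new convention is involved here: code that needs the exact on-disk
names asks with bytes, and everything else keeps using str.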

There's also the problem of how to turn it on.  I'm against Python
automatically changing the filesystem encoding, no matter how well
intentioned.  Better to let the app do that, which is easy and
could be done for all apps (not just python!) if someone defined a
libc encoding of "null-escaped UTF-8".

On the whole I'm only -0 on it (compared to -1 for UTF-8b).

Adam Olsen, aka Rhamphoryncus
