[Python-3000] Unicode and OS strings

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Fri Sep 14 00:31:36 CEST 2007


On Fri, 14-09-2007 at 06:12 +0900, Stephen J. Turnbull wrote:

> This means that a way of handling such points
> is very useful, and as long as there's enough PUA space, the approach
> I suggested can handle all of these various issues.

PUA code points already have a representation in UTF-8, so this scheme
is more incompatible with UTF-8 than it needs to be, and it hijacks
characters which might be in use (for example, I'm using some PUA ranges
to encode my script; they are transported between processes, and I would
be upset if some language mangled them into something else).
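
To make the point about PUA concrete, here is a minimal Python sketch
showing that Private Use Area code points are ordinary, valid UTF-8 and
survive a round trip unchanged. The sample code points U+E000 and U+F8FF
are chosen only for illustration; they are not taken from this thread.

```python
import unicodedata

# Two code points from the BMP Private Use Area (illustrative choice).
pua_text = "\ue000\uf8ff"

# Both are classified as "Co" (private use) by the Unicode database.
categories = [unicodedata.category(ch) for ch in pua_text]

# They encode to regular three-byte UTF-8 sequences...
encoded = pua_text.encode("utf-8")

# ...and decode back to exactly the same characters.
decoded = encoded.decode("utf-8")
```

A decoder that reserves such code points for its own escapes can no
longer tell them apart from PUA characters that were really in the
input.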

While U+0000 is also representable in UTF-8, it cannot occur in
filenames, program arguments, environment variables etc., and thus
in many contexts it is free. The main place where it is not free is
file contents, including stdin/stdout/stderr. Of course my escaping
scheme can preserve \0 too, by escaping it to U+0000 U+0000, but in
that case it is incompatible with real UTF-8.
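
The escaping idea can be sketched in Python roughly as follows. Only the
rule "\0 becomes U+0000 U+0000" comes from this message; the rule used
here for undecodable bytes (U+0000 followed by the code point equal to
the byte value) is an assumption made for the sketch, and the function
names decode_escaped and encode_escaped are hypothetical.

```python
def decode_escaped(data: bytes) -> str:
    """Decode bytes as UTF-8, escaping NUL and undecodable bytes with U+0000."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == 0:
            # A real NUL byte is escaped as U+0000 U+0000.
            out.append("\0\0")
            i += 1
            continue
        # Try the valid UTF-8 sequence (1 to 4 bytes) starting at position i.
        for length in (1, 2, 3, 4):
            try:
                ch = data[i:i + length].decode("utf-8")
            except UnicodeDecodeError:
                continue
            out.append(ch)
            i += length
            break
        else:
            # Assumed rule: an undecodable byte becomes U+0000 + chr(byte).
            out.append("\0" + chr(byte))
            i += 1
    return "".join(out)


def encode_escaped(text: str) -> bytes:
    """Inverse of decode_escaped: turn escapes back into the original bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\0":
            out += ch.encode("utf-8")
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling U+0000 escape")
        code = ord(text[i + 1])
        if code == 0:
            out.append(0)          # U+0000 U+0000 -> NUL byte
        elif 0x80 <= code <= 0xFF:
            out.append(code)       # U+0000 U+00XX -> raw byte 0xXX
        else:
            raise ValueError("invalid escape after U+0000")
        i += 2
    return bytes(out)


sample = b"abc\xff\x00\xee\x80\x80"
round_tripped = encode_escaped(decode_escaped(sample))
```

Under these assumptions PUA characters such as U+E000 pass through
untouched, while a NUL byte decodes to two characters instead of one,
which is exactly where the scheme departs from real UTF-8.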

> zsh at least allows you to type ^V^SPC to enter an ASCII NUL character
> on the command line, and to assign a string containing NULs to an
> environment variable.

That may work for zsh's internal commands and process-internal
variables. But there can't be NULs in the arguments of a program
invocation, or in environment variables which survive execve, because
the Unix APIs and data structures - not just C functions - use NULs to
delimit these strings.
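
This restriction is visible from Python as well. The sketch below
assumes CPython, where passing a string with an embedded NUL as a
program argument or as an environment value is rejected with ValueError
before anything reaches the operating system. The helper names and the
variable name DEMO_VAR are hypothetical.

```python
import os
import subprocess
import sys


def rejects_nul_in_argv() -> bool:
    """Return True if a NUL inside a program argument is refused."""
    try:
        subprocess.run([sys.executable, "-c", "pass", "a\0b"], check=False)
    except ValueError:
        return True
    return False


def rejects_nul_in_env() -> bool:
    """Return True if a NUL inside an environment value is refused."""
    try:
        os.putenv("DEMO_VAR", "a\0b")
    except ValueError:
        return True
    return False


argv_rejected = rejects_nul_in_argv()
env_rejected = rejects_nul_in_env()
```

Both helpers are expected to return True, since argv and environ
entries are NUL-terminated strings and cannot carry an embedded NUL.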

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


