[Python-Dev] Unicode support in getargs.c

Jack Jansen jack@oratrix.nl
Sun, 06 Jan 2002 00:18:36 +0100


Recently, "Martin v. Loewis" <martin@v.loewis.de> said:
> When the discussion of tagging binary strings in source code came up,
> I started to look into the standard library which string literals
> would have to be tagged as byte strings, and which are really
> character strings.
> 
> I found that the overwhelming majority of string literals in the
> standard Python library really denotes byte strings, if you ignore doc
> strings. Sometimes, it isn't obvious that they are binary strings,
> hence the smiley.
[leaving only one example in:]
>                 version = "HTTP/0.9"
>                 status = "200"
>                 reason = ""
> 
> Protocol elements, thus byte string.

I think you're taking it too far now. I think we should assume that
ASCII survives. If Python runs on an EBCDIC machine (does it?) I
assume that at some point the conversion of EBCDIC<->ASCII is handled
semi-transparently.

Also, as these things are readable they should be treated as such. It
should be possible to do
>>> print u"Funny reply to my "+unicode(version)+u" message"
especially when the "funny reply" bit is in Japanese.
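
A minimal sketch of the concatenation above, assuming the protocol
element is an ASCII-only byte string. The variable names are
illustrative, and the decode is written explicitly so that the ASCII
assumption is visible:

```python
# Assumption: the protocol element contains only ASCII bytes.
version = b"HTTP/0.9"

# Decode explicitly as ASCII before mixing with character strings.
text = u"Funny reply to my " + version.decode("ascii") + u" message"
print(text)

# The same works when the surrounding text is non-ASCII, here two
# Japanese characters written as escapes so the source stays ASCII.
japanese = u"\u8fd4\u4fe1: " + version.decode("ascii")
```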

What I would agree with, I think, is tagging these strings as
"ascii". That is also what the BDFL pronounced at some point:
Python source code is ASCII, and if you put 8-bit characters in there
you're living dangerously.
Only when octal or hex escapes appear in a source code string can it be
anything other than ASCII.
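
To illustrate the last point, a small sketch: a literal written with
plain characters stays within ASCII, while hex escapes can introduce
bytes above 0x7F. The helper is_ascii is a hypothetical name used only
for this example:

```python
plain = b"200"          # pure ASCII literal
escaped = b"\xe9t\xe9"  # hex escapes give bytes above 0x7F


def is_ascii(data):
    """Return True if the byte string decodes cleanly as ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


print(is_ascii(plain))
print(is_ascii(escaped))
```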
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.cwi.nl/~jack        | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm