New subject: Unicode support in getargs.c

5 Jan 2002


Recently, "Martin v. Loewis" said:
...
> When the discussion of tagging binary strings in source code came up,
> I started to look into the standard library which string literals
> would have to be tagged as byte strings, and which are really
> character strings.
> I found that the overwhelming majority of string literals in the
> standard Python library really denotes byte strings, if you ignore doc
> strings. Sometimes, it isn't obvious that they are binary strings,
> hence the smiley.
> [leaving only one example in:]
>                 version = "HTTP/0.9"
>                 status = "200"
>                 reason = ""
> Protocol elements, thus byte string.

I think you're taking it too far now. I think we should assume that
ASCII survives. If Python runs on an EBCDIC machine (does it?) I
assume that at some point the conversion of EBCDIC<->ASCII is handled
semi-transparently.

Also, as these things are readable they should be treated as such. It
should be possible to do
...
...
...
print u"Funny reply to my "+unicode(version)+u" message"
especially when the "funny reply" bit is in Japanese.
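
To make the point concrete, here is a minimal sketch in present-day Python 3 terms, where the byte/character split is explicit. The helper name describe and the Japanese template ("Reply: ...") are made up for illustration; only the version value comes from the example above. It assumes the protocol element is pure ASCII, so decoding it is always safe:

```python
# Hypothetical sketch: a protocol element kept as an ASCII byte string,
# decoded explicitly before it is mixed with (possibly non-ASCII) text.
version = b"HTTP/0.9"  # protocol element, held as bytes


def describe(version_bytes, template="Funny reply to my {} message"):
    # decode("ascii") raises UnicodeDecodeError for any byte >= 0x80,
    # so a non-ASCII protocol element is caught here rather than later.
    return template.format(version_bytes.decode("ascii"))


message = describe(version)
# Illustrative Japanese template, roughly "Reply: ..."
japanese = describe(version, template="返信: {}")
print(message)
```

As long as the "ASCII survives" assumption holds, the readable protocol element can be dropped into text in any language without further ceremony.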
What I would agree with, I think, is tagging these strings as
"ASCII". That is also what the BDFL pronounced at some point:
Python source code is ASCII, and if you put 8-bit characters in there
you're living dangerously.
Only when octal or hex escapes appear in a source code string can it be
anything other than ASCII.
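
A rough sketch of that rule, again in Python 3 terms. The helper is_ascii is a made-up name, and this only inspects the resulting string value, not how the literal was spelled in the source:

```python
def is_ascii(literal):
    """Return True if every character of the string is in the ASCII range."""
    return all(ord(ch) < 128 for ch in literal)


plain = "HTTP/0.9"    # only ASCII characters in the source, ASCII value
escaped = "caf\xe9"   # ASCII-only source text, but the hex escape yields U+00E9
octal = "caf\351"     # the same value written with an octal escape

for value in (plain, escaped, octal):
    print(repr(value), is_ascii(value))
```

Both escaped literals are written with ASCII characters only, yet their values fall outside ASCII, which is the one case the rule above allows for.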
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.cwi.nl/~jack        | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm
