[Python-Dev] String methods... finally
Guido van Rossum
guido at CNRI.Reston.VA.US
Wed Jun 16 01:04:17 CEST 1999
> Is there any sort of agreement that Python will use L"..." to denote
> Unicode strings? I would be happy with it.
I don't know of any agreement, but it makes sense.
> Also, should:
> print L"foo" -> 'foo'
> print `L"foo"` -> L'foo'
Yes, I think this should be the way. Exactly what happens to
non-ASCII characters is up to the implementation.
Do we have agreement on escapes like \xDDDD? Should \uDDDD be added?
The difference between the two is that according to the ANSI C
standard, which I follow rather strictly for string literals,
'\xABCDEF' is a single character whose value is the lower bits
(however many fit in a char) of 0xABCDEF. This makes it cumbersome to
write a string consisting of a hex escape followed by a digit or a
letter a-f or A-F: you would have to use another hex escape or split
the literal in two, like this: "\xABCD" "EF". (This is true for 8-bit
chars as well as for wide chars in ANSI C.) The \u escape takes
4 hex digits but is not ANSI C. In Java, \u has the additional funny
property that it is recognized *everywhere* in the source code, not
just in string literals, and I believe that this complicates the
interpretation of things like "\\uffff" (is the \uffff interpreted
before regular string \ processing happens?). I don't think we ought
to copy this behavior, although JPython users or developers might
disagree. (I don't know anyone who *uses* Unicode strings much, so
it's hard to gauge the importance of these issues.)
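
To make the difference concrete, here is a minimal sketch of the two
rules as described above. It is an illustration only, not the parser of
any real compiler or interpreter. The function names and the char_bits
parameter are hypothetical, and the \u rule assumes the Java-style form
of exactly four hex digits.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def decode_c_hex_escape(text, start, char_bits=8):
    """Decode a greedy ANSI C style \\x escape starting at text[start].

    Every hex digit that follows is consumed, and only the low
    char_bits bits of the value are kept (char_bits is an assumption
    standing in for "however many fit in a char").
    Returns (character code, index just past the escape).
    """
    if not text.startswith("\\x", start):
        raise ValueError("expected \\x at the given position")
    end = start + 2
    while end < len(text) and text[end] in HEX_DIGITS:
        end += 1
    if end == start + 2:
        raise ValueError("\\x needs at least one hex digit")
    value = int(text[start + 2:end], 16)
    return value & ((1 << char_bits) - 1), end


def decode_u_escape(text, start):
    """Decode a fixed-width \\u escape: exactly four hex digits (assumed).

    Returns (character code, index just past the escape).
    """
    if not text.startswith("\\u", start):
        raise ValueError("expected \\u at the given position")
    digits = text[start + 2:start + 6]
    if len(digits) != 4 or any(d not in HEX_DIGITS for d in digits):
        raise ValueError("\\u needs exactly four hex digits")
    return int(digits, 16), start + 6


if __name__ == "__main__":
    # Raw strings keep the backslashes so the sketch sees the escape text.
    code, end = decode_c_hex_escape(r"\xABCDEF", 0)
    print(hex(code), end)   # the greedy rule swallows all six digits

    code, end = decode_u_escape(r"\uABCDEF", 0)
    print(hex(code), end)   # stops after four digits; "EF" is ordinary text
```

With the greedy rule the trailing "EF" can never be plain text, which is
why the literal has to be split in two; with the fixed-width rule the
escape ends by itself after four digits.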
--Guido van Rossum (home page: http://www.python.org/~guido/)