[Python-Dev] String methods... finally

Guido van Rossum guido at CNRI.Reston.VA.US
Wed Jun 16 01:04:17 CEST 1999


> Is there any sort of agreement that Python will use L"..." to denote
> Unicode strings?  I would be happy with it.

I don't know of any agreement, but it makes sense.

> Also, should:
> print L"foo" -> 'foo'
> and
> print `L"foo"` -> L'foo'

Yes, I think this should be the way.  Exactly what happens to
non-ASCII characters is up to the implementation.

Do we have agreement on escapes like \xDDDD?  Should \uDDDD be added?

The difference between the two is that according to the ANSI C
standard, which I follow rather strictly for string literals,
'\xABCDEF' is a single character whose value is the lower bits
(however many fit in a char) of 0xABCDEF; this makes it cumbersome to
write a string consisting of a hex escape followed by a digit or
letter a-f or A-F; you would have to use another hex escape or split
the literal in two, like this: "\xABCD" "EF".  (This is true for 8-bit
chars as well as for wide chars in ANSI C.)  The \u escape takes
four hex digits but is not ANSI C.  In Java, \u has the additional funny
property that it is recognized *everywhere* in the source code, not
just in string literals, and I believe that this complicates the
interpretation of things like "\\uffff" (is the \uffff interpreted
before regular string \ processing happens?).  I don't think we ought
to copy this behavior, although JPython users or developers might
disagree.  (I don't know anyone who *uses* Unicode strings much, so
it's hard to gauge the importance of these issues.)
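
To make the difference concrete, here is a minimal sketch of the two
rules as described above.  It is only an illustration: the function
names and the char_bits parameter are hypothetical, and it models
neither a real C compiler nor Python's actual literal parser.  The
first function consumes every hex digit it can find and keeps the low
bits; the second assumes a \u that stops after four digits.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def decode_greedy_hex_escape(text, char_bits=8):
    # Model of the greedy rule described above: `text` is whatever
    # follows the backslash-x.  All consecutive hex digits are consumed
    # and only the low `char_bits` bits of the value are kept.
    end = 0
    while end < len(text) and text[end] in HEX_DIGITS:
        end += 1
    if end == 0:
        raise ValueError("hex escape needs at least one hex digit")
    value = int(text[:end], 16) & ((1 << char_bits) - 1)
    return value, text[end:]


def decode_fixed_u_escape(text):
    # Assumed rule for a backslash-u escape: exactly four hex digits
    # are consumed, and anything after them is ordinary text.
    digits = text[:4]
    if len(digits) < 4 or any(c not in HEX_DIGITS for c in digits):
        raise ValueError("u escape needs exactly four hex digits")
    return int(digits, 16), text[4:]


# Greedy rule: "EF" is swallowed into the escape.
print(decode_greedy_hex_escape("ABCDEF", char_bits=16))
# Fixed-width rule: "EF" survives as literal text.
print(decode_fixed_u_escape("ABCDEF"))
```

With the greedy rule the trailing "EF" can never be literal text, which
is why the literal has to be split in two; with a fixed-width escape it
is simply left over.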

--Guido van Rossum (home page: http://www.python.org/~guido/)



