[Python-3000] Raw strings containing \u or \U

Thu May 17 07:45:17 CEST 2007

Ron Adam wrote:
> Guido van Rossum wrote:
>> That would be great! This will automatically turn \u1234 into 6
>> characters, right?
> 
> I'm not exactly clear on when the '\uxxxx' escapes get converted.  There
> isn't any conversion done in tokenizer.c that I can see.  At that point it
> is only concerned with finding the beginning and end of the string.  It
> looks like everything in between is passed along "as is" and translated
> later in the chain.

Look at Python/ast.c, which has functions parsestr() and decode_unicode().
The latter calls PyUnicode_DecodeRawUnicodeEscape(), which I think is the
function you're looking for.
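
For illustration, here is a rough sketch of the two behaviours under
discussion, seen from the Python level rather than from ast.c.  It assumes a
current Python 3 interpreter and assumes that the "raw_unicode_escape" codec
is a fair stand-in for what PyUnicode_DecodeRawUnicodeEscape() does; the
variable names are made up for the example.

```python
# Assumption: run on a current Python 3 interpreter.

# A raw string literal keeps the backslash sequence verbatim, so \u1234
# stays as six separate characters: \ u 1 2 3 4
raw_literal = r"\u1234"
raw_len = len(raw_literal)

# The raw_unicode_escape codec converts \uXXXX into a single character
# but leaves other backslash sequences (here \n) untouched.
source_bytes = b"\\u1234 and \\n"
decoded = source_bytes.decode("raw_unicode_escape")

print(raw_len)                  # 6
print(len(decoded))             # 8: one char + " and " + backslash + n
print(decoded[0] == "\u1234")   # True
```

The first case is the "6 characters" outcome Guido asks about; the second is
the conversion that happens when \u escapes are still honoured inside raw
strings.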

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.