How to Split Chinese Character with backslash representation?
Paul McGuire
ptmcg at austin.rr._bogus_.com
Fri Oct 27 09:32:24 EDT 2006
"Wijaya Edward" <ewijaya at i2r.a-star.edu.sg> wrote in message
news:mailman.1319.1161920633.11739.python-list at python.org...
>
> Hi all,
>
> I was trying to split a string that
> represent chinese characters below:
>
>
>>>> str = '\xc5\xeb\xc7\xd5\xbc'
>>>> print str2,
> ???
>>>> fields2 = split(r'\\',str)
>>>> print fields2,
> ['\xc5\xeb\xc7\xd5\xbc']
>
> But why the split function here doesn't seem
> to do the job for obtaining the desired result:
>
> ['\xc5','\xeb','\xc7','\xd5','\xbc']
>
There are no backslash characters in the string str, so split finds nothing
to split on. I know it looks like there are, but the backslashes shown are
part of the \x escape notation, which is how Python writes characters that
you can't or don't want to type as plain ASCII (such as in your example, in
which the characters are all in the range 0x80 to 0xff). Each \xNN sequence
stands for exactly one character. Look at this example:
>>> s = "\x40"
>>> print s
@
I defined s using the escaped \x notation, but s does not contain any
backslashes. It contains the single character '@', whose ordinal value is
64 decimal, or 0x40 in hexadecimal.
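
To make this concrete for the string in the question, here is a minimal
sketch. It is written for current Python 3, so the data is assumed to be a
bytes object rather than the Python 2 str used in the original session; the
names data and pieces are only illustrative. Since each \xNN escape is a
single byte, no split call is needed to get the individual pieces.

```python
# Python 3 sketch; the byte values are the ones from the quoted question.
data = b'\xc5\xeb\xc7\xd5\xbc'

# The literal contains no backslash characters: each \xNN escape is one byte.
print(len(data))        # 5
print(b'\\' in data)    # False

# Slice out one-byte pieces; indexing alone (data[i]) would give integers.
pieces = [data[i:i + 1] for i in range(len(data))]
print(pieces)           # [b'\xc5', b'\xeb', b'\xc7', b'\xd5', b'\xbc']
```

In the Python 2 of this thread, calling list() on the string itself should
give the same kind of one-character-per-item result.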
Also, str is not the best name for a string variable, since this masks the
built-in str type.
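
A small illustration of what that masking means in practice (a sketch only;
the function name shadowed is hypothetical, and the code is written for
Python 3). Once the name str is bound to a string, calling str(...) in that
scope no longer reaches the built-in type.

```python
# Hypothetical illustration of shadowing the built-in name str.
def shadowed():
    str = 'abc'             # the local name now hides the built-in type
    try:
        return str(42)      # tries to call the string 'abc', not the type
    except TypeError as exc:
        return 'TypeError: %s' % exc

result = shadowed()
print(result)
```

Using a different name, such as s, avoids the problem entirely.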
-- Paul