How to Split Chinese Character with backslash representation?

Fredrik Lundh fredrik at pythonware.com
Fri Oct 27 08:19:22 CEST 2006


Wijaya Edward wrote:

> Since there are separators, I need to treat them as delimiters,
> especially in a case like this:
> 
>>>> str = '\xc5\xeb\xc7\xd5\xbc--FOO--BAR'
>>>> field = list(str)
>>>> print field
> ['\xc5', '\xeb', '\xc7', '\xd5', '\xbc', '-', '-', 'F', 'O', 'O', '-', '-', 'B', 'A', 'R']
> 
> What we want as the output is this instead:
> ['\xc5', '\xeb', '\xc7', '\xd5', '\xbc', 'FOO', 'BAR']

 >>> import re
 >>> s = '\xc5\xeb\xc7\xd5\xbc--FOO--BAR'
 >>> re.findall("(?i)[a-z]+|[\xA0-\xFF]", s)
 ['\xc5', '\xeb', '\xc7', '\xd5', '\xbc', 'FOO', 'BAR']

the RE matches either a sequence of latin letters, *or* a single 
byte in the range \xA0-\xFF (a non-ASCII character in an 8-bit string).

you may want to adjust the character ranges to match the encoding you're 
using, and your definition of non-Chinese words.
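
as a rough illustration of adjusting the pattern to the encoding, here is a minimal sketch in Python 3 that decodes the bytes first and then matches on characters instead of raw bytes. the helper name split_mixed, the choice of the "gbk" codec, and the sample string are assumptions made for this example, not something taken from the original question.

```python
import re

# One run of ASCII letters, or one single non-ASCII character.
TOKEN = re.compile(r"[A-Za-z]+|[^\x00-\x7F]")

def split_mixed(raw, encoding):
    """Decode raw bytes, then return Latin words and single non-ASCII characters.

    Hypothetical helper: the name and the signature are illustrative only.
    """
    text = raw.decode(encoding)  # raises UnicodeDecodeError on malformed input
    return TOKEN.findall(text)

# Made-up sample data: two Chinese characters, then two delimited words.
sample = "\u4e2d\u6587--FOO--BAR".encode("gbk")
print(split_mixed(sample, "gbk"))
```

decoding first means each Chinese character comes back as one item, even though it takes two bytes in a double-byte encoding. the byte-level pattern above returns each byte separately instead.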

</F>



