How to Split Chinese Character with backslash representation?
fredrik at pythonware.com
Fri Oct 27 08:19:22 CEST 2006
Wijaya Edward wrote:
> Since there are separators I need to treat as delimiters,
> especially in a case like this:
>>>> str = '\xc5\xeb\xc7\xd5\xbc--FOO--BAR'
>>>> field = list(str)
>>>> print field
> ['\xc5', '\xeb', '\xc7', '\xd5', '\xbc', '-', '-', 'F', 'O', 'O', '-', '-', 'B', 'A', 'R']
> What we want as the output is this instead:
> ['\xc5', '\xeb', '\xc7', '\xd5', '\xbc', 'FOO', 'BAR']
>>> import re
>>> s = '\xc5\xeb\xc7\xd5\xbc--FOO--BAR'
>>> re.findall("(?i)[a-z]+|[\xA0-\xFF]", s)
['\xc5', '\xeb', '\xc7', '\xd5', '\xbc', 'FOO', 'BAR']
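the RE matches either a sequence of latin characters, *or* a single
byte in the range 0xA0-0xFF. the "--" separators match neither
alternative, so findall simply skips them.

you may want to adjust the character ranges to match the encoding you're
using, and your definition of non-chinese words.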
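
for anyone running this on Python 3, where byte strings and text are separate types, the same idea can be sketched roughly as below. this is only an illustration, not part of the original session: the b'' and rb'' prefixes and the names TOKEN and fields are assumptions added here, and the byte range is still the one from the pattern above, so adjust it for your encoding.

```python
import re

# Same bytes as in the post; Python 3 needs the b prefix for a byte string.
s = b'\xc5\xeb\xc7\xd5\xbc--FOO--BAR'

# Either a run of ASCII letters (case-insensitive), or exactly one byte
# in the range 0xA0-0xFF. TOKEN is a hypothetical name for illustration.
TOKEN = re.compile(rb'(?i)[a-z]+|[\xA0-\xFF]')

# findall returns only the matches; the "-" bytes match neither
# alternative and are dropped.
fields = TOKEN.findall(s)
print(fields)
```

note that this splits on individual bytes, as in the original example, so a multi-byte character still comes out as several one-byte fields.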