Python and Cyrillic characters in regular expression
google at mrabarnett.plus.com
Thu Sep 4 19:46:39 CEST 2008
On Sep 4, 3:42 pm, phasma <xpa... at gmail.com> wrote:
> Hi, I'm trying to extract all alphabetic characters from a string.
> reg = re.compile('(?u)([\w\s]+)', re.UNICODE)
You don't need both (?u) and re.UNICODE: they mean the same thing.
This will actually match letters, digits, underscores and whitespace, not just letters: \w means "word character", not "alphabetic character".
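A minimal Python 3 sketch of both points. The thread is from the Python 2 era; in Python 3, str patterns are Unicode-aware by default, so neither flag is actually needed there. The names `inline`, `flagged` and `letters` and the sample string are illustrative only. The letters-only class `[^\W\d_]` ("a word character that is not a digit or an underscore") is a common idiom, offered here as one way to get alphabetic characters only; it is not from the original post.

```python
import re

# Both spellings switch on Unicode matching (redundant for str patterns in Python 3).
inline = re.compile(r'(?u)([\w\s]+)')
flagged = re.compile(r'([\w\s]+)', re.UNICODE)

text = "ya_42 я"

# \w also matches digits and the underscore, so both patterns match the whole string.
assert inline.match(text).group(1) == flagged.match(text).group(1) == text

# Letters only: word characters minus digits and the underscore.
letters = re.compile(r'[^\W\d_]+')
print(letters.findall(text))  # ['ya', 'я']
```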
> buf = reg.match(string)
> But it doesn't work. If the string starts with a Cyrillic character, everything
> works fine. But if it starts with a Latin character, match returns
> only the Latin characters.
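Separately, since the goal is to extract all alphabetic characters: `match()` only tries position 0 and returns the first contiguous run, while `findall()` collects every run. A small Python 3 sketch with a hypothetical input (the excerpt does not show the poster's actual string, so this is not offered as the diagnosis):

```python
import re

word_or_space = re.compile(r'[\w\s]+')

# Hypothetical input: punctuation separates the Latin and Cyrillic parts.
text = "ya, я"

# match() anchors at the start and stops at the first non-matching character.
print(word_or_space.match(text).group(0))  # ya

# findall() returns every matching run in the string.
print(word_or_space.findall(text))  # ['ya', ' я']
```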
I'm encoding the Unicode results as UTF-8 in order to print them, but
otherwise it works for me with Unicode strings, whichever script comes first:
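# -*- coding: utf-8 -*-
import re

reg = re.compile(r'(?u)([\w\s]+)')
found = reg.match(u"ya я")
print found.group(1).encode("utf-8")  # the whole string, "ya я"
found = reg.match(u"я ya")
print found.group(1).encode("utf-8")  # the whole string, "я ya"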
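For comparison, a Python 3 sketch of the same check, where plain str literals replace u"" and no explicit UTF-8 encoding is needed to print. It also contrasts matching on UTF-8 encoded bytes, where \w is ASCII-only; that is only one assumption about how Latin-only results could arise, since the excerpt does not say whether the poster's string was Unicode or encoded bytes. `byte_reg` and `raw` are illustrative names.

```python
import re

# str patterns are Unicode-aware in Python 3, so no (?u) is needed.
reg = re.compile(r'([\w\s]+)')

# Decoded text (str): both orders match the whole string.
for text in ("ya я", "я ya"):
    assert reg.match(text).group(1) == text

# Encoded text (bytes): \w in a bytes pattern only covers ASCII,
# so matching stops at the first byte of the Cyrillic letter.
raw = "ya я".encode("utf-8")
byte_reg = re.compile(rb'([\w\s]+)')
print(byte_reg.match(raw).group(1))            # b'ya '
print(byte_reg.match("я ya".encode("utf-8")))  # None
```

Decoding first (for example `raw.decode("utf-8")`) and matching with the str pattern gives the full "ya я" again.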