Python and Cyrillic characters in regular expression

phasma xpahos at gmail.com
Fri Sep 5 07:28:14 EDT 2008


string = u"Привет"
(u'\u041f\u0440\u0438\u0432\u0435\u0442',)

string = u"Hi.Привет"
(u'Hi',)
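
A minimal sketch of what the two samples above seem to show, assuming the pattern from the quoted message is used unchanged. The likely cause is that '.' is neither \w nor \s, so match() stops at the first character outside the class; the Cyrillic letters themselves are matched fine. The findall() call is only one possible way to collect all the runs and is not from the original post. Written for Python 3, where re.UNICODE is already the default for str patterns; the u"" prefixes are kept to mirror the samples.

```python
# -*- coding: utf-8 -*-
import re

# Same character class as in the quoted message. re.UNICODE makes \w
# match Cyrillic letters as well as Latin ones (default in Python 3).
reg = re.compile(r'([\w\s]+)', re.UNICODE)

for string in (u"Привет", u"Hi.Привет"):
    # match() anchors at the start of the string and stops at the first
    # character outside the class; '.' is neither \w nor \s.
    first = reg.match(string).groups()
    # findall() returns every run of word/whitespace characters instead.
    runs = reg.findall(string)
    print(repr(first), repr(runs))
```

If the period should be kept as part of the match, it would have to be added to the character class explicitly, e.g. [\w\s.].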

On Sep 4, 9:53 pm, Fredrik Lundh <fred... at pythonware.com> wrote:
> phasma wrote:
> > Hi, I'm trying to extract all alphabetic characters from a string.
>
> > reg = re.compile('(?u)([\w\s]+)', re.UNICODE)
> > buf = reg.match(string)
>
> > But it doesn't work. If the string starts with a Cyrillic character,
> > everything works fine. But if the string starts with a Latin character,
> > match returns only the Latin characters.
>
> can you provide a few sample strings that show this behaviour?
>
> </F>



