Python and Cyrillic characters in regular expression
google at mrabarnett.plus.com
Fri Sep 5 16:28:12 CEST 2008
On Sep 5, 12:28 pm, phasma <xpa... at gmail.com> wrote:
> string = u"Привет"
All the characters are letters.
> string = u"Hi.Привет"
The third character (".") isn't a letter and isn't whitespace, so the
match stops there.
> On Sep 4, 9:53 pm, Fredrik Lundh <fred... at pythonware.com> wrote:
> > phasma wrote:
> > > Hi, I'm trying to extract all alphabetic characters from a string.
> > > reg = re.compile('(?u)([\w\s]+)', re.UNICODE)
> > > buf = reg.match(string)
> > > But it doesn't work. If the string starts with a Cyrillic character,
> > > all works fine. But if the string starts with a Latin character, match
> > > returns only the Latin characters.
> > can you provide a few sample strings that show this behaviour?