Regular expressions and non-standard character set

Oleg Broytmann phd at phd.pp.ru
Wed Mar 21 04:41:08 EST 2001


Hi!

   Which Python version are you running? I have this problem with
Python 2.0. Are you using 2.1b?

On Tue, 20 Mar 2001, Fredrik Lundh wrote:
> Petri Mikael Kuittinen wrote:
> > I want to match word boundaries using the special sequences \b and \B
> > of regular expressions. They work OK when using the "standard"
> > alphanumeric set [a-zA-Z0-9_]. But I would like them to work with
> > character set which also contains various "national characters"
> > e.g. å, ä, ö, è, é, ü, ñ etc. and their uppercase equivalents.
> >
> > Locale doesn't seem to be the proper way to do it
>
> are you sure?
>
> >>> import locale
> >>> locale.setlocale(locale.LC_ALL, "")
> 'Swedish_Sweden.1252'
> >>> import re
> >>> re.findall(r"\b...\b", "spam, egg, bacon, and åäö")
> ['egg', 'and']
> >>> re.findall(r"(?L)\b...\b", "spam, egg, bacon, and åäö")
> ['egg', 'and', 'åäö']
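
   To compare results between versions, the quoted session could be
turned into a small script, roughly like the sketch below. It is only a
sketch and makes some assumptions: it is written for a current Python 3
interpreter, where the 8-bit strings of this thread become bytes; the
names TEXT, three_letter_words and report are made up for illustration;
and the last word of the test string is given as the byte escapes
0xE5 0xE4 0xF6, assuming the 1252 encoding of the quoted locale. What
the (?L) run prints depends on the locale of the machine.

```python
import locale
import re
import sys

# Test string from the quoted session. The last word is three bytes above
# 0x7F, written as escapes so the source file encoding does not matter
# (assumption: the text is Latin-1/cp1252 encoded).
TEXT = b"spam, egg, bacon, and \xe5\xe4\xf6"


def three_letter_words(data, use_locale=False):
    """Return all three-character runs delimited by \\b on both sides.

    With use_locale=True the re.LOCALE flag is set, which has the same
    effect as the inline (?L) flag in the quoted session.
    """
    flags = re.LOCALE if use_locale else 0
    return re.findall(rb"\b...\b", data, flags)


def report():
    """Print the interpreter version, the active locale and both results."""
    print(sys.version)
    try:
        # An empty string selects the user's default locale.
        print(locale.setlocale(locale.LC_ALL, ""))
    except locale.Error as exc:
        print("setlocale failed:", exc)
    print("without LOCALE:", three_letter_words(TEXT))
    print("with LOCALE:   ", three_letter_words(TEXT, use_locale=True))


if __name__ == "__main__":
    report()
```

   Without the flag only ASCII letters, digits and the underscore count
as word characters, so the last word is not found. With the flag the
answer follows whatever the C library reports for the current locale.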

Oleg.
----
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.




