Regular Expression for words (with umlauts, without numbers)

MRAB python at mrabarnett.plus.com
Fri May 13 13:34:55 EDT 2011


On 13/05/2011 17:14, Tim Chon wrote:
> Hallo Jens,
>
> In the current Python re module, you have to do something like:
>
> ((?!\d|_)\w)+ which uses a negative lookahead to match word characters
> except digits and the underscore. Of course, if you turn on the Unicode
> flag re.U, or use it inline as (?u), then this will also match your
> desired umlauts.
>
> I'd actually recommend, however, that if you have an extra 20 minutes,
> you use Regexp 2.7:
> http://bugs.python.org/issue2636
>
> It's a much-needed improvement over F. Lundh's re implementation (from
> 1999!) and it's 40% faster. Moreover, you can do exactly what you are
> requesting like so:
>
> (?u)[[:alpha:]]+
>
The latest release of the regex module is here:

     http://pypi.python.org/pypi/regex
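
For anyone staying with the standard re module, here is a minimal sketch
of the lookahead idea from the quoted message, written with balanced
parentheses. The sample text and the names WORD, ALT and words are made
up for illustration, and [^\W\d_]+ is a common equivalent idiom rather
than something taken from this thread. Note that both patterns split a
token at digits and underscores; they do not reject the whole token.

```python
import re

# Word characters that are neither digits nor underscore.
# re.UNICODE is needed for str patterns on Python 2; on Python 3 it is
# already the default for str patterns and is harmless to pass.
WORD = re.compile(r"(?:(?!\d|_)\w)+", re.UNICODE)

# Equivalent idiom: "not a non-word character, not a digit, not underscore".
ALT = re.compile(r"[^\W\d_]+", re.UNICODE)

text = u"Die Größe von Müllers Haus: 42 qm, foo_bar und Straße7"

words = WORD.findall(text)
alt_words = ALT.findall(text)

print(words)
```

On Python 2 the source file also needs an encoding declaration, and both
the pattern input and the text should be unicode objects for the umlauts
to match.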


