<div>Hello Jens,<br></div><div><br></div><div>In the current Python re module, you have to do something like:</div><div><br></div><div>(?:(?!\d|_)\w)+ which uses a negative lookahead to match runs of word characters other than digits and the underscore. Of course, if you turn on the Unicode flag re.U, or set it inline with (?u), this will also match your desired umlauts.</div>
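<div>For illustration, here is a small Python 3 sketch of the lookahead pattern above, alongside the commonly used equivalent character class [^\W\d_]+ (that second pattern is my addition, not something from this thread). In Python 3, str patterns are Unicode-aware by default, so re.UNICODE is redundant here but kept to mirror the advice. Under Python 2 you would also need unicode string literals. The sample text is made up.</div>

```python
import re

# Lookahead version: one word character that is neither a digit nor "_", repeated.
# The group is non-capturing (?:...) so findall() returns whole matches
# rather than only the last character captured by the group.
LOOKAHEAD = re.compile(r"(?:(?!\d|_)\w)+", re.UNICODE)

# Equivalent character class: "not a non-word char, not a digit, not an underscore".
CHAR_CLASS = re.compile(r"[^\W\d_]+", re.UNICODE)

text = "Grüße 42 über_alles 3x"

print(LOOKAHEAD.findall(text))   # ['Grüße', 'über', 'alles', 'x']
print(CHAR_CLASS.findall(text))  # ['Grüße', 'über', 'alles', 'x']

# Plain \w+ for comparison: it also accepts digits and underscores.
print(re.findall(r"\w+", text, re.UNICODE))  # ['Grüße', '42', 'über_alles', '3x']
```

<div>Both patterns drop the bare number and split on the underscore. Note that a token such as "3x" still yields "x", since only the digit is excluded.</div>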
<div><br></div><div>I'd recommend, however, that if you have an extra 20 minutes, you try Regexp 2.7:</div><div><a href="http://bugs.python.org/issue2636">http://bugs.python.org/issue2636</a></div><div><br></div>
<div>It's a much-needed improvement over F. Lundh's re implementation (from 1999!), and it's 40% faster. Moreover, you can do exactly what you are asking for like so:</div><div><br></div><div>(?u)[[:alpha:]]+</div><div><br>
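<div>A minimal sketch of the POSIX-class approach, assuming the proposed engine is installed as the third-party regex package from PyPI (pip install regex), which I believe is where the issue 2636 work is distributed today. If that package is missing, the code falls back to the standard-library class [^\W\d_]+, so it runs either way. The module alias rx and the sample text are illustrative only.</div>

```python
# Assumption: the third-party "regex" package provides [[:alpha:]];
# if it is not installed, fall back to the stdlib "letters only" idiom.
try:
    import regex as rx
    ALPHA = rx.compile(r"(?u)[[:alpha:]]+")  # POSIX character class
except ImportError:
    import re as rx
    ALPHA = rx.compile(r"[^\W\d_]+", rx.UNICODE)  # stdlib fallback

words = ALPHA.findall("Grüße 42 über_alles")
print(words)  # ['Grüße', 'über', 'alles']
```

<div>Either branch yields the same words for this input: letters (including umlauts and ß) are kept, while digits and the underscore act as separators.</div>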
</div><div>cheers,</div><div>--tim</div><br><div class="gmail_quote">On Fri, May 13, 2011 at 9:01 AM, Jens Lechtenboerger <span dir="ltr"><<a href="mailto:lechten@helios.uni-muenster.de">lechten@helios.uni-muenster.de</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Dear experts,<br>
<br>
I'm looking for a regular expression to recognize natural language<br>
words with umlauts but without numbers. While \w with re.U does<br>
recognize words with umlauts, it also matches numbers, which I do<br>
not want.<br>
<br>
Is there a better way than an exhaustive enumeration such as<br>
[-a-zàáâãäåæ...]?<br>
<br>
I guess there should be a better way as \w appears to know about<br>
alphabetical characters...<br>
<br>
Thanks in advance<br>
Jens<br>
<font color="#888888">--<br>
<a href="http://mail.python.org/mailman/listinfo/python-list" target="_blank">http://mail.python.org/mailman/listinfo/python-list</a><br>
</font></blockquote></div><br>