Regex similar to "^(?u)\w$", but without digits?
metolone+gmane at gmail.com
Mon Apr 13 06:21:27 CEST 2009
"Andreas Pfrengle" <a.pfrengle at gmail.com> wrote in message
news:26d3bec3-8329-4432-a680-05c17f930a6a at 3g2000yqk.googlegroups.com...
> On 12 Apr., 02:31, "Mark Tolonen" <metolone+gm... at gmail.com> wrote:
>> "Andreas" <a.pfren... at gmail.com> wrote in message
>> news:f953c845-3660-4bb5-8ba7-00b93989cd20 at b1g2000vbc.googlegroups.com...
>> > Hello,
>> > I'd like to create a regex that captures any unicode character, but
>> > not the underscore and the digits 0-9. "^(?u)\w$" captures them also.
>> > Is there a possibility to restrict an expression like "\w" to "\w
>> > without [0-9_]"?
>> '(?u)[^\W0-9_]' removes 0-9_ from \w.
> Hello Mark,
> haven't tried it yet, but it looks good!
> @John: Sorry for being imprecise, I meant *letters*, not *characters*,
> so requirement 2 fits my needs.
Note that \w matches alphanumeric Unicode characters. If you only want
letters, consider that superscripts (¹²³), fractions (¼½¾), and other
characters are also numbers to Unicode, so removing 0-9 alone still leaves
them in. See the unicodedata.category function. If you only want letters
as considered by the Unicode standard, something like this would give you
only Unicode letters (it could be optimized to list ranges of characters):
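import unicodedata as ud
# Category codes are two letters (Lu, Ll, Lt, Lm, Lo), so test the prefix.
u'(?u)[' + u''.join(unichr(n) for n in xrange(65536) if
ud.category(unichr(n)).startswith('L')) + u']'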
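
As a rough sketch of the range optimization mentioned above, here is one
way it might look. This assumes Python 3 (chr and range instead of unichr
and xrange); the helper name letter_class and its limit parameter are made
up for illustration and are not part of any library:

```python
import re
import unicodedata

def letter_class(limit=0x10000):
    """Build a regex character class of Unicode letters below `limit`.

    Hypothetical helper: consecutive letter code points are merged
    into first-last ranges to keep the pattern short.
    """
    ranges = []  # list of [first, last] code point pairs
    for n in range(limit):
        if unicodedata.category(chr(n)).startswith('L'):
            if ranges and ranges[-1][1] == n - 1:
                ranges[-1][1] = n      # extend the current run
            else:
                ranges.append([n, n])  # start a new run
    parts = []
    for first, last in ranges:
        if first == last:
            parts.append('\\U%08x' % first)
        else:
            parts.append('\\U%08x-\\U%08x' % (first, last))
    return '[' + ''.join(parts) + ']'

letters = re.compile(letter_class())
print(bool(letters.match('é')))               # True: a letter
print(bool(letters.match('²')))               # False: a number to Unicode
print(bool(re.match(r'[^\W0-9_]', '²')))      # True: \w still covers it
```

The default limit of 0x10000 mirrors the 65536 used above; raising it to
sys.maxunicode + 1 should cover the remaining planes at the cost of a
longer build.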
Hmm, maybe Python 3.0 with its default Unicode strings needs a regex
extension to specify the Unicode category to match.