Regex similar to "^(?u)\w$", but without digits?

Mark Tolonen metolone+gmane at
Mon Apr 13 06:21:27 CEST 2009

"Andreas Pfrengle" <a.pfrengle at> wrote in message 
news:26d3bec3-8329-4432-a680-05c17f930a6a at
> On 12 Apr., 02:31, "Mark Tolonen" <metolone+gm... at> wrote:
>> "Andreas" <a.pfren... at> wrote in message
>> news:f953c845-3660-4bb5-8ba7-00b93989cd20 at
>> > Hello,
>> > I'd like to create a regex that captures any unicode character, but
>> > not the underscore and the digits 0-9. "^(?u)\w$" captures them also.
>> > Is there a possibility to restrict an expression like "\w" to "\w
>> > without [0-9_]"?
>> '(?u)[^\W0-9_]' removes 0-9_ from \w.
>> -Mark
> Hello Mark,
> haven't tried it yet, but it looks good!
> @John: Sorry for being imprecise, I meant *letters*, not *characters*,
> so requirement 2 fits my needs.

Note that \w matches alphanumeric Unicode characters.  If you only want 
letters, keep in mind that superscripts (¹²³), fractions (¼½¾), and other characters 
are also numbers to Unicode.  See the unicodedata.category function and
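
To make the point above concrete, here is a small sketch in Python 3 syntax (an assumption on my part; the thread itself is about Python 2, where the (?u) flag is needed). It prints the Unicode category of a few sample characters and whether the earlier '[^\W0-9_]' suggestion still accepts them. The names NOT_DIGIT_WORD, samples and report are only illustrative.

```python
import re
import unicodedata

# The earlier suggestion; str patterns are Unicode-aware by default in Python 3.
NOT_DIGIT_WORD = re.compile(r'[^\W0-9_]')

# a, e-acute, ASCII digit, underscore, superscript one, fraction one-quarter
samples = ['a', '\u00e9', '5', '_', '\u00b9', '\u00bc']

# Map each character to (Unicode category, does the pattern accept it?)
report = {
    ch: (unicodedata.category(ch), bool(NOT_DIGIT_WORD.fullmatch(ch)))
    for ch in samples
}

for ch, (cat, matched) in report.items():
    print(repr(ch), cat, matched)
```

The superscript and the fraction are in category 'No' (number, other), yet they should still get through the pattern, because only 0-9 and the underscore were subtracted from \w.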

If you only want letters as considered by the Unicode standard, something 
like this would give you only Unicode letters (it could be optimized to list 
ranges of characters):

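import unicodedata as ud
# Python 2: character class of every BMP code point whose category starts with 'L'
u'(?u)[' + u''.join(unichr(n) for n in xrange(65536) if 
ud.category(unichr(n))[0]=='L') + u']'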
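
The "list ranges" optimization mentioned above could look roughly like the following. This is a sketch in Python 3 syntax rather than the Python 2 of the snippet above, and the helper names letter_ranges, letter_class and LETTER are made up for illustration. It collapses consecutive letter code points into first-last ranges so the character class stays short.

```python
import re
import sys
import unicodedata


def letter_ranges(limit=sys.maxunicode + 1):
    """Yield (first, last) code point pairs for runs of category-L characters."""
    start = None
    for n in range(limit):
        is_letter = unicodedata.category(chr(n)).startswith('L')
        if is_letter and start is None:
            start = n                  # a new run of letters begins here
        elif not is_letter and start is not None:
            yield start, n - 1         # the run ended just before n
            start = None
    if start is not None:
        yield start, limit - 1


def letter_class():
    """Build a regex character class such as '[A-Za-z...]' from the ranges."""
    parts = []
    for first, last in letter_ranges():
        if first == last:
            parts.append(chr(first))
        else:
            parts.append(chr(first) + '-' + chr(last))
    return '[' + ''.join(parts) + ']'


LETTER = re.compile(letter_class())
```

Building the class scans every code point once, so it is best done a single time at import. Afterwards LETTER.fullmatch(ch) should accept letters in any script and reject digits, the underscore, superscripts and fractions.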

Hmm, maybe Python 3.0 with its default Unicode strings needs a regex 
extension to specify the Unicode category to match.
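
Without such an extension, one workaround is to skip the regex and test the category of each character directly. A minimal sketch in Python 3 syntax; only_letters is a hypothetical helper name, not something from the standard library.

```python
import unicodedata


def only_letters(text):
    """Return True if text is non-empty and every character is a Unicode letter."""
    return bool(text) and all(
        unicodedata.category(ch).startswith('L') for ch in text
    )


print(only_letters('Stra\u00dfe'))   # letters only
print(only_letters('x\u00b2'))       # contains a superscript two
```

This trades the regex for a plain loop, which is fine for validating a whole string but does not help when the letters-only part has to be embedded in a larger pattern.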

