[Tutor] regex questions

Albert-Jan Roskam fomcl at yahoo.com
Thu Feb 17 22:19:21 CET 2011


Hello,

I have a couple of regex questions:

1 -- In the code below, how can I match the connecting words 'van de', 'van 
der', etc. (all quite common in Dutch family names)?
2 -- It is quite hard to make a regex for all surnames, but easier to make 
regexes for the initials and the connecting words. How could I 'subtract' 
those two regexes to end up with something that matches the surnames? (I used two 
.replace() calls in my code, which roughly work, but I'm thinking there's a re way to 
do it, perhaps with carets (^).)
3 -- Suppose I want to yank up my nerd rating by adding a re.NONDIACRITIC flag 
to the re module (matching letters regardless of their accents). How would I go 
about it? Should I subclass re and implement the method, using the other 
existing methods as an example? I would find this a very useful addition.
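On question 3: re is a module, not a class, so there is nothing to subclass, and as far as I know a genuine flag would have to be implemented in the regex engine itself. A common workaround (swapped in here; it is not a re feature) is to strip the accents from both the pattern and the text before matching: unicodedata.normalize() with "NFD" splits a letter such as "ë" into "e" plus a combining diaeresis, and unicodedata.combining() identifies the marks to drop. A minimal sketch, written for Python 3 (under 2.7 the strings would need to be unicode objects); strip_accents and search_nondiacritic are hypothetical helper names:

```python
import re
import unicodedata


def strip_accents(text):
    """Decompose characters (NFD) and drop the combining marks, e.g. 'é' -> 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def search_nondiacritic(pattern, text, flags=0):
    """Search text while ignoring accents in both the pattern and the text.

    Hypothetical helper, not part of re: both sides are stripped before
    matching, so the returned match refers to the *stripped* text.
    """
    return re.search(strip_accents(pattern), strip_accents(text), flags)


match = search_nondiacritic("citroen", "Andr\u00e9 Citro\u00ebn", re.IGNORECASE)
print(match.group(0) if match else None)  # prints Citroen
```

Because matching happens on the stripped copy, match positions refer to that copy, not to the original string. Letters without a canonical decomposition, such as "ß" or "ø", are left unchanged.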

Thanks in advance for your thoughts!

Python 2.7.0+ (r27:82500, Sep 15 2010, 18:04:55) 
[GCC 4.4.5] on linux2

>>> import re
>>> names = ["J. van der Meer", "J. van den Meer", "J. van Meer",
             "Meer, J. van der", "Meer, J. van den", "Meer, J. van de", "Meer, J. van"]
>>> for name in names:
    print re.search("(van? de[nr]?)\b? ?", name, re.IGNORECASE).group(1)

van der
van den
Traceback (most recent call last):
  File "<pyshell#26>", line 2, in <module>
    print re.search("(van? de[nr]?)\b? ?", name, re.IGNORECASE).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
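The traceback comes from the third name. In "(van? de[nr]?)\b? ?" the ? after "van" makes only the "n" optional, while the " de[nr]?" part is required; so "J. van Meer" has no match, re.search() returns None, and calling .group() on None raises the AttributeError. Also, because the pattern is not a raw string, "\b" there is a backspace character rather than a word boundary. Below is a minimal sketch of one way to handle questions 1 and 2, assuming initials are always capital letters followed by a dot and that "van", "van de", "van den" and "van der" are the only connecting words (CONNECTOR, INITIALS, connector and surname are hypothetical names). The "de/den/der" part goes in an optional non-capturing group, the result of re.search() is checked for None, and since re has no operator for subtracting one pattern from another, the "subtraction" is done by deleting everything that matches either pattern with a single re.sub() and keeping what is left.

```python
import re

# Connecting words: "van", optionally followed by "de", "den" or "der".
# Raw string, so \b is a word boundary and not a backspace character.
CONNECTOR = r"\bvan(?:\s+de[nr]?)?\b"
# Initials such as "J." or "J.P." (assumption: a capital letter plus a dot).
INITIALS = r"\b(?:[A-Z]\.)+"

names = ["J. van der Meer", "J. van den Meer", "J. van Meer",
         "Meer, J. van der", "Meer, J. van den", "Meer, J. van de",
         "Meer, J. van"]


def connector(name):
    """Return the connecting words in name, or None if there are none."""
    match = re.search(CONNECTOR, name, re.IGNORECASE)
    return match.group(0) if match else None


def surname(name):
    """Delete initials and connecting words; what remains is taken as the surname."""
    rest = re.sub(INITIALS + "|" + CONNECTOR, "", name, flags=re.IGNORECASE)
    # Drop the comma and the surplus whitespace left behind by the deletions.
    return " ".join(rest.replace(",", " ").split())


for name in names:
    print("%-20s %-8s %s" % (name, connector(name), surname(name)))
```

With the names list above, every entry yields the surname "Meer", and the connector column reads "van der", "van den", "van", "van der", "van den", "van de", "van". Note that re.IGNORECASE also lets [A-Z] in INITIALS match lowercase letters; that is harmless for this data but worth keeping in mind for other input.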

 Cheers!!
Albert-Jan

