[Tutor] regex questions
Albert-Jan Roskam
fomcl at yahoo.com
Thu Feb 17 22:19:21 CET 2011
Hello,
I have a couple of regex questions:
1 -- In the code below, how can I match the connecting words 'van de', 'van
der', etc. (all quite common in Dutch family names)?
2 -- It is quite hard to make a regex for all surnames, but easier to make
regexes for the initials and the connecting words. How could I 'subtract'
those two regexes to end up with something that matches the surnames? (I used two
.replace() calls in my code, which roughly work, but I'm thinking there's a re way to
do it, perhaps with carets (^).)
3 -- Suppose I want to yank up my nerd rating by adding a re.NONDIACRITIC flag
to the re module (one that matches letters regardless of their accents). How would I go
about it? Should I subclass from re and implement the method, using the other
existing methods as an example? I would find this a very useful addition.
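A note on question 3: the re module has no NONDIACRITIC flag, and because re is a module rather than a class, it cannot be subclassed directly. One common workaround, sketched here under assumptions, is to strip the accents from both the pattern and the text with unicodedata before handing them to re. The helper names strip_accents and search_ignoring_accents are hypothetical, made up for this illustration only.

```python
import re
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed (hypothetical helper)."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return u"".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_ignoring_accents(pattern, text, flags=0):
    """Hypothetical stand-in for a NONDIACRITIC flag: search on stripped copies."""
    return re.search(strip_accents(pattern), strip_accents(text), flags)


match = search_ignoring_accents(u"muller", u"J. M\u00fcller", re.IGNORECASE)
print(match is not None)
```

The match object refers to the stripped copy of the text, not the original string, so offsets may need care. Letters without a canonical decomposition, such as "ø", pass through unchanged.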
Thanks in advance for your thoughts!
Python 2.7.0+ (r27:82500, Sep 15 2010, 18:04:55)
[GCC 4.4.5] on linux2
>>> import re
>>> names = ["J. van der Meer", "J. van den Meer", "J. van Meer", "Meer, J. van der",
...          "Meer, J. van den", "Meer, J. van de", "Meer, J. van"]
>>> for name in names:
...     print re.search("(van? de[nr]?)\b? ?", name, re.IGNORECASE).group(1)
...
van der
van den
Traceback (most recent call last):
  File "<pyshell#26>", line 2, in <module>
    print re.search("(van? de[nr]?)\b? ?", name, re.IGNORECASE).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
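The traceback appears at "J. van Meer": the pattern requires " de" after "van", so re.search returns None and .group fails. In addition, "\b" in a non-raw string is a backspace character, not a word boundary. What follows is a minimal sketch of one possible approach to questions 1 and 2, not a definitive answer. It assumes that "van", optionally followed by "de", "den" or "der", is the only connector to handle, and that initials are capital letters followed by a dot. The names CONNECTOR, INITIALS and split_name are hypothetical.

```python
import re

# Raw strings keep \b as a word boundary; the " de[nr]?" part is optional.
CONNECTOR = re.compile(r"\b(van(?: de[nr]?)?)\b", re.IGNORECASE)
# One or more initials such as "J." or "A.J. " (assumed format).
INITIALS = re.compile(r"\b(?:[A-Z]\.\s*)+")


def split_name(name):
    """Return (connector, surname); connector is None when absent."""
    match = CONNECTOR.search(name)
    connector = match.group(1) if match else None
    # "Subtract" the connector and the initials; what remains is the surname.
    rest = CONNECTOR.sub("", name, count=1)
    rest = INITIALS.sub("", rest)
    surname = rest.replace(",", "").strip()
    return connector, surname


names = ["J. van der Meer", "J. van den Meer", "J. van Meer", "Meer, J. van der",
         "Meer, J. van den", "Meer, J. van de", "Meer, J. van"]
for name in names:
    print("%-18s -> %r, %r" % ((name,) + split_name(name)))
```

Checking the match object before calling .group avoids the AttributeError. With these seven names the surname comes out as "Meer" each time, though real-world data with other connectors ("de", "ter", "van 't") would need a broader pattern.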
Cheers!!
Albert-Jan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public
order, irrigation, roads, a fresh water system, and public health, what have the
Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~