Suggestion for a new regular expression extension
Skip Montanaro
skip at pobox.com
Thu Nov 20 12:00:22 EST 2003
Nicolas> re_adresse = re.compile(r'''
... [big, ugly re snipped] ...
Nicolas> ''',re.X)
Nicolas> Note for example the many abbreviations (correct or not) of
Nicolas> "boulevard" : BD, BLD, BVD, BOUL, BOULEVARD. For normalisation
Nicolas> purposes, I need to transform all those forms into the only
Nicolas> correct abbreviation, BD.
Nicolas> What would be really, really neat would be a regular
Nicolas> expression extension notation that would make the RE engine
Nicolas> return an arbitrary string when a substring is matched.
Why not just use named groups, then pass the match's groupdict() result
through a normalization function? Here's a trivial example which
"normalizes" some matches by replacing them with the matched strings'
lengths.
>>> import re
>>> pat = re.compile('(?P<a>a+)(?P<b>b+)')
>>> mat = pat.match("aaaaaaaabbb")
>>> def norm(d):
...     d['a'] = len(d['a'])
...     d['b'] = len(d['b'])
...
>>> d = mat.groupdict()
>>> d
{'a': 'aaaaaaaa', 'b': 'bbb'}
>>> norm(d)
>>> d
{'a': 8, 'b': 3}
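The same idea could be applied to the "boulevard" case along the lines of the sketch below. The pattern, the group names (number, kind, name), the CANONICAL table and the sample address are all made up for illustration; they are not taken from Nicolas's actual regular expression, which would need its own group names and mapping.

```python
import re

# Hypothetical table mapping each observed spelling to the one
# canonical abbreviation. Not taken from the original expression.
CANONICAL = {
    "BD": "BD",
    "BLD": "BD",
    "BVD": "BD",
    "BOUL": "BD",
    "BOULEVARD": "BD",
}

# Illustrative address pattern with named groups. Alternatives are
# listed longest first so the full word is tried before its prefix.
pat = re.compile(
    r"(?P<number>\d+)\s+"
    r"(?P<kind>BOULEVARD|BOUL|BLD|BVD|BD)\s+"
    r"(?P<name>.+)"
)

def norm(d):
    """Return a copy of a groupdict() with the street kind normalized."""
    d = dict(d)
    d["kind"] = CANONICAL[d["kind"]]
    return d

mat = pat.match("12 BOUL VOLTAIRE")
d = norm(mat.groupdict())
print(d)
```

Keeping the mapping in a plain dictionary outside the pattern means new spellings can be added without any extension to the RE syntax itself.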
Skip