Suggestion for a new regular expression extension

Skip Montanaro skip at pobox.com
Thu Nov 20 12:00:22 EST 2003


    Nicolas> re_adresse = re.compile(r'''
    ... [big, ugly re snipped] ...
    Nicolas> ''',re.X)

    Nicolas> Note for example the many abbreviations (correct or not) of
    Nicolas> "boulevard": BD, BLD, BVD, BOUL, BOULEVARD. For normalisation
    Nicolas> purposes, I need to transform all those forms into the only
    Nicolas> correct abbreviation, BD.

    Nicolas> What would be really, really neat would be a regular
    Nicolas> expression extension notation that would make the RE engine
    Nicolas> return an arbitrary string when a substring is matched.

Why not just use named groups, then pass the match's groupdict() result
through a normalization function?  Here's a trivial example which
"normalizes" some matches by replacing them with the matched strings'
lengths.

    >>> import re
    >>> pat = re.compile('(?P<a>a+)(?P<b>b+)')
    >>> mat = pat.match("aaaaaaaabbb")
    >>> def norm(d):
    ...   d['a'] = len(d['a'])
    ...   d['b'] = len(d['b'])
    ... 
    >>> d = mat.groupdict()
    >>> d
    {'a': 'aaaaaaaa', 'b': 'bbb'}
    >>> norm(d)
    >>> d
    {'a': 8, 'b': 3}
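
Applied to the boulevard case, the same idea might look like the sketch
below. Nicolas's real RE was snipped, so ADDRESS is a hypothetical
stand-in that only knows the five spellings from his message, and
STREET_TYPES, normalize(), parse_address() and normalize_text() are
illustrative names, not part of any library. The last function shows a
different technique: re.sub() accepts a function as the replacement, and
whatever it returns is substituted for each match.

```python
import re

# Hypothetical stand-in for the snipped address RE: number, street type, name.
# BOULEVARD is listed before BOUL so the longer spelling is tried first.
ADDRESS = re.compile(r'''
    (?P<number>\d+)\s+
    (?P<type>BOULEVARD|BOUL|BVD|BLD|BD)\s+
    (?P<name>.+)
''', re.X)

# Every accepted spelling of "boulevard" maps onto the canonical BD.
STREET_TYPES = {'BD': 'BD', 'BLD': 'BD', 'BVD': 'BD',
                'BOUL': 'BD', 'BOULEVARD': 'BD'}

def normalize(groups):
    """Return a copy of a groupdict() with the street type normalized."""
    result = dict(groups)
    result['type'] = STREET_TYPES[result['type']]
    return result

def parse_address(text):
    """Match one address and return its normalized named groups, or None."""
    m = ADDRESS.match(text)
    if m is None:
        return None
    return normalize(m.groupdict())

def normalize_text(text):
    """Rewrite every street-type token in free text via a re.sub callback."""
    return re.sub(r'\b(?:BOULEVARD|BOUL|BVD|BLD|BD)\b',
                  lambda m: STREET_TYPES[m.group(0)], text)
```

For instance, parse_address("12 BOUL SAINT-GERMAIN") gives
{'number': '12', 'type': 'BD', 'name': 'SAINT-GERMAIN'}, and
normalize_text("12 BLD SAINT-GERMAIN, 3 BVD VOLTAIRE") gives
"12 BD SAINT-GERMAIN, 3 BD VOLTAIRE". Keeping the mapping in a plain
dict means new misspellings only need a new alternative in the RE and
a new dict entry.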

Skip
