[Tutor] Re: problems with re module

Sat Nov 15 09:48:11 EST 2003

>I'm trying to write a function that searches through a string of plain
>text, that may (or may not) contain some tags which look like this:
>
><Graphics file: pics/PCs/barbar2.jpg>
>
>and replace those tags with docbook markup, which looks like this:
>
><graphic srccredit="Fix Me!" fileref='pics/PCs/barbar2.jpg' />
>

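Once the task becomes more complex, I usually end up using groups
and moving the substitution out into a separate function. re.sub()
accepts a function as the replacement; it is called with each match
object, and whatever it returns is substituted for the match: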

import re

def f(matchobj):
    # group(1) is the "Graphics file: " prefix, group(2) is the file path
    return '<graphic srccredit="Fix Me!" fileref="%s" />' % matchobj.group(2)

def procol(message):
    """This procedure takes a column text as an argument, and returns the
    same text, without any illegal characters for XML. It even does a bit
    of text tidying."""

    message = message.replace('\n',' ')
    message = message.replace('\t',' ')

    msg = re.sub(r"<(Graphics\s+file:\s+)([^>]*)>", f, message)

    return msg

if __name__ == '__main__':
    test_messages = ['<Graphics file: pics/PCs/barbar2.jpg>',
                     'some text before <Graphics file: pics/PCs/barbar2.jpg> '
                     'two tags <Graphics file: pics/PCs/barbar2.jpg> and after']

    for message in test_messages:
        print(procol(message))
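
For comparison, here is a rough sketch of the same substitution done two ways: with a plain backreference in the replacement string, and with a replacement function. The names TAG, replace_simple and replace_with_function are made up for this illustration, and the .strip() call is just one example of the kind of extra processing that makes the function form worth having.

```python
import re

# Same pattern as above; group 2 captures the file path.
TAG = re.compile(r"<(Graphics\s+file:\s+)([^>]*)>")

def replace_simple(text):
    # \2 in the replacement string refers to the second group.
    return TAG.sub(r'<graphic srccredit="Fix Me!" fileref="\2" />', text)

def replace_with_function(text):
    # A function lets you post-process the captured text before
    # building the replacement (here: trim stray whitespace).
    def repl(matchobj):
        path = matchobj.group(2).strip()
        return '<graphic srccredit="Fix Me!" fileref="%s" />' % path
    return TAG.sub(repl, text)
```

The backreference form is fine as long as the replacement is a fixed template; as soon as you need to clean up or look up something based on the matched text, the function form is easier to extend.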
