[Tutor] Re: problems with re module
Lee Harr
missive at hotmail.com
Sat Nov 15 09:48:11 EST 2003
>I'm trying to write a function that searches through a string of plain
>text, that may (or may not) contain some tags which look like this:
>
><Graphics file: pics/PCs/barbar2.jpg>
>
>and replace those tags with docbook markup, which looks like this:
>
><graphic srccredit="Fix Me!" fileref='pics/PCs/barbar2.jpg' />
>
Once the task becomes more complex, I usually end up using groups
and moving the substitution out into a separate function, which
re.sub calls with the match object for each tag it finds:

import re

def f(matchobj):
    # group(1) is the "Graphics file: " prefix; group(2) is the file path.
    #print(matchobj.group(0), matchobj.group(1), matchobj.group(2))
    return '<graphic srccredit="Fix Me!" fileref="%s" />' % matchobj.group(2)

def procol(message):
    """This procedure takes a column text as an argument, and returns the
    same text, without any illegal characters for XML. It even does a bit
    of text tidying"""
    message = message.replace('\n', ' ')
    message = message.replace('\t', ' ')
    msg = re.sub(r"<(Graphics\s+file:\s+)([^>]*)>", f, message)
    return msg

if __name__ == '__main__':
    test_messages = ['<Graphics file: pics/PCs/barbar2.jpg>',
                     'some text before <Graphics file: pics/PCs/barbar2.jpg> '
                     'two tags <Graphics file: pics/PCs/barbar2.jpg> and after']
    for message in test_messages:
        print(procol(message))

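
While the replacement is still this simple, a replacement template with a group reference may be enough, and the separate function only becomes necessary once the substitution needs real logic. Here is a minimal sketch of that variant; the names TAG_RE and to_docbook and the named group "path" are my own choices for illustration, not something from the original question:

```python
import re

# Hypothetical names for illustration: TAG_RE, to_docbook, and the group "path".
TAG_RE = re.compile(r"<Graphics\s+file:\s+(?P<path>[^>]*)>")

def to_docbook(text):
    # \g<path> in the template is replaced by whatever the named group captured.
    return TAG_RE.sub(r'<graphic srccredit="Fix Me!" fileref="\g<path>" />', text)

print(to_docbook('before <Graphics file: pics/PCs/barbar2.jpg> after'))
# prints: before <graphic srccredit="Fix Me!" fileref="pics/PCs/barbar2.jpg" /> after
```

Text with no matching tag comes back unchanged, so it should be safe to run this over columns that may or may not contain a tag.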