[Tutor] problems with re module
Thomi Richards
thomi at imail.net.nz
Sat Nov 15 03:04:19 EST 2003
Hi Guys,
I'm trying to write a function that searches through a string of plain text,
that may (or may not) contain some tags which look like this:
<Graphics file: pics/PCs/barbar2.jpg>
and replace those tags with docbook markup, which looks like this:
<graphic srccredit="Fix Me!" fileref='pics/PCs/barbar2.jpg' />
I'm using the re module and a recursive algorithm to find and replace the
offending strings, but I'm getting very weird results. I've tried to nut
this out for the last 3-4 hours, but can't seem to get anywhere with it.
Here's the code:
---------------------------------
import re

def processcol(message):
    """This procedure takes a column text as an argument, and returns the
    same text, without any illegal characters for XML. It even does a bit
    of text tidying"""
    message = message.replace('\n',' ')
    message = message.replace('\t',' ')
    m = re.search(r"<Graphics\s+file:\s+",message) #search for the starting tag.
    if m:
        start,end = m.span()
        cstart,cend = re.search(r">",message).span()
        fname = message[end:cstart - 1]
        message = (message[:start]
                   + "<graphic srccredit='Fix Me!' fileref='%s' />" % (fname)
                   + message[cend:])
        return processcol(message[cend:])
    else:
        return message
-----------------------------------
There's some really simple reason why this doesn't go, but I can't quite put
my finger on it... There were a whole raft of debugging print statements, but
I removed them for your sanity ;)
What's *meant* to happen:
A string which may contain the offending tags gets passed to the processcol()
function. A few simple cleanup operations are performed (replacing newlines
and tabs with spaces).
Then, if a bad tag is found, the index where the tag starts is recorded, as
well as where the tag ends. The filename is extracted, and the bad tag is
replaced. Because the regex search goes from left to right, we now pass
the string to the right of the tag we have just fixed to ourselves. This
means that if there were two bad tags, one after the other, the left-hand one
would be fixed first, and then the right-hand one.
If no bad tags are found, the message is returned.
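
For reference, the intended behaviour described above could be sketched roughly
like this. It is only a sketch under a few assumptions, not the code from this
post: the names TAG, TEMPLATE, processcol_sketch and processcol_sub are made up
for illustration, and it assumes a file name never contains a ">" character.
The first function follows the recursive description (fix the leftmost tag,
keep the text before it, recurse on the text after it); the second swaps in
re.sub, which should give the same result in a single pass.

```python
import re

# Assumed pattern: one capture group for the file name inside the tag.
TAG = re.compile(r"<Graphics\s+file:\s+([^>]*?)\s*>")
# Replacement markup, with a %s slot for the captured file name.
TEMPLATE = "<graphic srccredit='Fix Me!' fileref='%s' />"


def processcol_sketch(message):
    """Recursive version: replace the leftmost tag, then handle the rest."""
    # Cleanup step: turn newlines and tabs into spaces.
    message = message.replace('\n', ' ').replace('\t', ' ')
    m = TAG.search(message)
    if not m:
        # No bad tag left, so return the text unchanged.
        return message
    head = message[:m.start()]       # text before the tag, kept as-is
    fixed = TEMPLATE % m.group(1)    # docbook markup for this tag
    tail = message[m.end():]         # text after the tag, still unprocessed
    return head + fixed + processcol_sketch(tail)


def processcol_sub(message):
    """Alternative: let re.sub replace every tag in one pass."""
    message = message.replace('\n', ' ').replace('\t', ' ')
    return TAG.sub(lambda m: TEMPLATE % m.group(1), message)


if __name__ == "__main__":
    sample = "intro <Graphics file: pics/PCs/barbar2.jpg> and <Graphics file: a.png> end"
    print(processcol_sketch(sample))
    print(processcol_sub(sample))
```

The key point in the recursive version is that the text to the left of the tag
and the replacement markup are both kept and joined onto the result of the
recursive call, and that all slicing uses indexes from the same match object.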
Can anyone here help me get this going properly?
--
Thomi Richards,
http://once.sourceforge.net/