[Tutor] problems with re module
thomi at imail.net.nz
Sat Nov 15 03:04:19 EST 2003
I'm trying to write a function that searches through a string of plain text,
that may (or may not) contain some tags which look like this:
<Graphics file: pics/PCs/barbar2.jpg>
and replace those tags with docbook markup, which looks like this:
<graphic srccredit="Fix Me!" fileref='pics/PCs/barbar2.jpg' />
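The tag-to-markup mapping described above can be sketched with a single substitution. This is only a minimal sketch: the names GRAPHICS_TAG and to_docbook are hypothetical, and it assumes a filename never contains a ">" character.

```python
import re

# Hypothetical pattern: "<Graphics file:", some whitespace, then the
# filename (everything up to the next ">"), captured as group 1.
GRAPHICS_TAG = re.compile(r"<Graphics\s+file:\s+([^>]+?)\s*>")

def to_docbook(text):
    """Replace every <Graphics file: ...> tag with a DocBook <graphic> element."""
    # A function is used as the replacement so that backslashes in a
    # filename are never interpreted as regex escapes.
    return GRAPHICS_TAG.sub(
        lambda m: "<graphic srccredit=\"Fix Me!\" fileref='%s' />" % m.group(1),
        text,
    )

print(to_docbook("before <Graphics file: pics/PCs/barbar2.jpg> after"))
```

Text that contains no such tag passes through unchanged.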
I'm using the re module, and a recursive algorithm to find and replace the
offending strings, but I'm getting very weird results... I've tried to nut
this out for the last 3-4 hours, but can't seem to get anywhere with it...
here's the code:

import re

def processcol(message):
    """This procedure takes a column text as an argument, and returns the same
    text, without any illegal characters for XML. It even does a bit of text tidying"""
    message = message.replace('\n',' ')
    message = message.replace('\t',' ')
    m = re.search(r"<Graphics\s+file:\s+",message) #search for the starting tag.
    if m:
        start,end = m.span()
        cstart,cend = re.search(r">",message).span()
        fname = message[end:cstart - 1]
        message = (message[:start] + "<graphic srccredit='Fix Me!' fileref='%s' />"
            % (fname) + processcol(message[cend:]))
    return message

There's some really simple reason why this doesn't go, but I can't quite put
my finger on it... There were a whole raft of debugging print statements, but
I removed them for your sanity ;)
What's *meant* to happen:
A string which may contain the offending tags gets passed to the processcol()
function. A few simple cleanup operations are performed (removing newlines
and tabs). Then, if a bad tag is found, the index where the tag starts is
recorded, as well as where the tag ends. The filename is extracted, and the
bad tag is replaced. Because the regex searching goes from left to right, we
now pass the string to the right of the tag we have just fixed to ourselves -
this means that if there were two bad tags, one after the other, the left
hand one would be fixed first, and then the right hand one.
If no bad tags are found, the message is returned.
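The steps above can be sketched roughly as follows. This is a hedged sketch, not a definitive fix: processcol_sketch and OPEN_TAG are hypothetical names, and it assumes the closing ">" should be searched for only after the opening tag, and that an unterminated tag is left untouched.

```python
import re

OPEN_TAG = re.compile(r"<Graphics\s+file:\s+")

def processcol_sketch(message):
    """Hypothetical variant of processcol() following the steps described above."""
    # Simple cleanup: newlines and tabs become spaces.
    message = message.replace("\n", " ").replace("\t", " ")
    m = OPEN_TAG.search(message)
    if m is None:
        return message  # no bad tag found: return the text as it is
    start, end = m.span()
    # Look for the closing ">" only to the right of the opening tag, so an
    # earlier ">" elsewhere in the text cannot be picked up by mistake.
    close = message.find(">", end)
    if close == -1:
        return message  # assumption: leave an unterminated tag alone
    fname = message[end:close].strip()
    fixed = "<graphic srccredit='Fix Me!' fileref='%s' />" % fname
    # Recurse on the text to the right of the tag that was just fixed.
    return message[:start] + fixed + processcol_sketch(message[close + 1:])

print(processcol_sketch("x <Graphics file: a.jpg> y <Graphics file: b.jpg> z"))
```

Because each tag adds one level of recursion, a text with a very large number of tags could run into Python's recursion limit; a loop or a single re.sub call avoids that.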
Can anyone here help me get this going properly?