HTMLParser, htmllib and other questions

Lee Harr missive at frontiernet.net
Tue May 13 11:37:24 EDT 2003


My goal is to take some HTML and change instances of
http://mydomain.com/  to  file:///home/me/foo/

Yes, I know about wget, but I can't seem to get it to
work very well with Zope (too much duplication) and so
I want to take the zopemir.py script (to fetch the
content) and extend it to fix up the HTML.

Now:  In general, does this seem like something
suited to HTMLParser.HTMLParser?  Or maybe to
htmllib.HTMLParser?  Or would I be better off just
using  ''.replace()  ?

I experimented a bit with the two parsers, and I can get
either one to modify the required a and img tags, but
once I do, I am not quite sure how to reconstruct the
full lines. That is, if I have:

Here is some text with a <a href="location">link</a>.

I can get the parser to return a tag with the corrected
location, but how do I get it to return the whole corrected
line?
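
One possible way to get the whole text back is to have the parser
re-emit every piece it sees (text, tags, entities, comments) and only
alter the attributes of interest. The following is a minimal sketch,
not a definitive implementation. It assumes Python 3, where the parser
lives in html.parser, and the names LinkRewriter, rewrite_links and
URL_ATTRS are made up for illustration. Tags that need no change are
copied verbatim; rewritten tags are rebuilt, so their quoting and
spacing may differ from the input.

```python
from html import escape
from html.parser import HTMLParser

OLD = "http://mydomain.com/"
NEW = "file:///home/me/foo/"
URL_ATTRS = ("href", "src")  # attributes checked for the old prefix


class LinkRewriter(HTMLParser):
    """Collect every piece of the input again, rewriting href/src values."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as separate
        # events, so they can be written back unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_tag(self, tag, attrs, closer):
        needs_fix = any(
            k in URL_ATTRS and v and v.startswith(OLD) for k, v in attrs
        )
        if not needs_fix:
            # Untouched tags are copied verbatim from the source text.
            self.out.append(self.get_starttag_text())
            return
        parts = [tag]
        for k, v in attrs:
            if v is None:  # attribute without a value
                parts.append(k)
                continue
            if k in URL_ATTRS and v.startswith(OLD):
                v = NEW + v[len(OLD):]
            # The parser unescapes attribute values, so escape them again.
            parts.append('%s="%s"' % (k, escape(v, quote=True)))
        self.out.append("<" + " ".join(parts) + closer)

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def rewrite_links(html_text):
    """Return html_text with OLD replaced by NEW in href and src attributes."""
    parser = LinkRewriter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    line = 'Here is some text with a <a href="http://mydomain.com/page">link</a>.'
    print(rewrite_links(line))
```

The point of the design is that the handlers never return anything;
they append to one list, and joining that list at the end gives the
whole corrected text rather than a single tag.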

This recipe from the Cookbook seems to point the way,
but I wonder if maybe this is more than I need:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/135005

I am pretty sure I could just go forward with .replace()  :o)
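
For comparison, the plain string approach might look like the sketch
below. The function name localize is hypothetical. Note that
str.replace changes every occurrence of the prefix, including any that
appear in visible text rather than inside a tag, which may or may not
matter for a local mirror.

```python
OLD = "http://mydomain.com/"
NEW = "file:///home/me/foo/"


def localize(html_text):
    """Replace every occurrence of OLD with NEW, inside tags or not."""
    return html_text.replace(OLD, NEW)


if __name__ == "__main__":
    print(localize('<a href="http://mydomain.com/page">link</a>'))
```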

Any hints appreciated.




