HTMLParser, htmllib and other questions

Anand Pillai pythonguy at Hotpop.com
Thu May 15 08:31:09 EDT 2003


Hi Lee

     I have written a web-spider program in python and in
it required a class which does a job like what you wants.
Basically the requirement is to map internet urls to directory
files given a base directory.

    Myself and a friend have come up with an implementation
of it. It is free but we have not yet documented or informed
this group about it. You can see the code in my webpage
http://members.fortunecity.com/anandpillai. The code is 
available as a link somewhere in the page. The module name is
WebHttpUrlPath.py.

 Tell me if you find it useful :-)

Best Regards

Anand B Pillai

 

Lee Harr <missive at frontiernet.net> wrote in message news:<slrnbc1md3.bs.missive at localhost.my.domain>...
> My goal is to take some HTML and change instances of
> http://mydomain.com/  to  file:///home/me/foo/
> 
> Yes, I know about wget, but I can't seem to get it to
> work very well with Zope (too much duplication) and so
> I want to take the zopemir.py script (to fetch the
> content) and extend it to fix up the HTML.
> 
> Now:  In general, does this seem like something
> suited to HTMLParser.HTMLParser?  Or maybe to
> htmllib.HTMLParser?  Or would I be better off just
> using  ''.replace()  ?
> 
> I experimented a bit with the 2 parsers, and I can get
> either one to modify the required a and img tags, but
> once I do, I am not quite sure how to reconstruct the
> full lines. ie, if I have:
> 
> Here is some text with a <a href="location">link</a>.
> 
> I can get the parser to return a tag with the corrected
> location, but how do I get it to return the whole corrected
> line?
> 
> This recipe from the Cookbook seems to point the way,
> but I wonder if maybe this is more than I need:
> 
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/135005
> 
> I am pretty sure I could just go forward with .replace(:o)
> 
> Any hints appreciated.




More information about the Python-list mailing list