[Tutor] Help with Parsing HTML files [html/OOP]

Charlie Clark <charlie@begeistert.org>
Mon, 06 Aug 2001 11:15:33 +0200


>Don't worry too much yet: the documentation to htmllib assumes that you
>already know about OOP style programming, as well as event-driven
>programming.  If both are new topics, then this might take a little while
>to figure out.
>
Basically yes and thanx for the encouragement.
>
>> If do_img is a method of ImagePuller, when is it called and how does it
>> look for images?  It seems to do this via tag analysis where a tag
>> contains "src"
>
>
>Yes, there's some analysis being done behind the scenes in sgmllib.  The
>do_img() method actually gets called during the feed()ing process:
>
>    puller.feed(data)
>
>feed() is something that's defined in sgmllib.  Whenever it encounters a
>new tag, it dynamically tries to call an appropriately named method.  If
>it runs into a P tag, for example, it'll try calling "start_p".  Let's
>take a small look at sgmllib for a second:
Right, simple but effective. Pity the documentation doesn't make this 
clearer.
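
The lookup described in the quoted text can be pictured with a toy class.
This is only a sketch of the idea, not sgmllib's actual source:
TinyDispatcher, dispatch_start and the calls list are made-up names for
illustration, and the real parser does this work inside feed().

```python
class TinyDispatcher:
    """Toy stand-in for tag dispatch; hypothetical, not sgmllib's real code."""

    def __init__(self):
        self.calls = []  # record of what was called, for inspection

    def dispatch_start(self, tag, attrs):
        # Build a method name from the tag and call it if a subclass defines it.
        for prefix in ("start_", "do_"):
            method = getattr(self, prefix + tag, None)
            if method is not None:
                method(attrs)
                return
        # No matching method: fall back to a catch-all handler.
        self.unknown_starttag(tag, attrs)

    def unknown_starttag(self, tag, attrs):
        self.calls.append(("unknown", tag))


class ImagePuller(TinyDispatcher):
    def do_img(self, attrs):
        # attrs is a list of (name, value) pairs; pick out the image source.
        for name, value in attrs:
            if name == "src":
                self.calls.append(("img", value))


puller = ImagePuller()
puller.dispatch_start("img", [("src", "logo.gif")])  # finds do_img
puller.dispatch_start("p", [])                       # no start_p or do_p here
print(puller.calls)
```

Because the method name is built from the tag at run time, supporting a new
tag only means adding a method with the right name to the subclass.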

I think I've also understood your example of extending the class to deal
with attributes in certain ways. I need to do some thinking on this though.

One thing I'd like to do but still haven't worked out is mimic what calling 
sgmllib does on its own and get all that intelligence put into a file so that 
I can do my analysis of it.

At the moment I'm copying my source file into my python library and renaming 
it test.html and running sgmllib > output.txt. This gives me a nice 
hierarchical model which simply distinguishes between formatting and data.

Because it's easier for me, I would like to be able to pass the results of
sgmllib's work line by line to my own functions. I think I need to use
sgmllib as I need the tag classifications in the analysis and htmllib does 
away with them. How would I go about this?
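
One possible way to hand each parsing event to your own function, sketched
under assumptions: sgmllib is not part of current Python releases, so this
uses html.parser.HTMLParser from the standard library as a stand-in. With
sgmllib the same idea would go through its own handler methods.
EventForwarder, analyse and the events list are hypothetical names.

```python
from html.parser import HTMLParser


class EventForwarder(HTMLParser):
    """Pass every tag and data event on to a plain function (hypothetical)."""

    def __init__(self, callback):
        super().__init__()
        self.callback = callback

    def handle_starttag(self, tag, attrs):
        self.callback("start", tag, attrs)

    def handle_endtag(self, tag):
        self.callback("end", tag, None)

    def handle_data(self, data):
        # Skip runs of pure whitespace so only real text is forwarded.
        text = data.strip()
        if text:
            self.callback("data", text, None)


events = []


def analyse(kind, value, attrs):
    # Your own analysis would go here; this version just records the event.
    events.append((kind, value, attrs))


forwarder = EventForwarder(analyse)
forwarder.feed('<p>Hello <img src="logo.gif"></p>')
forwarder.close()

for event in events:
    print(event)
```

The event kind ("start", "end" or "data") keeps the distinction between
formatting and data, and analyse could just as well write each event to a
file instead of a list.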

Once I've worked out what my functions should do I guess it should be quite 
easy to turn them into methods in my own special class. Easy in theory, that 
is. I probably still need a lot of help ;-)

Have a good night's sleep and talk to you later today!

Charlie