Easy reading HTML?

Fredrik Lundh effbot at telia.com
Wed Feb 23 00:58:33 CET 2000


Martin Skøtt <mskott at image.dk> wrote:
> I am thinking about writing a little Python program to sort my
> Netscape bookmarks file. Conveniently, the bookmarks file is a simple
> HTML file, and I am now looking for an easy way to read it.
>
> What I need is a function to parse the tables that are used to handle
> folders in the menu, and the <A HREF ...> tags. For each <A HREF ...>
> I need the address it points to and its title (which is what I want
> to sort on).
>
> Do you have any smart ideas you want to share? I guess it's htmllib I
> need, but I don't know where to start with it.

from the eff-bot guide (see below):

# htmllib-example-1.py

import htmllib
import formatter
import string

class Parser(htmllib.HTMLParser):
    # return a dictionary mapping anchor texts to lists
    # of associated hyperlinks

    def __init__(self, verbose=0):
        self.anchors = {}
        f = formatter.NullFormatter()
        htmllib.HTMLParser.__init__(self, f, verbose)

    def anchor_bgn(self, href, name, type):
        self.save_bgn()
        self.anchor = href

    def anchor_end(self):
        text = string.strip(self.save_end())
        if self.anchor and text:
            self.anchors[text] = self.anchors.get(text, []) + [self.anchor]

file = open("samples/sample.htm")
html = file.read()
file.close()

p = Parser()
p.feed(html)
p.close()

for k, v in p.anchors.items():
    print k, "=>", v

print

## link => ['http://www.python.org']
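
The example above targets the Python of its day; htmllib was removed in
Python 3. For readers on a current interpreter, here is a minimal sketch
of the same idea using the standard html.parser module, extended to sort
the links by title as the question asks. The names AnchorParser and
sorted_links, and the sample markup, are made up for illustration and are
not part of the original example or of any library.

```python
from html.parser import HTMLParser


class AnchorParser(HTMLParser):
    """Collect (title, href) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (title, href) pairs
        self._href = None    # href of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # only keep text that appears inside an anchor with an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                self.links.append((title, self._href))
            self._href = None


def sorted_links(html):
    """Return (title, href) pairs sorted case-insensitively by title."""
    parser = AnchorParser()
    parser.feed(html)
    parser.close()
    return sorted(parser.links, key=lambda pair: pair[0].lower())


# hypothetical sample input, loosely shaped like a bookmarks file
sample = """
<DL><p>
<DT><A HREF="http://www.python.org">python</A>
<DT><A HREF="http://www.example.com/b">Bookmarks</A>
</DL>
"""

for title, href in sorted_links(sample):
    print(title, "=>", href)
```

This sketch keeps a flat list of pairs rather than a dictionary, so two
bookmarks with the same title both survive the sort. It does not track
folders; that would need extra handling of the surrounding list tags.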

</F>

<!-- (the eff-bot guide to) the standard python library:
http://www.pythonware.com/people/fredrik/librarybook.htm
-->




