HTMLLib.py use
Matthew Cepl
cepl at fpm.cz
Tue May 4 11:30:37 EDT 1999
In article <001501be9586$22c896c0$f29b12c2 at pythonware.com>,
"Fredrik Lundh" <fredrik at pythonware.com> wrote:
> you forgot to pass the HTMLParser class a valid formatter
> object.
OK, now it's much better: there is no error message. But there is still no
output from the script. I would like to get just the description from the
DESCRIPTION meta tag of a given HTML page. Is it possible to do that with
htmllib (or sgmllib, to make things simpler and hopefully faster), or do I
have to cut it out manually with a regexp? This is just practice in writing a
simple script, to learn object-oriented programming as a total beginner in OOP
(and in programming generally), by trying to write a Python port of ESR's
SiteMap (see
http://metalab.unc.edu/pub/Linux/apps/www/indexing/sitemap-1.9.tar.gz and
http://www.tuxedo.org/~esr/sitemap.html for the result). BTW, when I need the
content of the TITLE element, should that be done via start_title(), or how?
Thanks
Matthew
-----------------------------------------------------------------------------
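from htmllib import HTMLParser
from string import lower
import sys
import formatter

class WPage(HTMLParser):
    """Remember the content of the META DESCRIPTION tag."""

    def __init__(self, verbose=0):
        # Default value, so pages without the tag do not raise AttributeError.
        self.description = ""
        HTMLParser.__init__(self, formatter.NullFormatter(), verbose)

    def do_meta(self, attributes):
        # attributes is a list of (name, value) pairs in no guaranteed
        # order, so look the values up by name instead of by position.
        name = content = ""
        for key, value in attributes:
            if key == "name":
                name = lower(value)
            elif key == "content":
                content = value
        # Only a description tag may set the result; other META tags
        # must not reset it.
        if name == "description":
            self.description = content

def test(filename='test.htm'):
    try:
        f = open(filename, 'r')
    except IOError, msg:
        # Report the file name (the original printed the builtin "file").
        print filename, ":", msg
        sys.exit(1)
    data = f.read()
    f.close()
    x = WPage()
    x.feed(data)
    x.close()
    print x.description

if __name__ == '__main__':
    test()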
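
For comparison, here is a minimal sketch of the same idea in current Python 3,
where htmllib and sgmllib no longer exist and html.parser.HTMLParser from the
standard library takes their place. It is not part of the original script: the
class name PageInfo, the helper page_info() and the sample page are made up
for illustration, and it assumes the page uses the usual
<META NAME="description" CONTENT="..."> form. It picks up the TITLE text as
well as the description.

```python
from html.parser import HTMLParser


class PageInfo(HTMLParser):
    """Collect the TITLE text and the META DESCRIPTION of one HTML page."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.description = ""
        self.title = ""
        self._in_title = False  # true while between <title> and </title>

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names, but not values.
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            # Keep the first description found; ignore all other META tags.
            if name == "description" and not self.description:
                self.description = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several pieces, so append rather than assign.
        if self._in_title:
            self.title += data


def page_info(html_text):
    """Return (title, description) for the given HTML text."""
    parser = PageInfo()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.description


# Hypothetical sample page, only for demonstration.
sample = """<html><head>
<title>Site Map</title>
<meta name="keywords" content="map, index">
<META NAME="Description" CONTENT="A map of this site.">
</head><body><p>Hello</p></body></html>"""

title, description = page_info(sample)
print(title)
print(description)
```

The title handling follows the same pattern as the start_title() idea asked
about above: a flag is set at the start tag, text is gathered while the flag
is on, and the flag is cleared at the end tag.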