syntax from diveintopython
iddwb
iddwb at imap1.asu.edu
Tue Apr 17 22:48:50 EDT 2001
On Tue, 17 Apr 2001, Mark Pilgrim wrote:
> Well, you wouldn't be the first person to tell me that. <0.5 wink>
>
Thanks for the expanded reply. However, I am still just not getting
SGMLParser.
> For those not familiar with how SGMLParser works, it will call this method
> with an HTML tag ("tag", a string) and the attributes of the tag ("attrs", a
I've tried again with a formulation from Guido's intro to web
programming. Here's the error:
=====================================
Traceback (most recent call last):
  File "./html3", line 46, in ?
    htmlbuffer.feed(buffer)
  File "/usr/local/lib/python1.6/sgmllib.py", line 82, in feed
    self.rawdata = self.rawdata + data
TypeError: illegal argument type for built-in operation
===================================
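A likely reading of the traceback, offered as a sketch rather than a diagnosis: feed() appends its argument to a string (self.rawdata + data), so it needs a string, while readlines() returns a list of strings. The example below reproduces that mismatch without sgmllib and is written for a current Python 3 interpreter. The names rawdata and lines and the sample HTML are illustrative assumptions, not taken from the original script.

```python
import io

# Stand-in for the opened file; the HTML content is made up for illustration.
f = io.StringIO("<html>\n<body bgcolor='white'>\n</body>\n</html>\n")

lines = f.readlines()          # a list of strings, one per line
rawdata = ""                   # the parser keeps its buffer as a string

try:
    rawdata = rawdata + lines  # str + list: the same kind of mismatch as in the traceback
    concat_failed = False
except TypeError:
    concat_failed = True

# Either join the lines or call read() to get one string.
joined = "".join(lines)
f.seek(0)
whole = f.read()
rawdata = rawdata + whole      # str + str works
```

Under that reading, passing f.read() (or the joined lines) to feed() would avoid the TypeError.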
I grabbed the rpm for python 1.6. I'm so new to the language that I
didn't see why 2.x would help. I'm still trying to overcome years of
Rexx. Anyway, comments appreciated.
====================================
#!/usr/local/bin/python
# first test to open web pages using urlopen2
import sys
from sgmllib import SGMLParser

class HtmlBody(SGMLParser):
    def __init__(self):
        self.links = []
        self.body = ()
        SGMLParser.__init__(self)

    def do_body(self, attrs):
        for (name, value) in attrs:
            if name == "body":
                value = value
                if value:
                    self.body = value
            if name == "href":
                value = cleanlink(value)
                if value:
                    self.links.append(value)

    def getlinks(self):
        return self.links

def cleanlink(link):
    i = string.find(link, '#')
    if i >= 0:
        link = link[:i]
    words = string.split(link)
    string.join(words, "")

if __name__ == '__main__':
    # print sys.argv[1:]
    try:
        f = open("dean.html")
    except IOError:
        print "couldn't open ", sys.argv[1:]
        sys.exit(1)
    buffer = ""
    htmlbuffer = HtmlBody()
    buffer = f.readlines()
    f.close()
    htmlbuffer.feed(buffer)
    htmlbuffer.close()
    body = htmlbuffer.do_body
    links = htmlbuffer.getlinks
    print body
    # print %s %links
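For comparison, here is a minimal sketch of how a script like the one above might be made to work. It is written for Python 3, where sgmllib is no longer in the standard library, so it swaps in html.parser.HTMLParser. That is a substitution, not the API used in the post. The class name LinkCollector, the attribute body_attrs, and the sample page are hypothetical. It also addresses a few other things that look like problems in the script: a do_body handler only receives the attributes of the <body> tag, so href values have to be collected from <a> tags, cleanlink needs to return its result, feed() needs a single string, and getlinks has to be called (with parentheses) to yield the list.

```python
from html.parser import HTMLParser


def cleanlink(link):
    """Drop any '#fragment' and all whitespace, then return the result."""
    i = link.find("#")
    if i >= 0:
        link = link[:i]
    return "".join(link.split())


class LinkCollector(HTMLParser):
    """Hypothetical stand-in for HtmlBody: records <body> attributes and <a href> targets."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.body_attrs = []

    def handle_starttag(self, tag, attrs):
        # attrs holds only the attributes of the tag being opened.
        if tag == "body":
            self.body_attrs = attrs
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    link = cleanlink(value)
                    if link:
                        self.links.append(link)

    def getlinks(self):
        return self.links


# Made-up sample page; in the script this would be the text of dean.html.
page = ('<html><body bgcolor="white">'
        '<a href="index.html#top">Home</a> <a href="#x">skip</a>'
        '</body></html>')

parser = LinkCollector()
parser.feed(page)    # feed() takes one string, not a list of lines
parser.close()
print(parser.body_attrs)
print(parser.getlinks())
```

With sgmllib itself the same idea would presumably be spelled as a start_a(self, attrs) method next to do_body, but that variant is untested here.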
>
> - Suppose the original tag is '<a href="index.html" title="Go to home
> page">'
> - The method will be called with tag='a' and attrs=[('href', 'index.html'),
> ('title', 'Go to home page')]
> - The list comprehension will produce a list of 2 elements: ['
> href="index.html"', ' title="Go to home page"']
> - strattrs will be ' href="index.html" title="Go to home page"'
> - The string appended to self.parts will be '<a href="index.html" title="Go
> to home page">', which is what we want.
>
> Other than using string.join(..., "") instead of "".join(...) -- a topic
> which has been beaten to death recently on this newsgroup and which I
> address explicitly in my book
> (http://diveintopython.org/odbchelper_join.html) -- how would you rewrite
> this?
>
> -M
> You're smart; why haven't you learned Python yet?
> http://diveintopython.org/
> Now in Chinese! http://diveintopython.org/cn/
>
>
>
>
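The walk-through quoted above can be traced step by step. This is a sketch only: the original method is not shown in this message, so the list comprehension and the format strings below are assumptions reconstructed from the intermediate values Mark lists, and parts stands in for self.parts.

```python
tag = "a"
attrs = [("href", "index.html"), ("title", "Go to home page")]

# One ' key="value"' piece per attribute, each with a leading space.
pieces = [' %s="%s"' % (key, value) for key, value in attrs]

# Join the pieces with an empty separator, using the string-method spelling.
strattrs = "".join(pieces)

parts = []  # stands in for self.parts
parts.append("<%s%s>" % (tag, strattrs))

print(parts[0])
```

Each intermediate value matches the ones listed in the quoted steps: two pieces, one joined attribute string, and the rebuilt start tag.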
David Bear
College of Public Programs/ASU