[Tutor] setting EOF symbol

Pijus Virketis pijus@virketis.com
Fri Mar 14 22:41:31 2003


Dear all,

After a while away from Python, I decided to write a little weekend 
project: a spider to download all the articles and comments from one 
newspaper website (www.lrytas.lt). I successfully opened the URL:

import urllib
lr = urllib.urlopen("http://www.lrytas.lt/20030314")

But when it came time to read the HTML in, there was a problem:

while lr:
	print(lr.readline())

Obviously, I will be doing more than just echoing the source, but this 
is sufficient to show the issue. The EOF does not seem to be hit, and I 
have an infinite loop. The </html> tag just rolls by, and Python 
eventually hangs.

I am using Python 2.2.2 on Mac OS X. My hypothesis is that the Mac/Unix 
EOF is different from the EOF in the webpage source. So I was wondering 
how to change the EOF (it is not a parameter of the readline() 
method), or more generally, how to read in the source of the webpage?
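
For reference, here is a minimal sketch of one way to read a file-like 
object to the end. It rests on an assumption not confirmed anywhere in 
this post: that the loop never stops because the object itself is always 
true, while readline() signals end of file by returning an empty string. 
io.StringIO stands in for the urlopen() result so the sketch runs without 
a network connection, and read_all_lines is a hypothetical helper name.

```python
import io


def read_all_lines(stream):
    """Collect lines until readline() returns an empty string (end of file)."""
    lines = []
    while True:
        line = stream.readline()
        # '' means end of file; a blank line in the source is '\n', which is true.
        if not line:
            break
        lines.append(line)
    return lines


# Stand-in for the object returned by urlopen(); this is not the real page.
fake_page = io.StringIO("<html>\n\n<body>hello</body>\n</html>\n")
lines = read_all_lines(fake_page)
print(len(lines))
```

If that assumption holds, the same loop should work on the object 
returned by urlopen(), and no change to the EOF symbol would be needed.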

Thank you!

Pijus