[Tutor] setting EOF symbol
Pijus Virketis
pijus@virketis.com
Fri Mar 14 22:41:31 2003
Dear all,
After a while away from Python, I decided to write a little weekend
project: a spider to download all the articles and comments from one
newspaper website (www.lrytas.lt). I successfully opened the URL:
import urllib
lr = urllib.urlopen("http://www.lrytas.lt/20030314")
But when it came time to read the HTML in, there was a problem:
while lr:
    print(lr.readline())
Obviously, I will be doing more than just echoing the source, but this
is sufficient to show the issue. The EOF does not seem to be hit, and I
have an infinite loop. The </html> tag just rolls by, and Python
eventually hangs.
I am using Python 2.2.2 on Mac OS X. My hypothesis is that the Mac/Unix
EOF is different from the EOF in the webpage source. So I was wondering
how to change the EOF (it's not a parameter in the readline()
function), or more generally, how to read in the source of the webpage?
Thank you!
Pijus
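
For reference, here is a minimal sketch of the usual line-reading idiom. It assumes the object returned by urlopen() behaves like an ordinary file object: readline() returns an empty string at end of file, while the object itself stays true no matter how much has been read, so "while lr:" would never stop. The helper name read_all_lines and the in-memory stand-in page are hypothetical, used only so the example runs without a network connection.

```python
import io

def read_all_lines(stream):
    """Collect lines until readline() returns an empty value (end of file)."""
    lines = []
    while True:
        line = stream.readline()
        if not line:  # empty result means EOF; a blank line would still be "\n"
            break
        lines.append(line)
    return lines

# Hypothetical stand-in for the object returned by urlopen().
fake_page = io.BytesIO(b"<html>\n<body>hello</body>\n</html>\n")
lines = read_all_lines(fake_page)
print(len(lines))
```

The loop tests the value returned by readline() rather than the stream object, so no special EOF symbol has to be configured.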