url-string transformation: from relative to absolute
cruciatuz
sasoft at gmx.de
Sun Dec 2 13:41:14 EST 2001
It's a little bit embarrassing to ask this question, but ... :(
Question:
I have a little script for fetching news from a website (attached).
My problem is that the links in the fetched page are all relative.
Example:
<a href="news/2001/3736.html">
I want to prepend:
http://www.pro-linux.de/
Not a very complicated thing, I know, but I couldn't figure out HOW.
Does anybody know a solution?
--
thx in advance,
Stefan Antoni
-------------- next part --------------
#!/usr/bin/env python

import httplib, string

class plNEWS:

    def __init__(self):
        pass

    def _fetchSiteData(self):
        # connect to the server and check for problems
        http = httplib.HTTP('www.pro-linux.de')
        http.putrequest('GET', '/')
        http.putheader('Accept', 'text/html')
        http.putheader('Accept', 'text/plain')
        http.endheaders()
        httpcode, httpmsg, headers = http.getreply()
        """
        print "<<< Server Says: >>>\n\n", httpcode, httpmsg, headers
        print "<<< End >>>"
        """
        if httpcode != 200:
            raise "Server Said %i, this means something is wrong" % httpcode
        # actually retrieve the site
        site = http.getfile()
        siteData = site.read()
        site.close()
        return siteData

    def _parseSiteData(self):
        siteData = self._fetchSiteData()
        siteLines = siteData.splitlines()
        i = 0  # a counter
        slice_start = 0
        slice_end = 0
        for line in siteLines:
            i = i+1
            if line.strip() == '<font color="#ffffff">NEWS</font>': slice_start = i
            elif line.strip() == '<font color="#ffffff">AKTUELL</font>': slice_end = i
        # making absolute links from the relative links in "mehr"
        # is still to be implemented
        return siteLines[slice_start+4:slice_end-9]

    def _showSite(self):
        site = self._parseSiteData()
        #print site
        for line in site:
            print line

    def run(self):
        self._showSite()

if __name__ == '__main__':
    pl = plNEWS()
    pl.run()
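
One possible way to fill in the "make absolute links" step left open in the script above is the standard library's urljoin function. This is only a sketch under some assumptions: it is written for Python 3, where urljoin lives in urllib.parse (in the Python 2 used by the script the same function is in the urlparse module). The helper name absolutize_links, the BASE_URL constant and the regular expression are hypothetical and not part of the original script, and a regex is only a rough stand-in for a real HTML parser.

```python
import re
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

# Assumed base URL, taken from the prefix mentioned in the question.
BASE_URL = "http://www.pro-linux.de/"

# Matches href="..." or href='...'; captures the attribute name,
# the quote character and the URL between the quotes.
HREF_RE = re.compile(r'''(href=)(["'])(.*?)\2''', re.IGNORECASE)


def absolutize_links(html, base=BASE_URL):
    """Return html with every href value resolved against base.

    urljoin leaves URLs that are already absolute unchanged.
    """
    def _replace(match):
        attr, quote, url = match.group(1), match.group(2), match.group(3)
        return attr + quote + urljoin(base, url) + quote

    return HREF_RE.sub(_replace, html)


if __name__ == "__main__":
    print(absolutize_links('<a href="news/2001/3736.html">'))
```

In the script this could be applied to each line returned by _parseSiteData before printing. Compared with plain string concatenation, urljoin also copes with links that start with "/" or are already absolute.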