url-string transformation: from relative to absolute

cruciatuz sasoft at gmx.de
Sun Dec 2 13:41:14 EST 2001

it's a little bit embarrassing to ask this question, but ... :(

I have a little script that fetches news from a website (attached).
My problem is that the links in the fetched page are all relative, e.g.:
<a href="news/2001/3736.html">

i want to prepend:

not a very complicated thing, i know, but i couldn't figure out HOW.
does anybody know a solution?
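
One possible approach, sketched here under a few assumptions: the standard library's urljoin() resolves a relative link against a base URL, and a small regular expression can apply it to every href="..." in a line. The base URL below is only assumed from the host name used in the attached script, and the names BASE_URL, HREF_RE and absolutize_links are made up for this example. The regular expression only handles double-quoted, lowercase href attributes.

```python
import re

try:
    from urllib.parse import urljoin   # Python 3
except ImportError:
    from urlparse import urljoin       # Python 2

# Assumption: the base URL is built from the host the script connects to.
BASE_URL = 'http://www.pro-linux.de/'

# Matches href="..." and captures the quoted link target.
HREF_RE = re.compile(r'href="([^"]*)"')

def absolutize_links(line, base=BASE_URL):
    """Return line with every href target resolved against base."""
    def fix(match):
        # urljoin leaves targets that are already absolute unchanged
        return 'href="%s"' % urljoin(base, match.group(1))
    return HREF_RE.sub(fix, line)

print(absolutize_links('<a href="news/2001/3736.html">'))
```

In the script this could be called on each line of the slice that _parseSiteData() returns, before the lines are printed.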

thx in advance,
Stefan Antoni
-------------- next part --------------
#!/usr/bin/env python

import httplib, string

class plNEWS:
  def __init__(self):
    pass

  def _fetchSiteData(self):

    # connecting to server and checking for problems
    http = httplib.HTTP('www.pro-linux.de')
    http.putrequest('GET', '/')
    http.putheader('Accept', 'text/html')
    http.putheader('Accept', 'text/plain')
    http.endheaders()  # finish sending the headers before asking for the reply
    httpcode, httpmsg, headers = http.getreply()

    print "<<< Server Says: >>>\n\n", httpcode, httpmsg, headers
    print "<<< End >>>"

    if httpcode != 200:
      raise IOError("Server said %i, this means something is wrong" % httpcode)

    # actually retrieve the site
    site = http.getfile()
    siteData = site.read()
    return siteData

  def _parseSiteData(self):
    siteData = self._fetchSiteData()
    siteLines = siteData.splitlines()
    i = 0 # a counter
    slice_start = 0
    slice_end = 0
    for line in siteLines:
      i = i+1
      if line.strip() == '<font color="#ffffff">NEWS</font>': slice_start = i
      elif line.strip() == '<font color="#ffffff">AKTUELL</font>': slice_end = i
      # TODO: turn the relative links in "mehr" into absolute links
      # (not implemented yet)
    return siteLines[slice_start+4:slice_end-9]

  def _showSite(self):
    site =  self._parseSiteData()
    #print site
    for line in site:
      print line

  def run(self):
    # body is missing here; presumably it just prints the parsed news block
    self._showSite()

if __name__ == '__main__':
  pl = plNEWS()
  pl.run()