url-string transformation: from relative to absolute

cruciatuz sasoft at gmx.de
Sun Dec 2 13:41:14 EST 2001


It's a little bit embarrassing to ask this question, but ... :(

Question:
I have a small script that fetches news from a website (attached).
My problem is that the links in the fetched page are all relative.
Example:
<a href="news/2001/3736.html">

I want to prepend:
http://www.pro-linux.de/

Not a very complicated thing, I know, but I couldn't figure out HOW.
Does anybody know a solution?

--
thx in advance,
Stefan Antoni
-------------- next part --------------
#!/usr/bin/env python

import httplib


class plNEWS:
  
  def __init__(self):
    pass

  def _fetchSiteData(self):

    # connecting to server and checking for problems
    http = httplib.HTTP('www.pro-linux.de')
    http.putrequest('GET', '/')
    http.putheader('Accept', 'text/html')
    http.putheader('Accept', 'text/plain')
    http.endheaders()
    httpcode, httpmsg, headers = http.getreply()

    # debug output, disabled:
    # print "<<< Server Says: >>>\n\n", httpcode, httpmsg, headers
    # print "<<< End >>>"

    if httpcode != 200:
      # raise a real exception object; string exceptions are deprecated
      raise IOError("Server said %i, which means something is wrong" % httpcode)

    # actually retrieve the page body
    site = http.getfile()
    siteData = site.read()
    site.close()
    return siteData

  def _parseSiteData(self):
    siteData = self._fetchSiteData()
    siteLines = siteData.splitlines()
    i = 0 # a counter
    slice_start = 0
    slice_end = 0
    for line in siteLines:
      i = i+1
      if line.strip() == '<font color="#ffffff">NEWS</font>': slice_start = i
      elif line.strip() == '<font color="#ffffff">AKTUELL</font>': slice_end = i
      # TODO: turn the relative links in the "mehr" entries into
      # absolute links (not implemented yet)
    return siteLines[slice_start+4:slice_end-9]

  def _showSite(self):
    site = self._parseSiteData()
    #print site
    for line in site:
      print line

  def run(self):
    self._showSite()


if __name__ == '__main__':
  
  pl = plNEWS()
  pl.run()
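
The step left open in `_parseSiteData` (turning the relative links into absolute ones) could be sketched with the standard library's `urljoin`, which resolves a relative reference against a base URL. This is only a minimal sketch and not part of the original script: the names `BASE_URL`, `HREF_RE` and `absolutize_links` are made up for illustration, and the regular expression assumes double-quoted `href` attributes like the one in the example above. The import is written so that it works on Python 3 (`urllib.parse`) as well as on the Python 2 used by the script (`urlparse`).

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2, as used by the script above

# Assumption: every relative link is relative to the site root.
BASE_URL = 'http://www.pro-linux.de/'

# Assumption: href values are always enclosed in double quotes.
HREF_RE = re.compile(r'(href\s*=\s*")([^"]*)(")', re.IGNORECASE)


def absolutize_links(line, base=BASE_URL):
    """Return `line` with every href value resolved against `base`."""
    def _fix(match):
        # group 1: 'href="', group 2: the URL, group 3: closing quote
        return match.group(1) + urljoin(base, match.group(2)) + match.group(3)
    return HREF_RE.sub(_fix, line)


if __name__ == '__main__':
    print(absolutize_links('<a href="news/2001/3736.html">'))
    # prints: <a href="http://www.pro-linux.de/news/2001/3736.html">
```

Inside the loop of `_parseSiteData`, each line could be passed through `absolutize_links` before it is returned. Compared with plain string concatenation, `urljoin` leaves links that are already absolute untouched and also copes with hrefs that start with `/`.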