Absolute TO Relative URLs

Just van Rossum just at letterror.com
Tue May 8 17:23:24 EDT 2001


Satheesh Babu wrote:

> Let us say I've an HTML document. I would like to write a small Python
> script that reads this document, goes through the absolute URLS (A HREF, IMG
> SRC etc) and replaces them with relative URLs. I can pass a parameter which
> specifies the BASE HREF of the document.
> 
> I'm not sure whether I should proceed with regex nightmare or are there any
> easy solutions?
> 
> Any help/pointers will be greatly appreciated.

The parsing part is indeed pretty easy if you derive a class from
SGMLParser (in the sgmllib module). In case the absolute-to-relative
URL part of your question is still interesting: I was recently looking
for exactly that functionality and couldn't find anything (apart from a
sucky implementation in HTMLgen), so I wrote my own. See below.

Just
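
For the parsing half, here is a minimal sketch of the "derive from a parser class" idea. It assumes a current Python, where html.parser.HTMLParser is the stdlib stand-in for SGMLParser. The names LinkCollector and URL_ATTRS and the example.com URLs are made up for illustration, and the sketch only collects the absolute URLs under a given base; it does not rewrite the document.

```python
from html.parser import HTMLParser

# Attributes that usually carry URLs. This set is an assumption; extend as needed.
URL_ATTRS = {"href", "src"}


class LinkCollector(HTMLParser):
    """Collect (tag, attribute, url) for every URL that starts with base."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        for name, value in attrs:
            if name in URL_ATTRS and value and value.startswith(self.base):
                self.links.append((tag, name, value))


if __name__ == "__main__":
    parser = LinkCollector("http://example.com/")  # hypothetical BASE HREF
    parser.feed('<A HREF="http://example.com/first/foo.html">foo</A>'
                '<IMG SRC="http://example.com/pix/bar.png">'
                '<a href="http://other.example.org/">elsewhere</a>')
    parser.close()
    for link in parser.links:
        print(link)
```

Each collected URL could then be handed to a path function such as abs2rel once the scheme and host have been stripped off.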


def abs2rel(src, dst):
    """Given two absolute paths, return the relative path from src to dst."""
    src = src.split("/")
    dst = dst.split("/")
    # Find the index of the first path component that differs.
    for i in range(min(len(src), len(dst))):
        if src[i] != dst[i]:
            break
    # Drop the shared leading components.
    src = src[i:]
    dst = dst[i:]
    # One ".." per directory left in src (its last item is the file name).
    back = len(src) - 1
    dst = back * [".."] + dst
    return "/".join(dst)


if __name__ == "__main__":
    print(abs2rel("/", "/sadfsdfd/foo.html"))  # sadfsdfd/foo.html
    print(abs2rel("/first/second/index.html", "/first/second/foo.html"))  # foo.html
    print(abs2rel("/first/second/index.html", "/first/second/third/foo.html"))  # third/foo.html
    print(abs2rel("/first/second/index.html", "/first/third/foo.html"))  # ../third/foo.html
    print(abs2rel("/first/second/index.html", "/bloo/blah/foo.html"))  # ../../bloo/blah/foo.html
    print(abs2rel("/first/second/index.html", "/first/second/index.html"))  # index.html
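
The function above works on paths, while the original question was about full URLs. A rough sketch of one way to bridge that gap follows. It is not the code above but a swapped-in stdlib route: urllib.parse.urlsplit to separate host from path, and posixpath.relpath for the relative path. The name url2rel and the example.com URLs are hypothetical, and URLs on another scheme or host are simply returned unchanged.

```python
import posixpath
from urllib.parse import urlsplit


def url2rel(base_url, target_url):
    """Return target_url relative to the document at base_url.

    URLs on a different scheme or host are returned unchanged.
    """
    base = urlsplit(base_url)
    target = urlsplit(target_url)
    if (base.scheme, base.netloc) != (target.scheme, target.netloc):
        return target_url
    # The document lives in the directory part of the base path.
    start = posixpath.dirname(base.path) or "/"
    rel = posixpath.relpath(target.path or "/", start)
    # Put back the parts of the URL that relpath knows nothing about.
    if target.query:
        rel += "?" + target.query
    if target.fragment:
        rel += "#" + target.fragment
    return rel


if __name__ == "__main__":
    base = "http://example.com/first/second/index.html"
    print(url2rel(base, "http://example.com/first/third/foo.html"))  # ../third/foo.html
    print(url2rel(base, "http://example.com/first/second/foo.html#top"))  # foo.html#top
    print(url2rel(base, "http://other.example.org/x.html"))  # unchanged
```

One caveat with this route: posixpath.relpath normalizes its result, so a trailing slash on the target path is dropped.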


