Absolute TO Relative URLs
Just van Rossum
just at letterror.com
Tue May 8 17:23:24 EDT 2001
Satheesh Babu wrote:
> Let us say I've an HTML document. I would like to write a small Python
> script that reads this document, goes through the absolute URLS (A HREF, IMG
> SRC etc) and replaces them with relative URLs. I can pass a parameter which
> specifies the BASE HREF of the document.
>
> I'm not sure whether I should proceed with regex nightmare or are there any
> easy solutions?
>
> Any help/pointers will be greatly appreciated.
It's indeed pretty easy if you derive a class from SGMLParser. In case the
absolute-to-relative URL part of your question is still interesting:
I was recently looking for exactly that functionality and couldn't find
anything (apart from a sucky implementation in HTMLgen), so I wrote
my own. See below.
Just
def abs2rel(src, dst):
    """Given two absolute paths, return the relative path from src to dst."""
    src = src.split("/")
    dst = dst.split("/")
    # Find the index of the first path component that differs.
    for i in range(min(len(src), len(dst))):
        if src[i] != dst[i]:
            break
    # Drop the leading components the two paths share.
    src = src[i:]
    dst = dst[i:]
    # The last element of src is the file name; every element before it
    # is a directory we have to climb out of.
    back = len(src) - 1
    dst = back * [".."] + dst
    return "/".join(dst)


if __name__ == "__main__":
    print(abs2rel("/", "/sadfsdfd/foo.html"))
    print(abs2rel("/first/second/index.html", "/first/second/foo.html"))
    print(abs2rel("/first/second/index.html", "/first/second/third/foo.html"))
    print(abs2rel("/first/second/index.html", "/first/third/foo.html"))
    print(abs2rel("/first/second/index.html", "/bloo/blah/foo.html"))
    print(abs2rel("/first/second/index.html", "/first/second/index.html"))
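
For the HTML side of the question, here is a rough sketch of how abs2rel
might be wired into a parser subclass. It is not part of the original post.
It assumes Python 3, where sgmllib no longer exists, so it swaps in
html.parser.HTMLParser for SGMLParser. The names URL_ATTRS, relativize,
Relativizer and rewrite are made up for illustration, as are the example.com
URLs, and the set of tag/attribute pairs treated as URLs is only a guess at
what "A HREF, IMG SRC etc" should cover. URLs on a different host than the
base are left alone. The output is re-serialized from parser events, so
whitespace and quoting inside tags may not survive byte for byte.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumption: the (tag, attribute) pairs that carry URLs. Extend as needed.
URL_ATTRS = {("a", "href"), ("img", "src"), ("link", "href"), ("script", "src")}


def abs2rel(src, dst):
    """Compact Python 3 restatement of abs2rel from the post."""
    src, dst = src.split("/"), dst.split("/")
    for i in range(min(len(src), len(dst))):
        if src[i] != dst[i]:
            break
    return "/".join((len(src) - i - 1) * [".."] + dst[i:])


def relativize(base, url):
    """Return url relative to base; leave it alone if it is not an
    absolute URL on the same scheme and host as base."""
    b, u = urlsplit(base), urlsplit(url)
    if not u.netloc or (u.scheme, u.netloc) != (b.scheme, b.netloc):
        return url
    rel = abs2rel(b.path or "/", u.path or "/")
    if u.query:
        rel += "?" + u.query
    if u.fragment:
        rel += "#" + u.fragment
    return rel


class Relativizer(HTMLParser):
    """Hypothetical helper: echoes the document, rewriting URL attributes."""

    def __init__(self, base):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.base = base
        self.out = []

    def _tag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # bare attribute such as "checked"
                parts.append(name)
                continue
            if (tag, name) in URL_ATTRS:
                value = relativize(self.base, value)
            parts.append('%s="%s"' % (name, html.escape(value, quote=True)))
        return " ".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append("<%s>" % self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append("<%s />" % self._tag(tag, attrs))

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def rewrite(html_text, base):
    """Return html_text with same-host absolute URLs made relative to base."""
    parser = Relativizer(base)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = '<a href="http://example.com/first/third/foo.html">foo</a>'
    print(rewrite(page, "http://example.com/first/second/index.html"))
```

The base passed to rewrite plays the role of the BASE HREF parameter from
the question. Only its path is handed to abs2rel, and any query string or
fragment on the target URL is re-attached afterwards.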