Absolute TO Relative URLs
jdf at pobox.com
Wed May 9 05:51:41 CEST 2001
Just van Rossum <just at letterror.com> writes:
> Satheesh Babu wrote:
> > Let us say I've an HTML document. I would like to write a small
> > Python script that reads this document, goes through the absolute
> > URLS (A HREF, IMG SRC etc) and replaces them with relative URLs. I
> > can pass a parameter which specifies the BASE HREF of the
> > document.
> It's indeed pretty easy by deriving from SGMLParser.
Subclassing SGMLParser (or HTMLParser) I understand; using such a
technique to modify the input document on the fly, I do not. I find
the documentation for the formatter classes to be rather obscure,
though I feel like the proper application of a formatter is probably
just the thing. Can you shed a little light? As a trivial example,
let's say I want to capitalize all of the attributes of my anchor
def start_a(self, attrs):
newattrs = [ (x.upper(), x) for x in attrs ]
# what do I do now?
Jonathan Feinberg jdf at pobox.com Sunny Brooklyn, NY
More information about the Python-list