Absolute TO Relative URLs

Satheesh Babu vsbabu at erols.com
Tue May 8 15:47:48 EDT 2001


Hi,

I have the first part (the SGML parser and picking out A, IMG, etc.) mostly
done. The regex part (sigh, I always get stuck with regex) is driving me
insane...
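
For what it's worth, the URL rewriting step may not need a regex at all. Below
is a minimal sketch, not a tested solution, that compares the parsed pieces of
the URL against the base and computes a relative path. It assumes a current
Python 3 standard library (urllib.parse, posixpath); the function name
to_relative and the example.com URLs are made up for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def to_relative(url, base):
    """Return url relative to base, or url unchanged if it lives elsewhere.

    Hypothetical helper: 'base' plays the role of the BASE HREF parameter.
    """
    u, b = urlsplit(url), urlsplit(base)
    # Only absolute URLs on the same scheme and host can be made relative.
    if not u.scheme or (u.scheme, u.netloc) != (b.scheme, b.netloc):
        return url
    # Directory that contains the base document.
    base_dir = posixpath.dirname(b.path) or "/"
    rel = posixpath.relpath(u.path or "/", start=base_dir)
    # relpath normalizes away a trailing slash; put it back for directories.
    if u.path.endswith("/") and not rel.endswith("/"):
        rel += "/"
    if u.query:
        rel += "?" + u.query
    if u.fragment:
        rel += "#" + u.fragment
    return rel


base = "http://example.com/docs/index.html"
print(to_relative("http://example.com/docs/img/logo.png", base))
print(to_relative("http://example.com/about.html", base))
print(to_relative("http://other.org/x.html", base))
```

Host names are compared exactly here (no case folding or default-port
handling), which is probably good enough for a one-off conversion script.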

Thanks for the suggestion!

--
v.s.babu
vsbabu at erols.com
http://vsbabu.csoft.net
"Sean 'Shaleh' Perry" <shaleh at valinux.com> wrote in message
news:mailman.989345291.25231.python-list at python.org...
>
> On 08-May-2001 Satheesh Babu wrote:
> > Hi,
> >
> > Let us say I have an HTML document. I would like to write a small Python
> > script that reads this document, goes through the absolute URLs (A HREF,
> > IMG SRC, etc.) and replaces them with relative URLs. I can pass a
> > parameter which specifies the BASE HREF of the document.
> >
> > I'm not sure whether I should proceed with regex nightmare or are there
> > any easy solutions?
> >
> > Any help/pointers will be greatly appreciated.
> >
>
> More likely a combination of the approaches.  There is a wonderful *ML
> parser framework derived from SGMLParser.  You would just implement hooks
> for a, img, src, etc.  Then once you find the item use a regex or string
> compare.
>
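
The approach suggested above (parser hooks plus a string compare) could be
sketched roughly as follows. This is only an illustration under a few
assumptions: it uses html.parser.HTMLParser from the Python 3 standard
library, since sgmllib was removed in Python 3; the class name Relativizer,
the helper relativize, and the example.com URLs are hypothetical; and the
string compare is a plain prefix check, so it only handles URLs that sit
under the base directory (it never produces "../" paths).

```python
from html import escape
from html.parser import HTMLParser

# (tag, attribute) pairs whose values are treated as URLs in this sketch.
URL_ATTRS = {("a", "href"), ("img", "src")}


class Relativizer(HTMLParser):
    """Re-emit the document, stripping the base prefix from known URL attributes."""

    def __init__(self, base):
        # convert_charrefs=False so entities reach the handlers untouched.
        super().__init__(convert_charrefs=False)
        self.base = base
        self.out = []

    def _fix(self, tag, name, value):
        # The "string compare": drop the base prefix when it matches.
        if (tag, name) in URL_ATTRS and value.startswith(self.base):
            return value[len(self.base):] or "./"
        return value

    def _emit_tag(self, tag, attrs, close=""):
        # Note: HTMLParser lowercases tag and attribute names.
        parts = [tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)  # bare attribute with no value
            else:
                parts.append('%s="%s"' % (name, escape(self._fix(tag, name, value))))
        self.out.append("<%s%s>" % (" ".join(parts), close))

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def relativize(html, base):
    parser = Relativizer(base)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


page = ('<p>See <A HREF="http://example.com/docs/a.html">a</A> '
        '<img src="http://example.com/docs/img/x.png"> '
        '<a href="http://other.org/">o</a></p>')
print(relativize(page, "http://example.com/docs/"))
```

Because the document is rebuilt from parser events, the output is not
byte-for-byte identical to the input (tag names come out lowercased and
attribute quoting is normalized), so it is worth diffing the result before
trusting it on real pages.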




