HTML parsing abnormal HTML pages

Aahz Maruch aahz at
Sat Mar 17 01:14:11 CET 2001

In article <98pvp1$15t$1 at>,  <asle at> wrote:
>Considering the small program below. Running it will show that the
>is truncating urls in the HTML page. Now, most of you will probably say that
>the page, and in particular the URLs on this page, are not valid according to
>RFC 1738 -- bad luck. But there must be a workaround for this?

For this specific case, Mark's solution may well work (haven't tested it
myself).  But you cannot easily find a generic solution because of all
the different ways to mangle HTML.
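
To make that concrete, here is a minimal sketch of a lenient, regex-based
href extractor in present-day Python. It is not Mark's solution (which is
not reproduced in this message), and the names HREF_RE and extract_hrefs
are made up for illustration. It copes with unquoted or oddly punctuated
attribute values, but it is only one heuristic: it will still misfire on
other kinds of mangled markup, for example "href=" text inside comments or
scripts.

```python
import re

# Lenient pattern (an assumption, not a standard): match href= followed by a
# double-quoted, single-quoted, or unquoted value. An unquoted value is taken
# to run up to the next whitespace character or '>'.
HREF_RE = re.compile(
    r'''href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))''',
    re.IGNORECASE,
)


def extract_hrefs(html):
    """Return every href value found in html, in document order."""
    urls = []
    for match in HREF_RE.finditer(html):
        # At most one of the three groups is non-empty for any given match.
        urls.append(match.group(1) or match.group(2) or match.group(3) or "")
    return urls


if __name__ == "__main__":
    sample = '<a href=http://example.com/a?b=c&d=e,f>x</a>'
    print(extract_hrefs(sample))
```

Because the pattern never tries to understand the document as a whole, it
does not cut a URL short at characters such as "," or "&", but for the same
reason it cannot tell a real link from text that merely looks like one.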
                      --- Aahz  <*>  (Copyright 2001 by aahz at

Androgynous poly kinky vanilla queer het Pythonista
Hugs and backrubs -- I break Rule 6

Three sins: BJ, B&J, B&J
