HTML parsing abnormal HTML pages
aahz at panix.com
Sat Mar 17 01:14:11 CET 2001
In article <98pvp1$15t$1 at news.netmar.com>, <asle at spam.com> wrote:
>Consider the small program below. Running it will show that the parser
>is truncating URLs in the HTML page. Now, most of you will probably say that
>the page, and in particular the URLs of this page, are not valid according to
>RFC 1738. Bad luck. But there must be a work-around for this?
For this specific case, Mark's solution may well work (haven't tested it
myself). But you cannot easily find a generic solution because of all
the different ways to mangle HTML.
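
To illustrate what a case-by-case work-around tends to look like, here is
a minimal sketch. It is not the program from the original post and not
Mark's solution; the names HREF_RE and extract_hrefs are made up for this
example. It assumes the only mangling is in how href values are quoted
(double quotes, single quotes, or no quotes), and it returns the raw value
without checking it against RFC 1738.

```python
import re

# Hypothetical helper, not from the original thread: a lenient href
# extractor that accepts double-quoted, single-quoted, and unquoted values.
HREF_RE = re.compile(
    r"""href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)


def extract_hrefs(html):
    """Return every href value found, without validating it as a URL."""
    hrefs = []
    for match in HREF_RE.finditer(html):
        # Only one of the three alternatives can match; take that group.
        value = next(g for g in match.groups() if g is not None)
        hrefs.append(value.strip())
    return hrefs
```

A pattern like this keeps spaces and other odd characters inside quoted
values, but it will still go wrong on an unquoted URL containing a space,
a missing closing quote, or an href that appears inside a comment or
script. Each new kind of breakage needs its own special case.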
--- Aahz <*> (Copyright 2001 by aahz at pobox.com)
Androgynous poly kinky vanilla queer het Pythonista http://www.rahul.net/aahz/
Hugs and backrubs -- I break Rule 6
Three sins: BJ, B&J, B&J