Stripping scripts from HTML with regular expressions

Michel Bouwmans mfb.chikazuku at
Thu Apr 10 19:17:52 CEST 2008

Reedick, Andrew wrote:

>> -----Original Message-----
>> From: at [mailto:python-
>> at] On Behalf Of Michel Bouwmans
>> Sent: Wednesday, April 09, 2008 5:44 PM
>> To: python-list at
>> Subject: RE: Stripping scripts from HTML with regular expressions
>> Thanks! That did the trick. :) I was trying to use HTMLParser but that
>> choked on the script-blocks that didn't contain comment-indicators.
>> Guess I can now move on with this script, thank you.
> Soooo.... you asked for help with a regex workaround, but didn't ask for
> help with the original problem, namely HTMLParser?  ;-)

I don't think HTMLParser was doing anything wrong here. I needed to parse an
HTML document, but it contained script blocks with document.write calls in
them. I only care about the content outside these blocks, but HTMLParser
chokes on such a block when it isn't wrapped in HTML comment markers,
because it then tries to parse the contents of the document.write calls. ;)
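
For anyone reading this later, here is a rough sketch of the approach: strip
the script blocks with a regular expression first, then hand the rest to the
parser. The exact pattern suggested earlier in the thread is not quoted here,
so SCRIPT_RE, strip_scripts and TextCollector are illustrative names of my
own, and the sketch uses the Python 3 html.parser module. A regex like this
is only a workaround; it can be fooled by a script body that itself contains
a literal closing script tag.

```python
import re
from html.parser import HTMLParser

# Assumed pattern (not the one from the thread): match an opening <script ...>
# tag, everything up to the nearest closing </script>, case-insensitively and
# across newlines.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>",
                       re.IGNORECASE | re.DOTALL)


def strip_scripts(html):
    """Return html with every <script>...</script> block removed."""
    return SCRIPT_RE.sub("", html)


class TextCollector(HTMLParser):
    """Hypothetical parser that collects the non-blank text it sees."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


sample = """<html><body>
<p>Before</p>
<script type="text/javascript">
document.write("<p>Injected</" + "p>");
</script>
<p>After</p>
</body></html>"""

# Remove the script blocks before the parser ever sees them.
cleaned = strip_scripts(sample)

parser = TextCollector()
parser.feed(cleaned)
parser.close()
print(parser.chunks)
```

Only the text outside the script block reaches handle_data, which is all I
needed for this document.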
