re pattern for matching JS/CSS
Tim Chase
python.list at tim.thechases.com
Fri Dec 15 11:52:40 EST 2006
>> I've tried
>> '<script[\S\s]*/script>'
>> but that didn't work properly. I'm fairly basic in my knowledge of
>> Python, so I'm still trying to learn re.
>> What pattern would work?
>
> I use re.compile("<script.*?</script>",re.DOTALL)
> for scripts. I strip this out first since my tag stripping re will
> strip out script tags as well. Hope this was of help.
This won't catch variations with whitespace inside the tags, such as
<
script
>
doEvil()
<
/
script
>
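Of these, whitespace before the closing '>' of a tag is valid
html/xhtml; whitespace directly after '<' is not, though a lenient
parser may still accept it.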
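
To make the whitespace point concrete, here is a small sketch that runs the quoted pattern against a plain script element and against one with a newline before each closing '>'. The names SCRIPT_RE, plain and spaced are just for illustration.

```python
import re

# Pattern quoted earlier in the thread.
SCRIPT_RE = re.compile("<script.*?</script>", re.DOTALL)

plain = "<script>doEvil()</script>"
# Whitespace before '>' is allowed in both start and end tags.
spaced = "<script\n>doEvil()</script\n>"

# The plain form is removed entirely.
print(repr(SCRIPT_RE.sub("", plain)))
# The spaced form never matches, because the pattern requires the
# literal text "</script>" with nothing between "script" and ">".
print(SCRIPT_RE.sub("", spaced) == spaced)
```

Loosening the pattern to allow optional whitespace would cover this one case, but each such patch only chases one more variation.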
For invalid html that someone might still try to slip through, one
might find something like
<scrip<script>hah</script>t>doEvil()</script>
which, once you strip out the matches of your pattern, leaves the
valid but unwanted
<script>doEvil()</script>
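
The nested case can be reproduced with a few lines. This is only a sketch; strip_scripts is a hypothetical helper that applies the quoted pattern once.

```python
import re

# Pattern quoted earlier in the thread.
SCRIPT_RE = re.compile("<script.*?</script>", re.DOTALL)

def strip_scripts(html):
    """Remove every match of the pattern in a single pass."""
    return SCRIPT_RE.sub("", html)

nested = "<scrip<script>hah</script>t>doEvil()</script>"

# The inner "<script>hah</script>" is removed, and the two halves
# around it join up into a fresh script element.
once = strip_scripts(nested)
print(once)  # <script>doEvil()</script>
```

Running strip_scripts a second time would remove this particular leftover, but repeating the substitution does nothing for the whitespace variations above.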
I'd propose that it's better to use something such as
BeautifulSoup that actually parses the HTML, and then skim
through it whitelisting the tags you plan to allow, and skipping
the emission of any tags that don't make the whitelist.
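
A minimal sketch of the whitelisting idea follows. To keep it runnable without extra installs it uses the standard library's html.parser instead of BeautifulSoup, so it is a stand-in for the approach, not BeautifulSoup's API. ALLOWED_TAGS, DROP_CONTENT, WhitelistFilter and sanitize are hypothetical names, and the set of allowed tags is an assumption. Attributes are dropped and text is re-escaped. Treat it as an illustration, not a vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "b", "i", "em", "strong"}  # assumed whitelist
DROP_CONTENT = {"script", "style"}  # drop these tags and their text

class WhitelistFilter(HTMLParser):
    """Re-emit only whitelisted tags; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append("<%s>" % tag)  # attributes are discarded

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Escaping means stray '<' or '>' can never form a tag.
            self.out.append(escape(data))

def sanitize(html):
    parser = WhitelistFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

print(sanitize("<p>Hello <b>world</b></p><script>doEvil()</script>"))
print(sanitize("<scrip<script>hah</script>t>doEvil()</script>"))
```

Because the output is built only from whitelisted tag names and escaped text, a tag that is not on the list cannot reappear by joining fragments, whatever the parser makes of the malformed input.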
-tkc