Simple allowing of HTML elements/attributes?

Robert Brewer fumanchu at amor.org
Tue Feb 17 18:47:11 CET 2004


> [Alan Kennedy]
> >> The optimal solution, IMHO, is to tidy the HTML into XML, and then
> >> use SAX to filter out the stuff you don't want. Here is some code
> >> that does the latter. This should be nice and fast, and use a lot
> >> less memory than object-model based approaches.
> 
> [Robert Brewer]
> > I rolled my own solution to this the other day,
> > relying more on regexes. This might be more usable.
> 
> Not sure what you're saying here, Robert. Certainly, a regex-based
> solution has the potential to be faster than the XML+SAX technique,
> but it is likely to be *much* harder to verify that it operates
> correctly and securely.
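
For anyone reading this without the earlier message at hand, the XML+SAX
filtering idea quoted above might look roughly like the sketch below. This
is not Alan's code. The whitelist contents and the names ALLOWED,
DROP_WITH_CONTENT, WhitelistFilter and filter_xhtml are made up for
illustration, and the input is assumed to be already tidied into
well-formed XML.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Hypothetical whitelist: element name -> attribute names allowed on it.
ALLOWED = {"div": set(), "p": set(), "b": set(), "i": set(), "a": {"href"}}
# Hypothetical list of elements whose entire content is discarded.
DROP_WITH_CONTENT = {"script", "style"}


class WhitelistFilter(XMLGenerator):
    """SAX handler that re-serializes only whitelisted elements/attributes."""

    def __init__(self, out):
        super().__init__(out, encoding="utf-8")
        self._skip_depth = 0  # > 0 while inside a dropped subtree

    def startDocument(self):
        pass  # suppress the XML declaration; we emit a fragment

    def startElement(self, name, attrs):
        if self._skip_depth or name in DROP_WITH_CONTENT:
            self._skip_depth += 1
            return
        if name in ALLOWED:
            kept = {k: v for k, v in attrs.items() if k in ALLOWED[name]}
            super().startElement(name, AttributesImpl(kept))
        # Unknown elements: the tag is dropped, but its text is kept.

    def endElement(self, name):
        if self._skip_depth:
            self._skip_depth -= 1
            return
        if name in ALLOWED:
            super().endElement(name)

    def characters(self, content):
        if not self._skip_depth:
            super().characters(content)

    def processingInstruction(self, target, data):
        pass  # drop processing instructions entirely


def filter_xhtml(xhtml):
    """Return xhtml (a well-formed XML string) with unlisted markup removed."""
    out = io.StringIO()
    xml.sax.parseString(xhtml.encode("utf-8"), WhitelistFilter(out))
    return out.getvalue()
```

Note that this only filters by element and attribute *name*. It does not
inspect attribute values, so something like a javascript: URL in an href
would still pass through and would need a separate check.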

Sorry! I meant *your* solution might be more usable, not mine! :)

Thank you *very* much for the complete reply--it'll take me a while to
digest.


Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org



