"Soup Strainer" for ElementSoup?

John Nagle nagle at animats.com
Tue Mar 25 00:17:16 EDT 2008


erikcw wrote:
> Hi all,
> 
> I was reading in the Beautiful Soup documentation that you should use
> a "Soup Strainer" object to keep memory usage down.
> 
> Since I'm already using ElementTree elsewhere in the project, I
> figured it would make sense to use ElementSoup to keep the API
> consistent. (And cElementTree should be faster, right?)
> 
> I can't seem to figure out how to pass ElementSoup a "soup strainer"
> though.
> 
> Any ideas?
> 
> Also - do I need to use the extract() method with ElementSoup like I
> do with Beautiful Soup to keep garbage collection working?
> 
> Thanks!
> Erik

    I really should get my version of BeautifulSoup merged back into
the mainstream.  I have one that's been modified to use weak references
for all "up" and "left" links, which makes the graph cycle-free.  So
the memory is recovered by reference counting as soon as you
let go of the head of the tree.  That helps with the garbage problem.
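
    A minimal sketch of the weak-link idea, not the modified
BeautifulSoup itself.  The Node class and build_and_drop() are
hypothetical names made up for illustration, and only the "up" link is
shown; a "left" (previous sibling) link would be handled the same way.
It assumes CPython, where an object is freed as soon as its reference
count drops to zero.

```python
import weakref


class Node:
    """Hypothetical tree node; illustrative only, not BeautifulSoup's class."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []    # strong references point down the tree
        self._parent = None   # weak reference points up the tree
        if parent is not None:
            parent.append(self)

    def append(self, child):
        # Store only a weak reference to the parent, so parent and child
        # never form a reference cycle.
        child._parent = weakref.ref(self)
        self.children.append(child)

    @property
    def parent(self):
        # Dereference the weak link; returns None if there is no parent
        # or the parent has already been freed.
        return self._parent() if self._parent is not None else None


def build_and_drop():
    """Build a small tree, drop it, and report whether the root was freed."""
    root = Node("html")
    body = Node("body", root)
    Node("p", body)
    probe = weakref.ref(root)   # lets us observe the root without owning it
    del root, body              # let go of the head of the tree
    return probe() is None      # freed without waiting for the cycle collector
```

With strong "up" links the same tree would sit in memory until the
cyclic garbage collector ran, which is why the unmodified library needs
explicit cleanup on large documents.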

    What are you parsing?  If you're parsing well-formed XML,
BeautifulSoup is overkill.  If you're parsing real-world HTML,
ElementTree is too brittle.
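
    To illustrate the brittleness point, here is a small sketch using
the standard-library xml.etree.ElementTree parser.  The try_parse()
helper and the two sample strings are made up for this example.  The
parser accepts well-formed XML but raises ParseError on typical tag
soup with unclosed or misnested tags.

```python
import xml.etree.ElementTree as ET


def try_parse(markup):
    """Return the root tag name, or None if the markup is not well-formed."""
    try:
        return ET.fromstring(markup).tag
    except ET.ParseError:
        # ElementTree's parser is strict: it rejects anything that is
        # not well-formed XML rather than trying to repair it.
        return None


WELL_FORMED = "<root><item>ok</item></root>"
TAG_SOUP = "<p>unclosed<br><b>bold</p>"   # <br> and <b> are never closed

parsed_ok = try_parse(WELL_FORMED)    # parses; the root tag is "root"
parsed_soup = try_parse(TAG_SOUP)     # fails to parse; returns None
```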

					John Nagle


