"Soup Strainer" for ElementSoup?
fredrik at pythonware.com
Sun Mar 30 18:45:51 CEST 2008
> I'm parsing real-world HTML with BeautifulSoup and XML with
> I'm guessing that the only benefit to using ElementSoup is that I'll
> have one less API to keep track of, right? Or are there memory
> benefits in converting the Soup object to an ElementTree?
It's purely an API thing: ElementSoup loads the entire HTML file with
BeautifulSoup, and then uses the resulting BS data structure to build an
ET tree.
The ET tree doesn't contain cycles, though, so you can safely pull out
the strings you need from ET and throw away the rest of the tree.
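
A minimal sketch of that "pull out the strings, drop the tree" pattern,
using only the standard library. The tree here is built with
ET.fromstring as a stand-in for one produced by ElementSoup, and the
markup and variable names are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Stand-in for a tree built by ElementSoup; any ElementTree element works here.
tree = ET.fromstring(
    "<html><body><p>first <b>bold</b> tail</p><p>second</p></body></html>"
)

# Copy out plain strings. itertext() yields the text of an element and of
# its descendants (including their tails) in document order.
paragraphs = ["".join(p.itertext()) for p in tree.iter("p")]

# Drop the only reference to the tree. The extracted values are ordinary
# strings, so they do not keep any part of the tree alive.
del tree

print(paragraphs)
```

After the del, only the list of strings remains referenced, so the
elements can be reclaimed as soon as nothing else points at them.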
> Any idea about using a Soup Strainer with ElementSoup?
The strainer is used when parsing the file, to control what goes into
the BS tree; to add straining support to ES, you could e.g. add a
parseOnlyThese option that's passed through to BS.
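
A rough sketch of what such a pass-through could look like. This is not
ElementSoup's actual code: the parse function, its soup_factory argument
and the omitted conversion step are all hypothetical. With BeautifulSoup
3, soup_factory would be the BeautifulSoup class and parseOnlyThese a
SoupStrainer instance; the factory is passed in here only so the sketch
runs without BeautifulSoup installed.

```python
def parse(markup, soup_factory, parseOnlyThese=None):
    """Hypothetical ElementSoup-style entry point with a strainer option."""
    options = {}
    if parseOnlyThese is not None:
        # Forward the strainer untouched; the BS parser decides what it keeps.
        options["parseOnlyThese"] = parseOnlyThese
    soup = soup_factory(markup, **options)
    # A real implementation would convert the soup to an ET tree here.
    return soup


# Stand-in factory that just records what it was called with.
def fake_soup(markup, **options):
    return {"markup": markup, "options": options}


strained = parse("<a href='x'>x</a><p>y</p>", fake_soup, parseOnlyThese="a-only")
unstrained = parse("<p>y</p>", fake_soup)
```

Leaving the option out of the call when it is None keeps the default
behaviour identical to the current, unstrained parse.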