Parsing html with Beautifulsoup
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Mon Dec 14 17:39:35 EST 2009
On Mon, 14 Dec 2009 03:58:34 -0300, Johann Spies <jspies at sun.ac.za>
wrote:
> On Sun, Dec 13, 2009 at 07:58:55AM -0300, Gabriel Genellina wrote:
>> cell.findAll(text=True) returns a list of all text nodes inside a
>> <td> cell; I preprocess all \n and &nbsp; in each text node, and
>> join them all. lines is a list of lists (each entry one cell), as
>> expected by the csv module used to write the output file.
>
> I have struggled a bit to find the documentation for (text=True).
> Most of the BeautifulSoup documentation I saw contained some
> examples without explaining what the options do. Thanks for your
> explanation.
See
http://www.crummy.com/software/BeautifulSoup/documentation.html#arg-text
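To make the idea concrete, here is a rough sketch of the steps described
in the quoted paragraph: collect the text nodes inside each <td>, clean
each one, join them, and hand the resulting list of lists to the csv
module. It is not the code from my earlier message. It uses the standard
library's html.parser as a stand-in for BeautifulSoup so that it runs
without installing anything; with BeautifulSoup, the per-cell collection
step is what cell.findAll(text=True) gives you. The names
CellTextCollector, extract_rows, rows_to_csv and SAMPLE_HTML are made up
for this example, and replacing \n and &nbsp; with a space, then joining
with an empty string, are assumptions about the cleanup.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample input, only for illustration.
SAMPLE_HTML = (
    "<table>"
    "<tr><td>one\n<b>two</b></td><td>a&nbsp;b</td></tr>"
    "<tr><td>x, y</td><td></td></tr>"
    "</table>"
)


class CellTextCollector(HTMLParser):
    """Collect the text nodes of every <td>, grouped by <tr> row."""

    def __init__(self):
        # convert_charrefs=True delivers &nbsp; as the character "\xa0".
        super().__init__(convert_charrefs=True)
        self.rows = []      # list of rows; each row is a list of cell strings
        self._nodes = None  # text nodes of the currently open <td>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:  # tolerate a <td> with no enclosing <tr>
                self.rows.append([])
            self._nodes = []

    def handle_data(self, data):
        # Text inside nested tags (such as <b>) also lands here, so every
        # text node under the open cell is collected.
        if self._nodes is not None:
            self._nodes.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._nodes is not None:
            # Assumed cleanup: newline and non-breaking space become a space.
            cleaned = [n.replace("\n", " ").replace("\xa0", " ")
                       for n in self._nodes]
            self.rows[-1].append("".join(cleaned).strip())
            self._nodes = None


def extract_rows(html):
    """Return a list of lists: one inner list per row, one string per cell."""
    parser = CellTextCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize the rows with the csv module and return the CSV text."""
    out = io.StringIO(newline="")
    csv.writer(out).writerows(rows)
    return out.getvalue()
```

Calling rows_to_csv(extract_rows(SAMPLE_HTML)) yields one CSV line per
table row; the csv module takes care of quoting cells that contain
commas.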
> As far as I can see there was no documentation installed with the
> debian package.
BeautifulSoup is very small - a single .py file, no dependencies. The
whole documentation is contained in the above linked page.
--
Gabriel Genellina