[Tutor] Question regarding parsing HTML with BeautifulSoup

Shuai Jiang (Runiteking1) marshall.jiang at gmail.com
Fri Jan 5 02:58:36 CET 2007


Hi,

Wow, that's much more elegant than the idea I had.

Thank you very much Kent!

Marshall

On 1/3/07, Kent Johnson <kent37 at tds.net> wrote:
>
> Shuai Jiang (Runiteking1) wrote:
> > Hello,
> >
> > I'm working on a program that needs to parse a financial document on the
> > internet using BeautifulSoup. Because of the nature of the information, it
> > is all laid out as a table. I need to get three types of information and
> > have succeeded with two of them using BeautifulSoup, but ran into problems
> > with the third.
> >
> > My question is: is there an easy way to parse a column of an HTML table
> > using BeautifulSoup? I copied the table here, and I need to extract the
> > EPS. The values are in the sixth cell of each <tr> row, e.g. 2.27, 1.86,
> > 1.61...
>
> Here is one way, found with a little experimenting at the command prompt:
>
> In [1]: data = '''<table id="INCS" style="width:580px" class="f10y" cellspacing="0">
> <snip the rest of your data>
>     ...: </table>'''
> In [3]: from BeautifulSoup import BeautifulSoup as BS
>
> In [4]: soup=BS(data)
>
> In [11]: for tr in soup.table.findAll('tr'):
>     ....:     print tr.contents[11].string
>     ....:
>     ....:
> EPS
> 2.27
>   1.86
> 1.61
>   1.27
> 1.18
>   0.84
> 0.73
>   0.46
> 0.2
>   0.0
>
> Kent
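Kent's session uses BeautifulSoup 3 and the Python 2 print statement. Index 11 presumably lands on the sixth cell because tr.contents interleaves whitespace text nodes with the <td> tags (whitespace, cell 1, whitespace, cell 2, ...), so the index depends on how the page's HTML happens to be formatted. The sketch below counts only cells instead. It swaps in the standard library's html.parser for BeautifulSoup so it runs without a third-party package. The TableCellParser class, the column() helper and the sample table are hypothetical: only the EPS values come from this thread, and the other cells are placeholders.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Hypothetical helper: collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # one list of cell strings per <tr>
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")  # start a new, empty cell in the current row
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Text outside cells (e.g. whitespace between tags) is ignored.
        if self._in_cell:
            self.rows[-1][-1] += data


def column(html, index):
    """Return the stripped text of cell `index` (0-based) from every row that has one."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return [row[index].strip() for row in parser.rows if len(row) > index]


# Hypothetical sample: placeholder cells, with EPS in the sixth column.
sample = """
<table id="INCS">
  <tr><th>c1</th><th>c2</th><th>c3</th><th>c4</th><th>c5</th><th>EPS</th></tr>
  <tr><td>x</td><td>x</td><td>x</td><td>x</td><td>x</td><td>2.27</td></tr>
  <tr><td>x</td><td>x</td><td>x</td><td>x</td><td>x</td><td> 1.86 </td></tr>
</table>
"""

print(column(sample, 5))  # ['EPS', '2.27', '1.86']
```

With BeautifulSoup itself, the analogous cell-only approach is to index tr.findAll(['td', 'th']) rather than tr.contents, so whitespace nodes are never counted.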


-- 
I like pigs. Dogs look up to us. Cats look down on us. Pigs treat us as
equals.
    Sir Winston Churchill