[Tutor] Question regarding parsing HTML with BeautifulSoup

Shuai Jiang (Runiteking1) marshall.jiang at gmail.com
Wed Jan 3 20:39:55 CET 2007


Hello,

I'm working on a program that needs to parse a financial document on the
internet using BeautifulSoup. Because of the nature of the information,
it is all laid out as a table. I need to get 3 types of info and have
succeeded quite well using BeautifulSoup, but ran into problems with the
third one.

My question: is there an easy way to extract a single column of an HTML
table using BeautifulSoup? I copied the table below, and I need to extract
the EPS column. The numbers are in the sixth <td> of each <tr>, e.g. 2.27,
1.86, 1.61...

Thanks!

Marshall
<table id="INCS" style="width:580px" class="f10y" cellspacing="0">
    <tr class="r1">
        <td align="left"
style="padding-left:5px;padding-right:5px;vertical-align:bottom">&nbsp;</td>
        <td align="right"
style="padding-left:5px;padding-right:5px;vertical-align:bottom">Sales</td>
        <td align="right"
style="padding-left:5px;padding-right:5px;vertical-align:bottom">EBIT</td>
        <td align="right"
style="padding-left:5px;padding-right:5px;vertical-align:bottom">Depreciation</td>
        <td align="right"
style="padding-left:5px;padding-right:5px;vertical-align:bottom">Total Net
Income</td>
        <td align="right"
style="padding-left:5px;padding-right:5px;vertical-align:bottom">EPS</td>
        <td align="right"
style="padding-left:5px;padding-right:5px;vertical-align:bottom">Tax Rate
(%)</td>
    </tr>

    <tr>
        <td align="left"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">02/06</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">30,848.0
</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">1,721.0
</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">456.0</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">1,140.0
</td>

        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">2.27</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">33.76</td>
    </tr>
    <tr>
        <td align="left"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">02/05</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">27,433.0
</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">1,443.0
</td>

        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">459.0</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">934.0</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">1.86</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">35.27</td>
    </tr>
    <tr>
        <td align="left"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">02/04</td>

        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">24,548.0
</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">1,296.0
</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">385.0</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">800.0</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">1.61</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">38.27</td>

    </tr>
    <tr>
        <td align="left"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">03/03</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">20,943.0
</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">1,014.0
</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">310.0</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">622.0</td>

        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">1.27</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">38.66</td>
    </tr>
    <tr>
        <td align="left"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">03/02</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">17,711.0
</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">926.0</td>

        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">245.0</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">570.0</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">1.18</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">38.44</td>
    </tr>
    <tr>
        <td align="left"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">03/01</td>

        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">15,189.0
</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">649.0</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">165.0</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">401.0</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">0.84</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">38.21</td>

    </tr>
    <tr>
        <td align="left"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">02/00</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">12,494.02
</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">562.57
</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">109.54
</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">347.07
</td>

        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">0.73</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">38.31</td>
    </tr>
    <tr>
        <td align="left"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">02/99</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">10,064.65
</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">351.68
</td>

        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">73.63</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">216.28
</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">0.46</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">38.5</td>
    </tr>
    <tr>
        <td align="left"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">02/98</td>

        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">8,337.76
</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">133.4</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">71.58</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">81.94</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">0.2</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">38.58</td>

    </tr>
    <tr>
        <td align="left"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">03/97</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">7,770.68
</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">2.87</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">66.84</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">1.75</td>

        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">0.0</td>
        <td align="right"
style=";padding-left:5px;padding-right:5px;vertical-align:bottom">39.05</td>
    </tr>
</table>
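
One possible way to pull a column out of a table like the one above,
sketched under the assumption that BeautifulSoup 4 (the bs4 package) is
available on Python 3. The function name extract_column and the shortened
sample table are hypothetical and only there for illustration; the column
is looked up by its header text rather than by counting cells.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed


def extract_column(html, table_id, header):
    """Return the text of every data cell under the column titled `header`."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    rows = table.find_all("tr")
    # The first row holds the column titles; collapse whitespace so that
    # titles wrapped across lines ("Total Net\nIncome") still match.
    titles = [" ".join(td.get_text().split()) for td in rows[0].find_all("td")]
    index = titles.index(header)
    values = []
    for row in rows[1:]:
        cells = row.find_all("td")
        if len(cells) > index:  # skip rows that are too short
            values.append(cells[index].get_text().strip())
    return values


# Shortened stand-in for the table above (two data rows, same layout)
sample = """
<table id="INCS">
  <tr class="r1"><td>&nbsp;</td><td>Sales</td><td>EBIT</td>
      <td>Depreciation</td><td>Total Net
Income</td><td>EPS</td><td>Tax Rate (%)</td></tr>
  <tr><td>02/06</td><td>30,848.0
</td><td>1,721.0</td><td>456.0</td><td>1,140.0</td><td>2.27</td><td>33.76</td></tr>
  <tr><td>02/05</td><td>27,433.0</td><td>1,443.0</td><td>459.0</td>
      <td>934.0</td><td>1.86</td><td>35.27</td></tr>
</table>
"""

eps = [float(v) for v in extract_column(sample, "INCS", "EPS")]
print(eps)
```

Columns such as Sales contain thousands separators (30,848.0), so those
values would need the commas removed before being converted to float.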

-- 
I like pigs. Dogs look up to us. Cats look down on us. Pigs treat us as
equals.
    Sir Winston Churchill