Best Way to extract Numbers from String

Someone Something fordhaivat at gmail.com
Sat Mar 20 20:51:44 EDT 2010


It's an extremely bad idea to use regex for HTML. You want to change one tiny
little thing and you have to write the regex all over again. If it's a
throwaway script, then go ahead.
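
For anything longer-lived, a parser-based version might look like the sketch below. It uses only the standard library's html.parser (Python 3) rather than BeautifulSoup. The class name PlainCellParser and the trimmed sample string are made up for illustration, and it assumes the wanted numbers always sit in <td> cells that carry no attributes, which is true of the snippet in the question but may not hold for the whole page.

```python
from html.parser import HTMLParser


class PlainCellParser(HTMLParser):
    """Collect the text of every <td> that carries no attributes.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self):
        super().__init__()
        self.cells = []            # text of each attribute-free <td>, in order
        self._in_plain_td = False  # True while inside such a cell
        self._buf = []             # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # attrs is an empty list for a bare <td>
            self._in_plain_td = not attrs
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_plain_td:
            self.cells.append("".join(self._buf).strip())
            self._in_plain_td = False

    def handle_data(self, data):
        if self._in_plain_td:
            self._buf.append(data)


# Sample trimmed from the HTML in the original question.
sample = """
<td class="last">43.200 </td>
<td class="change indicator" nowrap>0.040 </td>
<td>43.150 </td>
<td>43.200 </td>
<td>43.130 </td>
<td>43.290 </td>
<td>43.100 </td>
<td>7,450,447 </td>
"""

parser = PlainCellParser()
parser.feed(sample)
parser.close()

# The question marks the 1st, 3rd and 5th plain cells as the wanted ones.
wanted = parser.cells[0:5:2]
print(wanted)
```

Picking the cells by position is still an assumption about the table layout, but if the site adds a class or reorders a column, the fix is a one-line change instead of a rewritten pattern.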
2010/3/20 Luis M. González <luismgz at gmail.com>

> On Mar 20, 12:04 am, Jimbo <nill... at yahoo.com> wrote:
> > Hello
> >
> > I am trying to grab some numbers from a string containing HTML text.
> > Can you suggest any good functions that I could use to do this? What
> > would be the easiest way to extract the following numbers from this
> > string...
> >
> > My String has this layout & I have commented what I want to grab:
> > [CODE] """</th>
> >                                 <td class="last">43.200 </td>
> >                                 <td class="change indicator" nowrap>0.040 </td>
> >                                 <td>43.150 </td> # I need to grab this number only
> >                                 <td>43.200 </td>
> >                                 <td>43.130 </td> # I need to grab this number only
> >                                 <td>43.290 </td>
> >                                 <td>43.100 </td> # I need to grab this number only
> >                                 <td>7,450,447 </td>
> >                                 <td class="middle"><a
> >                                         href="/asx/markets/optionPrices.do?by=underlyingCode&underlyingCode=BHP&expiryDate=&optionType=">Options</a></td>
> >                                 <td class="middle"><a
> >                                         href="/asx/markets/warrantPrices.do?by=underlyingAsxCode&underlyingCode=BHP">Warrants & Structured Products</a></td>
> >                                 <td class="middle"><a
> >                                         href="/asx/markets/cfdPrices.do?by=underlyingAsxCode&underlyingCode=BHP">CFDs</a></td>
> >                                 <td class="middle"><a
> >                                         href="http://hfgapps.hubb.com/asxtools/Charts.aspx?TimeFrame=D6&compare=comp_index&indicies=XJO&pma1=20&pma2=20&asxCode=BHP"><img
> >                                         src="/images/chart.gif" border="0" height="15" width="15"></a>
> >                                 </td>
> >                                 <td><a href="/research/announcements/status_notes.htm#XD">XD</a>
> >                                 </td>
> >                                 <td><a href="/asx/statistics/announcements.do?by=asxCode&asxCode=BHP&timeframe=D&period=W">Recent</a>
> >                                 </td>
> >                         </tr>"""[/CODE]
>
>
> You should use BeautifulSoup or perhaps regular expressions.
> Or if you are not very smart, like me, just try a brute force approach:
>
> >>> for i in s.split('>'):
>        for e in i.split():
>                if '.' in e and e[0].isdigit():
>                        print (e)
>
>
> 43.200
> 0.040
> 43.150
> 43.200
> 43.130
> 43.290
> 43.100
> >>>