Best Way to extract Numbers from String

Jimbo nilly16 at yahoo.com
Sat Mar 20 23:39:30 CET 2010


On Mar 20, 11:51 pm, Luis M. González <luis... at gmail.com> wrote:
> On Mar 20, 12:04 am, Jimbo <nill... at yahoo.com> wrote:
>
> > Hello
>
> > I am trying to grab some numbers from a string containing HTML text.
> > Can you suggest any good functions that I could use to do this? What
> > would be the easiest way to extract the following numbers from this
> > string...
>
> > My String has this layout & I have commented what I want to grab:
> > [CODE] """</th>
> >                                 <td class="last">43.200 </td>
> >                                 <td class="change indicator" nowrap>0.040 </td>
>
> >                                 <td>43.150 </td> # I need to grab this number only
> >                                 <td>43.200 </td>
> >                                 <td>43.130 </td> # I need to grab this number only
> >                                 <td>43.290 </td>
> >                                 <td>43.100 </td> # I need to grab this number only
> >                                 <td>7,450,447 </td>
> >                                 <td class="middle"><a
> >                                         href="/asx/markets/optionPrices.do?
> > by=underlyingCode&underlyingCode=BHP&expiryDate=&optionType=">Options</
> > a></td>
> >                                 <td class="middle"><a
> >                                         href="/asx/markets/warrantPrices.do?
> > by=underlyingAsxCode&underlyingCode=BHP">Warrants & Structured
> > Products</a></td>
> >                                 <td class="middle"><a
> >                                         href="/asx/markets/cfdPrices.do?
> > by=underlyingAsxCode&underlyingCode=BHP">CFDs</a></td>
> >                                 <td class="middle"><a href="http://hfgapps.hubb.com/asxtools/
> > Charts.aspx?
> > TimeFrame=D6&compare=comp_index&indicies=XJO&pma1=20&pma2=20&asxCode=BHP">< img
> > src="/images/chart.gif" border="0" height="15" width="15"></a>
> > </td>
> >                                 <td><a href="/research/announcements/status_notes.htm#XD">XD</a>
> >                                 </td>
> >                                 <td><a href="/asx/statistics/announcements.do?
> > by=asxCode&asxCode=BHP&timeframe=D&period=W">Recent</a>
> > </td>
> >                         </tr>"""[/CODE]
>
> You should use BeautifulSoup or perhaps regular expressions.
> Or if you are not very smart, like me, just try a brute-force approach:
>
> >>> for i in s.split('>'):
> ...     for e in i.split():
> ...         if '.' in e and e[0].isdigit():
> ...             print(e)
>
> 43.200
> 0.040
> 43.150
> 43.200
> 43.130
> 43.290
> 43.100
>

Thanks very much. I'm going to look at regular expressions, but thanks
for your code; it shows me how I can do it with standard Python :)
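
For the regular-expression route, a minimal sketch might look like the
following. It assumes the wanted values always sit in attribute-free <td>
cells and are the 1st, 3rd and 5th such cells, as in the fragment quoted
above. The string s here is a shortened stand-in for that fragment, and the
names PLAIN_CELL and plain_cell_numbers are made up for illustration. For
anything less regular than this, a real HTML parser such as BeautifulSoup is
probably the safer choice.

```python
import re

# Shortened stand-in for the HTML fragment quoted above (assumption:
# the real page keeps the same cell order).
s = """<td class="last">43.200 </td>
<td class="change indicator" nowrap>0.040 </td>
<td>43.150 </td>
<td>43.200 </td>
<td>43.130 </td>
<td>43.290 </td>
<td>43.100 </td>
<td>7,450,447 </td>"""

# Match only attribute-free <td> cells whose content is a decimal number.
# Cells with a class attribute and the comma-separated volume are skipped.
PLAIN_CELL = re.compile(r"<td>\s*(\d+\.\d+)\s*</td>")


def plain_cell_numbers(html):
    """Return the decimal numbers found in bare <td> cells, as floats."""
    return [float(m) for m in PLAIN_CELL.findall(html)]


numbers = plain_cell_numbers(s)

# The wanted values are the 1st, 3rd and 5th plain cells.
wanted = numbers[0:5:2]
print(wanted)
```

The slice numbers[0:5:2] depends on the cell order staying fixed, so it
would need adjusting if the page layout changes.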



More information about the Python-list mailing list