Best Way to extract Numbers from String

Luis M. González luismgz at gmail.com
Sat Mar 20 08:51:25 EDT 2010


On Mar 20, 12:04 am, Jimbo <nill... at yahoo.com> wrote:
> Hello
>
> I am trying to grab some numbers from a string containing HTML text.
> Can you suggest any good functions that I could use to do this? What
> would be the easiest way to extract the following numbers from this
> string...
>
> My String has this layout & I have commented what I want to grab:
> [CODE] """</th>
>                                 <td class="last">43.200 </td>
>                                 <td class="change indicator" nowrap>0.040 </td>
>
>                                                    <td>43.150 </td> # I need to grab this number only
>                                 <td>43.200 </td>
>                                                    <td>43.130 </td> # I need to grab this number only
>                                 <td>43.290 </td>
>                                                    <td>43.100 </td> # I need to grab this number only
>                                 <td>7,450,447 </td>
>                                 <td class="middle"><a
>                                         href="/asx/markets/optionPrices.do?
> by=underlyingCode&underlyingCode=BHP&expiryDate=&optionType=">Options</
> a></td>
>                                 <td class="middle"><a
>                                         href="/asx/markets/warrantPrices.do?
> by=underlyingAsxCode&underlyingCode=BHP">Warrants & Structured
> Products</a></td>
>                                 <td class="middle"><a
>                                         href="/asx/markets/cfdPrices.do?
> by=underlyingAsxCode&underlyingCode=BHP">CFDs</a></td>
>                                 <td class="middle"><a href="http://hfgapps.hubb.com/asxtools/
> Charts.aspx?
> TimeFrame=D6&compare=comp_index&indicies=XJO&pma1=20&pma2=20&asxCode=BHP">< img
> src="/images/chart.gif" border="0" height="15" width="15"></a>
> </td>
>                                 <td><a href="/research/announcements/status_notes.htm#XD">XD</a>
>                                 </td>
>                                 <td><a href="/asx/statistics/announcements.do?
> by=asxCode&asxCode=BHP&timeframe=D&period=W">Recent</a>
> </td>
>                         </tr>"""[/CODE]


You should use BeautifulSoup or perhaps regular expressions.
Or, if you are not very smart, like me, just try a brute-force approach:

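>>> # s holds the HTML string from the question
>>> for i in s.split('>'):
	for e in i.split():
		# keep tokens that start with a digit and contain a decimal point
		if '.' in e and e[0].isdigit():
			print(e)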


43.200
0.040
43.150
43.200
43.130
43.290
43.100
>>>
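
Note that the loop above prints every decimal number, not only the three you marked. If you want to try the regular-expression route instead, here is a rough sketch using only the standard re module. It assumes the wanted values always sit in plain <td> cells (no class attribute) and that they are the 1st, 3rd and 5th such cells, as in your sample. The shortened string s, the CELL pattern and the plain_cell_numbers helper are made up for illustration, so adjust them to the real page.

```python
import re

# Trimmed-down copy of the HTML from the question (assumed to be held in `s`).
s = """</th>
<td class="last">43.200 </td>
<td class="change indicator" nowrap>0.040 </td>
<td>43.150 </td>
<td>43.200 </td>
<td>43.130 </td>
<td>43.290 </td>
<td>43.100 </td>
<td>7,450,447 </td>
</tr>"""

# Match only attribute-free <td> cells whose content is digits, dots or commas.
CELL = re.compile(r"<td>\s*([\d.,]+)\s*</td>")


def plain_cell_numbers(html):
    """Return the numeric text of every plain <td> cell, in document order."""
    return CELL.findall(html)


cells = plain_cell_numbers(s)
# Assumption: the 1st, 3rd and 5th plain cells are the ones marked in the question.
wanted = [float(cells[i]) for i in (0, 2, 4)]
print(wanted)
```

This depends on the exact layout of the table, so it will break if the site adds attributes to those cells or reorders them. A real HTML parser such as BeautifulSoup is the sturdier choice for anything long-lived.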


