Python Regex Question

Thu Sep 20 19:52:16 EDT 2007

On Sep 20, 4:12 pm, Tobiah <t... at tobiah.org> wrote:
> joemystery... at gmail.com wrote:
> > I need to extract the number from each <td> tag in an HTML file.
>
> > e.g. 49.950 from the following:
>
> > <td align=right width=80><font size=2 face="New Times
> > Roman,Times,Serif"> 49.950 </font></td>
>
> > The actual number (49.950 here) can have any number of
> > digits before and after the decimal point.
>
> > <td align=right width=80><font size=2 face="New Times
> > Roman,Times,Serif"> ######.#### </font></td>
>
> > How can I just extract the real/integer number using regex?
>
> '[0-9]*\.[0-9]*'

I am trying to use BeautifulSoup:

    soup = BeautifulSoup(page)

    td_tags = soup.findAll('td')
    i=0
    for td in td_tags:
        i = i+1
        print "td: ", td
        # re.search('[0-9]*\.[0-9]*', td)
        price = re.compile('[0-9]*\.[0-9]*').search(td)

I am getting an error:

           price= re.compile('[0-9]*\.[0-9]*').search(td)
TypeError: expected string or buffer

Does BeautifulSoup return an array of objects? If so, how do I pass a
"td" instance as a string to re.search?  What is the difference between
re.search and re.compile().search?
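
A minimal sketch of how the two calls compare, using only the standard re
module on the sample cell from the thread. It assumes that converting each
BeautifulSoup element with str(td) yields markup like the string below; the
TypeError above suggests the element is an object rather than a string. The
pattern is a slight variant of the one suggested earlier (an assumption, not
the original): \d+ replaces [0-9]* so that a lone "." cannot match.

```python
import re

# Sample cell from the thread. With BeautifulSoup, str(td) is assumed to
# produce a plain string like this for each element in td_tags.
html = ('<td align=right width=80><font size=2 face="New Times '
        'Roman,Times,Serif"> 49.950 </font></td>')

# Variant of the thread's pattern (assumption): at least one digit is
# required on each side of the decimal point.
pattern = r'\d+\.\d+'

# Module-level function: compiles the pattern (re caches it) and searches.
m1 = re.search(pattern, html)

# Precompiled pattern object: same search, convenient when the pattern is
# reused many times, for example inside a loop over td_tags.
number_re = re.compile(pattern)
m2 = number_re.search(html)

# Both calls return a match object (or None); group() gives the matched text.
price = float(m1.group()) if m1 else None
print(price)
```

Both forms find the same text, so the choice is mostly about readability and
reuse. Note that this pattern skips integers without a decimal point; making
the fractional part optional would also match attribute values such as 80 in
width=80, so in that case the search would need to run on the cell's text
only rather than on the full markup.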