Regex help needed!

Rolando Espinoza La Fuente darkrho at
Thu Jan 7 23:38:57 CET 2010


import lxml.html
import re

src = """
lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id
=   "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>
hello, my age is 86 years old and I was born in 1945. Do you know
PI is roughly 3.1443534534534534534 """

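# Raw string so that \d reaches the regex engine unchanged.
regex = re.compile(r'amazon_(\d+)')

doc = lxml.html.document_fromstring(src)

# Let the HTML parser find the divs; the regex only sees the id value.
for div in doc.xpath('//div[starts-with(@id, "amazon_")]'):
    match = regex.match(div.get('id'))
    if match:
        print(match.group(1))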
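
If lxml is not installed, roughly the same idea can be sketched with the
standard library's html.parser module (Python 3). This is only a sketch, not
a drop-in replacement: AmazonIdCollector and ID_RE are names made up for this
example, and it uses fullmatch() rather than match(), on the assumption that
an id such as "amazon_12abc" should be rejected.

```python
import re
from html.parser import HTMLParser

# Same sample text as in the snippet above.
src = """
lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id
=   "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>
hello, my age is 86 years old and I was born in 1945. Do you know
PI is roughly 3.1443534534534534534 """

# Assumption: a wanted id is exactly "amazon_" followed by digits.
ID_RE = re.compile(r'amazon_(\d+)')


class AmazonIdCollector(HTMLParser):
    """Hypothetical helper: collects the numeric part of matching div ids."""

    def __init__(self):
        super().__init__()
        self.numbers = []

    def handle_starttag(self, tag, attrs):
        # The parser has already split the tag into (name, value) pairs,
        # so the regex only ever sees a single attribute value.
        if tag != 'div':
            return
        for name, value in attrs:
            if name == 'id' and value:
                match = ID_RE.fullmatch(value)
                if match:
                    self.numbers.append(match.group(1))


collector = AmazonIdCollector()
collector.feed(src)
collector.close()
print(collector.numbers)
```

As with the lxml version, the HTML parser deals with the odd spacing around
"id =", so the pattern itself can stay trivial.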

On Thu, Jan 7, 2010 at 4:42 PM, Aahz <aahz at> wrote:
> In article <19de1d6e-5ba9-42b5-9221-ed7246e39b4a at>,
> Oltmans  <rolf.oltmans at> wrote:
>>I've written this regex that's kind of working
>>but I was just wondering that there might be a better RegEx to do that
>>same thing. Can you kindly suggest a better/improved Regex. Thank you
>>in advance.
> 'Some people, when confronted with a problem, think "I know, I'll use
> regular expressions."  Now they have two problems.'
> --Jamie Zawinski
> Take the advice other people gave you and use BeautifulSoup.
> --
> Aahz (aahz at           <*>
> "If you think it's expensive to hire a professional to do the job, wait
> until you hire an amateur."  --Red Adair
> --

Rolando Espinoza La fuente
