Regex help needed!

Rolando Espinoza La Fuente darkrho at gmail.com
Thu Jan 7 23:38:57 CET 2010


# http://gist.github.com/271661

import lxml.html
import re

src = """
lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id
=   "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>
hello, my age is 86 years old and I was born in 1945. Do you know
that
PI is roughly 3.1443534534534534534 """

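# Raw string, so the backslash in \d reaches the regex engine untouched.
regex = re.compile(r'amazon_(\d+)')

# Parse the fragment into an HTML document tree.
doc = lxml.html.document_fromstring(src)

# Let XPath select the candidate divs, then pull the digits out of each id.
for div in doc.xpath('//div[starts-with(@id, "amazon_")]'):
    match = regex.match(div.get('id'))
    if match:
        # print(...) with one argument works on Python 2 and Python 3.
        print(match.group(1))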
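
If installing lxml is not an option, roughly the same idea (let an HTML parser find the div tags, and use the regex only on the id value) can be sketched with the standard library alone. This is not part of the gist above: it assumes Python 3, and the names AmazonIdCollector and extract_amazon_ids are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Same pattern as in the lxml version above.
ID_RE = re.compile(r'amazon_(\d+)')


class AmazonIdCollector(HTMLParser):
    """Hypothetical helper: collects the digits of every <div id="amazon_NNN">."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag != 'div':
            return
        for name, value in attrs:
            if name == 'id' and value:
                match = ID_RE.match(value)
                if match:
                    self.ids.append(match.group(1))


def extract_amazon_ids(html):
    """Return the numeric suffixes of all div ids starting with 'amazon_'."""
    parser = AmazonIdCollector()
    parser.feed(html)
    parser.close()
    return parser.ids


if __name__ == '__main__':
    sample = "<div id='amazon_8898'>welcome</div> I was born in 1945"
    print(extract_amazon_ids(sample))
```

As with the XPath version, numbers in the surrounding text (the age, the year, PI) are never looked at, because only id attributes of div tags are passed to the regex.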



On Thu, Jan 7, 2010 at 4:42 PM, Aahz <aahz at pythoncraft.com> wrote:
> In article <19de1d6e-5ba9-42b5-9221-ed7246e39b4a at u36g2000prn.googlegroups.com>,
> Oltmans  <rolf.oltmans at gmail.com> wrote:
>>
>>I've written this regex that's kind of working
>>re.findall("\w+\s*\W+amazon_(\d+)",str)
>>
>>but I was just wondering that there might be a better RegEx to do that
>>same thing. Can you kindly suggest a better/improved Regex. Thank you
>>in advance.
>
> 'Some people, when confronted with a problem, think "I know, I'll use
> regular expressions."  Now they have two problems.'
> --Jamie Zawinski
>
> Take the advice other people gave you and use BeautifulSoup.
> --
> Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/
>
> "If you think it's expensive to hire a professional to do the job, wait
> until you hire an amateur."  --Red Adair
> --
> http://mail.python.org/mailman/listinfo/python-list
>



-- 
Rolando Espinoza La fuente
www.rolandoespinoza.info


