Regex help needed!

F.R. anthra.norell at bluewin.ch
Thu Dec 24 07:35:09 EST 2009



On 21.12.2009 12:38, Oltmans wrote:
> Hello, everyone.
>
> I've a string that looks something like
> ----
> lksjdfls<div id ='amazon_345343'>  kdjff lsdfs</div>  sdjfls<div id
> =   "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>
> ----
>
> From the above string I need the digits within the ID attribute. For
> example, required output from above string is
> - 35343433
> - 345343
> - 8898
>
> I've written this regex that's kind of working
> re.findall("\w+\s*\W+amazon_(\d+)",str)
>
> but I was just wondering that there might be a better RegEx to do that
> same thing. Can you kindly suggest a better/improved Regex. Thank you
> in advance.
>    

If you filter in two or more sequential steps, the problem becomes a lot
simpler, not least because you can test each step separately:

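 >>> import re
 >>> # Step 1: isolate each div tag whose id contains digits.
 >>> # (Add ignore case and variable white space as needed.)
 >>> r1 = re.compile (r'<div id\D*\d+[^>]*')
 >>> # Step 2: pull the digits out of each isolated tag.
 >>> r2 = re.compile (r'\d+')
 >>> [r2.search (item).group () for item in r1.findall (s) if item]     # s is your sample
['345343', '35343433', '8898']     # Supposing all ids have digits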
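
As a rough sketch of the "ignore case and variable white space" refinement
mentioned above, the same two steps could be written as a small script. The
names tag_re, digits_re and amazon_ids are made up for illustration, and the
stricter first pattern (it requires the "amazon_" prefix and quotes around
the value) is an assumption about what the ids look like, not something the
sample guarantees:

```python
import re

# Sample string from the question, joined onto one line here.
s = ("lksjdfls<div id ='amazon_345343'>  kdjff lsdfs</div>  sdjfls<div id"
     " =   \"amazon_35343433\">sdfsd</div><div id='amazon_8898'>welcome</div>")

# Step 1 (hypothetical pattern): isolate opening div tags whose id is
# amazon_<digits>. \s* tolerates variable white space around "=", and
# IGNORECASE also accepts <DIV ID=...>.
tag_re = re.compile(r"<div\s+id\s*=\s*[\"']amazon_\d+[\"'][^>]*>", re.IGNORECASE)

# Step 2: extract the digits from each isolated tag.
digits_re = re.compile(r"\d+")

def amazon_ids(text):
    """Return the digit part of every amazon_<digits> div id in text."""
    tags = tag_re.findall(text)   # intermediate result, easy to inspect alone
    return [digits_re.search(tag).group() for tag in tags]

print(amazon_ids(s))
```

Printing tag_re.findall (s) on its own is a quick way to check the first step
before the second one is applied. For anything beyond a simple string like
this one, a real HTML parser is probably the safer choice.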

Frederic



