Regular expressions, help?

Peter Otten __peter__ at web.de
Thu Apr 19 02:43:54 EDT 2012


Sania wrote:

> So I am trying to get the number of casualties in a text. After 'death
> toll' in the text the number I need is presented as you can see from
> the variable called text. Here is my code
> I'm pretty sure my regex is correct, I think it's the group part
> that's the problem.

No, it is the regex. The ".*" is "greedy": it matches as much as possible,
so in a regex like ".*(\d+)" only the last digit is left for the group:

>>> import re
>>> re.match(r".*(\d+)", "alpha 123 beta 456 gamma").group(1)
'6'

You want the first number, so you need the non-greedy form ".*?", which
matches as little as possible:

>>> re.match(r".*?(\d+)", "alpha 123 beta 456 gamma").group(1)
'123'
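
To see where the other digits end up, it can help to capture the ".*" part
as well. A minimal sketch; the extra group and the names s, greedy and lazy
are only there for illustration:

```python
import re

s = "alpha 123 beta 456 gamma"

# Greedy: ".*" first grabs the whole string, then backs off just far
# enough for "\d+" to match, which leaves it a single digit.
greedy = re.match(r"(.*)(\d+)", s)
print(greedy.groups())  # ('alpha 123 beta 45', '6')

# Non-greedy: ".*?" grows one character at a time and stops as soon as
# "\d+" can match, so the group gets the whole first number.
lazy = re.match(r"(.*?)(\d+)", s)
print(lazy.groups())    # ('alpha ', '123')
```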

> I am using nltk by python. Group grabs the string in parenthesis and
> stores it in deadnum and I make deadnum into a list.
> 
>  text="accounts put the death toll at 637 and those missing at
> 653 , but the total number is likely to be much bigger"
>       dead=re.match(r".*death toll.*(\d[,\d\.]*)", text)
>       deadnum=dead.group(1)
>       deaths.append(deadnum)
>       print deaths
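
Applied to your text, the same change might look like the sketch below. I am
assuming the text is a single line, that deaths starts out as an empty list,
and that re.search() is acceptable in place of re.match() with a leading ".*";
print() is written with parentheses so it also runs on Python 3:

```python
import re

text = ("accounts put the death toll at 637 and those missing at "
        "653 , but the total number is likely to be much bigger")

deaths = []  # assumed to be defined earlier in the original script

# ".*?" after "death toll" stops at the first digit instead of the last.
# re.search() scans the string, so no leading ".*" is needed.
dead = re.search(r"death toll.*?(\d[,\d.]*)", text)
if dead is not None:  # no match object if the phrase or a number is missing
    deaths.append(dead.group(1))
print(deaths)  # ['637']
```

Note that the character class also accepts a trailing comma or period, so a
text like "at 637, and" would give '637,' and you may want to strip that
afterwards.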




