Regular expressions, help?
Peter Otten
__peter__ at web.de
Thu Apr 19 02:41:15 EDT 2012
Sania wrote:
> Hi,
> So I am trying to get the number of casualties in a text. After 'death
> toll' in the text the number I need is presented as you can see from
> the variable called text. Here is my code
> I'm pretty sure my regex is correct, I think it's the group part
> that's the problem.
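No. A regex like ".*(\d+)" is "greedy": the ".*" matches as much as
possible and gives back only as many characters as the rest of the
pattern needs, so the group ends up with just the last digit: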
>>> re.match(r".*(\d+)", "alpha 123 beta 456 gamma").group(1)
'6'
You want to find the first number, so you need the non-greedy form ".*?":
>>> re.match(r".*?(\d+)", "alpha 123 beta 456 gamma").group(1)
'123'
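Applied to Sania's own pattern and text, the same effect shows up directly. This is a small sketch that reuses the original regex unchanged except for the "?" after the second ".*"; the variable names greedy and lazy are just illustrative.

```python
import re

text = ("accounts put the death toll at 637 and those missing at "
        "653 , but the total number is likely to be much bigger")

# Greedy: the second ".*" runs to the end of the string and backtracks only
# until "\d" can match, which first happens at the last digit ("3" of 653).
greedy = re.match(r".*death toll.*(\d[,\d\.]*)", text)
print(greedy.group(1))  # 3

# Non-greedy: ".*?" stops at the first position where the group can match,
# i.e. the first number after "death toll".
lazy = re.match(r".*death toll.*?(\d[,\d\.]*)", text)
print(lazy.group(1))  # 637
```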
> I am using nltk with python. Group grabs the string in parentheses and
> stores it in deadnum and I make deadnum into a list.
>
> text="accounts put the death toll at 637 and those missing at 653 , but the total number is likely to be much bigger"
> dead=re.match(r".*death toll.*(\d[,\d\.]*)", text)
> deadnum=dead.group(1)
> deaths.append(deadnum)
> print deaths
>
> Any help would be appreciated,
> Thank you,
> Sania
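If you are collecting the toll from several texts, here is a minimal sketch, assuming each text is a plain string. It swaps in re.search (so no leading ".*" is needed) and "\D*" instead of ".*?": a run of non-digits cannot overrun the number, so it behaves like the non-greedy form here. The helper name death_toll and the second and third sample sentences are made up for illustration. The sketch also guards against texts without a match (re.search returns None, and calling .group() on that would raise AttributeError).

```python
import re

# Hypothetical pattern: "death toll", any non-digits, then a number that may
# contain thousands separators. "." is left out so a sentence-final period
# is not captured.
DEATH_TOLL = re.compile(r"death toll\D*(\d[\d,]*)")

def death_toll(text):
    """Return the first number after 'death toll' as an int, or None."""
    match = DEATH_TOLL.search(text)
    if match is None:
        return None
    # Strip separators such as "1,204" (and any trailing comma) before converting.
    return int(match.group(1).replace(",", ""))

texts = [
    "accounts put the death toll at 637 and those missing at 653 , "
    "but the total number is likely to be much bigger",
    "officials said the death toll had risen to 1,204 by Friday",  # made up
    "no casualty figures were released",                           # made up
]

deaths = []
for text in texts:
    number = death_toll(text)
    if number is not None:
        deaths.append(number)

print(deaths)  # [637, 1204]
```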