Problem with REs....

Xavier Morel xavier.morel at masklinn.net
Mon Jan 9 14:49:59 EST 2006


Alessandro wrote:
> Problem with REs....
> I have this string "3 HOURS,  22 MINUTES, and  28 SECONDS" and I need to
> 'split' it into its three parts "3 HOURS", "22 MINUTES", "28 SECONDS".
> REs seem like the right tool for this... (needless to say, I am very much
> a beginner with both REs and Python)
> My question: why, when I run this code
> 
>     >>> import re
>     >>> regex=re.compile("[0-9]+ (HOUR|MINUTE|SECOND)")
>     >>> print(regex.findall("22 MINUTE, 3 HOUR,  AND  28 SECOND"))
> do I get this output:
> 
>     ['MINUTE', 'HOUR', 'SECOND']
> 
> and not, as I expected:
> 
>     ['22 MINUTE', '3 HOUR', '28 SECOND']
>     
> Many thanks and regards...
> Alessandro
> 
It would probably have been slightly easier had you written in English, but 
basically the issue is the capturing group.

A capturing group is defined by a pair of parentheses in the regular 
expression. Your pattern has a single group, "(HOUR|MINUTE|SECOND)", and 
when a pattern contains exactly one group, findall returns only the text 
matched by that group, not the whole match.
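A small sketch of how findall reacts to groups, assuming Python 3 and reusing the test string from the question. The variable names are illustrative only. With one group you get that group's text; with two groups you get a tuple per match; the full match is still reachable through finditer and group(0):

```python
import re

text = "22 MINUTE, 3 HOUR,  AND  28 SECOND"

# The pattern from the question: one capturing group around the unit, so
# findall returns only what that group matched.
units_only = re.findall(r"[0-9]+ (HOUR|MINUTE|SECOND)", text)

# With two capturing groups, findall returns one tuple per match instead.
number_and_unit = re.findall(r"([0-9]+) (HOUR|MINUTE|SECOND)", text)

# The whole match is still available through finditer and group(0).
whole_matches = [m.group(0)
                 for m in re.finditer(r"[0-9]+ (HOUR|MINUTE|SECOND)", text)]

print(units_only)       # ['MINUTE', 'HOUR', 'SECOND']
print(number_and_unit)  # [('22', 'MINUTE'), ('3', 'HOUR'), ('28', 'SECOND')]
print(whole_matches)    # ['22 MINUTE', '3 HOUR', '28 SECOND']
```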

You need to include the number in the group as well, and you can use a 
non-capturing group for the unit (written (?: ) instead of ( )) so that it 
doesn't add an extra group to the results.

 >>> pattern = re.compile(r"([0-9]+ (?:HOUR|MINUTE|SECOND))")
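A quick check of the corrected pattern against the question's test string, assuming Python 3. Since the single group now spans the whole match, the outer parentheses are optional: with no capturing groups at all, findall also returns each full match. Both variants are sketched here (names are illustrative):

```python
import re

text = "22 MINUTE, 3 HOUR,  AND  28 SECOND"

# One group around the whole match; the alternation is non-capturing.
outer_group = re.compile(r"([0-9]+ (?:HOUR|MINUTE|SECOND))")

# No capturing groups at all: findall returns each full match.
no_groups = re.compile(r"[0-9]+ (?:HOUR|MINUTE|SECOND)")

result_outer = outer_group.findall(text)
result_plain = no_groups.findall(text)

print(result_outer)  # ['22 MINUTE', '3 HOUR', '28 SECOND']
print(result_plain)  # ['22 MINUTE', '3 HOUR', '28 SECOND']
```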

Other improvements:
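* \d is a shortcut for "any digit" and matches the same characters as 
[0-9] on ASCII text, yet is slightly clearer (in Python 3, \d on str 
patterns also matches other Unicode digits unless re.ASCII is given).
* You can pass the re.I (or re.IGNORECASE) flag to match unit names in 
both lower and upper case.
* You can easily handle an optional plural "s" with s?.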
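Each of the three improvements can be tried in isolation. This is a sketch assuming Python 3; the Arabic-Indic digit (U+0663) is an illustrative input chosen to show where \d and [0-9] differ:

```python
import re

# \d vs [0-9]: identical on ASCII digits.
ascii_digit = re.findall(r"\d", "42")
ascii_class = re.findall(r"[0-9]", "42")

# In Python 3, \d on a str pattern also matches other Unicode decimal
# digits; re.ASCII restricts it to [0-9].
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
unicode_d = re.findall(r"\d", arabic_three)              # one match
unicode_class = re.findall(r"[0-9]", arabic_three)       # no match
ascii_only = re.findall(r"\d", arabic_three, re.ASCII)   # no match

# re.I: lowercase unit names in the pattern also match uppercase input.
case_insensitive = re.findall(r"\d+ (?:hour|minute|second)",
                              "3 HOUR 5 minute", re.I)

# s?: the trailing "s" is optional, so singular and plural both match.
optional_s = re.findall(r"\d+ (?:hour|minute|second)s?", "1 hour 2 hours")

print(case_insensitive)  # ['3 HOUR', '5 minute']
print(optional_s)        # ['1 hour', '2 hours']
```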

Improved regex:

 >>> pattern = re.compile(r"(\d+ (?:hour|minute|second)s?)", re.I)
 >>> pattern.findall("3 HOURS 22 MINUTES 28 SECONDS")
['3 HOURS', '22 MINUTES', '28 SECONDS']
 >>> pattern.findall("1 HOUR 22 MINUTES 28 SECONDS")
['1 HOUR', '22 MINUTES', '28 SECONDS']
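The examples above omit the commas and the "and" from the original string. A short check, assuming Python 3, that the improved pattern handles the exact string from the question and mixed-case input as well (the second string is an illustrative example, not from the question):

```python
import re

pattern = re.compile(r"(\d+ (?:hour|minute|second)s?)", re.I)

# The exact string from the original question, commas and "and" included.
parts = pattern.findall("3 HOURS,  22 MINUTES, and  28 SECONDS")

# Mixed case and singular/plural units in one string.
mixed = pattern.findall("1 hour, 45 Minutes and 2 seconds")

print(parts)  # ['3 HOURS', '22 MINUTES', '28 SECONDS']
print(mixed)  # ['1 hour', '45 Minutes', '2 seconds']
```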

If you want to learn more about regular expressions, I suggest browsing 
http://regular-expressions.info/, which is a good source of information, 
and using Kodos, which is quite a good Python regex debugger.
