Problem with REs....
Xavier Morel
xavier.morel at masklinn.net
Mon Jan 9 14:49:59 EST 2006
Alessandro wrote:
> Problem with REs....
> I have this string "3 HOURS, 22 MINUTES, and 28 SECONDS" and I need to
> 'split' it into its three parts "3 HOURS", "22 MINUTES", "28 SECONDS".
> This seems like a natural job for REs... (needless to say, I am very
> much a beginner with REs and Python)
> Question: why, if I run this code
>
> >>>>regex=re.compile("[0-9]+ (HOUR|MINUTE|SECOND)")
> >>>>print regex.findall("22 MINUTE, 3 HOUR, AND 28 SECOND")
> do I get this output:
>
> >>>> ['MINUTE', 'HOUR', 'SECOND']
>
> and not what I expected:
>
> >>>> ['3 MINUTE', '22 HOUR', '28 SECOND']
>
> Regards and many thanks...
> Alessandro
>
This would probably have been slightly easier had you written it in
English, but basically the issue is the match group.
A match group is defined by the parentheses in the regular expression,
e.g. your match group is "(HOUR|MINUTE|SECOND)". When a pattern contains
a group, findall returns only what the group matched, not the whole match.
You need to include the number in the group as well, and you can use a
non-capturing group for the time unit (with (?: ) instead of () ) to
avoid cluttering your matched groups.
>>> pattern = re.compile(r"([0-9]+ (?:HOUR|MINUTE|SECOND))")
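The difference between the two kinds of group is easiest to see by running both patterns on the same input. Here is a small sketch in present-day Python 3 syntax; the variable names are only illustrative. It also shows that once no capturing group is left in the pattern, findall falls back to returning the whole match, so the outer parentheses are optional:

```python
import re

text = "22 MINUTE, 3 HOUR, AND 28 SECOND"

# One capturing group: findall returns only what that group matched.
capturing = re.compile(r"[0-9]+ (HOUR|MINUTE|SECOND)")
only_units = capturing.findall(text)

# Non-capturing group and no capturing group at all:
# findall returns the whole match instead.
non_capturing = re.compile(r"[0-9]+ (?:HOUR|MINUTE|SECOND)")
whole_matches = non_capturing.findall(text)

print(only_units)     # ['MINUTE', 'HOUR', 'SECOND']
print(whole_matches)  # ['22 MINUTE', '3 HOUR', '28 SECOND']
```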
Other improvements:
* \d is a shortcut for "any digit" and is therefore equivalent to [0-9]
(for plain ASCII input), yet slightly clearer.
* You may use the re.I (or re.IGNORECASE) flag to match both lowercase
and uppercase time units
* You can easily handle an optional "s" by appending s? to the group
Improved regex:
>>> pattern = re.compile(r"(\d+ (?:hour|minute|second)s?)", re.I)
>>> pattern.findall("3 HOURS 22 MINUTES 28 SECONDS")
['3 HOURS', '22 MINUTES', '28 SECONDS']
>>> pattern.findall("1 HOUR 22 MINUTES 28 SECONDS")
['1 HOUR', '22 MINUTES', '28 SECONDS']
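The same pattern also copes with the commas and the "and" in the string from the original question, since findall simply skips whatever does not match. The sketch below (Python 3 syntax) checks that, and then shows one possible variation that is not part of the answer above: with two capturing groups, findall returns tuples, which makes it easy to turn the amounts into integers. The names pair_pattern and amounts are made up for illustration.

```python
import re

text = "3 HOURS, 22 MINUTES, and 28 SECONDS"

# The improved pattern from above, applied to the original string.
pattern = re.compile(r"(\d+ (?:hour|minute|second)s?)", re.I)
pieces = pattern.findall(text)
print(pieces)  # ['3 HOURS', '22 MINUTES', '28 SECONDS']

# Variation (an assumption about what might be wanted next): two capturing
# groups make findall return one (number, unit) tuple per match.
# The optional "s" sits outside the second group, so it is not captured.
pair_pattern = re.compile(r"(\d+) (hour|minute|second)s?", re.I)
pairs = pair_pattern.findall(text)
print(pairs)  # [('3', 'HOUR'), ('22', 'MINUTE'), ('28', 'SECOND')]

# Map each lowercased unit to its integer amount.
amounts = {unit.lower(): int(number) for number, unit in pairs}
print(amounts)  # {'hour': 3, 'minute': 22, 'second': 28}
```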
If you want to learn more about regular expressions, I suggest you
browse http://regular-expressions.info/, which is a good source of
information, and use Kodos, which is quite a good Python
regex debugger.