regex with specific list of string
Pablo Ziliani
pablo at decode.com.ar
Wed Sep 26 13:00:09 EDT 2007
Carsten Haese wrote:
> On Wed, 2007-09-26 at 15:42 +0000, james_027 wrote:
>
>> hi,
>>
>> how do I regex that could check on any of the value that match any one
>> of these ... 'jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug',
>> 'sep', 'oct', 'nov', 'dec'
>>
>
> Why regex? You can simply check if the given value is contained in the
> set of allowed values:
>
>
> >>> s = set(['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug',
> ...          'sep', 'oct', 'nov', 'dec'])
> >>> 'jan' in s
Also, check the calendar module for a locale-aware (rather than hardcoded) version:
>>> import calendar
>>> [calendar.month_abbr[i].lower() for i in range(1,13)]
['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
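Putting Carsten's set check and the calendar list together could look roughly like this. It is only a sketch: is_month_abbr is a name I made up for illustration, not something in the standard library. Keep in mind that calendar.month_abbr follows the current locale, so you only get 'jan' ... 'dec' in an English or default "C" locale:

```python
import calendar

# Build the lookup set once. month_abbr[0] is an empty string, so start at 1.
MONTHS = set(calendar.month_abbr[i].lower() for i in range(1, 13))

def is_month_abbr(value):
    """Hypothetical helper: True if value is a month abbreviation, ignoring case."""
    return value.lower() in MONTHS

print(is_month_abbr('Jan'))
print(is_month_abbr('foo'))
```

Lowercasing both sides keeps the check case-insensitive without needing a regex at all.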
If you still want to use regexes, you can do something like:
>>> import re
>>> pattern = '(?:%s)' % '|'.join(calendar.month_abbr[1:13])
>>> pattern
'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)'
>>> re.search(pattern, "we are in september", re.IGNORECASE)
<_sre.SRE_Match object at 0xb7ced640>
>>> re.search(pattern, "we are in september", re.IGNORECASE).group()
'sep'
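If the pattern gets used more than once, it is probably worth compiling it a single time. The sketch below also passes each name through re.escape, on the assumption that some locales may use abbreviations containing regex metacharacters (a trailing period, for instance). MONTH_RE and find_months are illustrative names of mine, not part of any library:

```python
import calendar
import re

# Escape each abbreviation in case the locale's names contain regex metacharacters.
names = [re.escape(name) for name in calendar.month_abbr[1:13]]
MONTH_RE = re.compile('(?:%s)' % '|'.join(names), re.IGNORECASE)

def find_months(text):
    """Hypothetical helper: return every month abbreviation found in text, in order."""
    return MONTH_RE.findall(text)

# In an English or default "C" locale this prints ['sep', 'DEC'].
print(find_months("from september to DECEMBER"))
```

Because the group is non-capturing, findall returns the matched text itself rather than group contents.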
If you want to make sure that the month name begins a word, use the following pattern instead:
>>> pattern = r'(?:\b%s)' % r'|\b'.join(calendar.month_abbr[1:13])
>>> pattern
'(?:\\bJan|\\bFeb|\\bMar|\\bApr|\\bMay|\\bJun|\\bJul|\\bAug|\\bSep|\\bOct|\\bNov|\\bDec)'
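To see what the leading \b changes, here is a small comparison on a string where a month abbreviation also shows up in the middle of a word. The abbreviations are hardcoded in English so the result does not depend on the locale, and the \b is written once in front of the group, which should be equivalent to repeating it before every name. The variable names are just for illustration:

```python
import re

abbrs = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
alternatives = '|'.join(abbrs)

# Matches a month abbreviation anywhere, even inside another word.
anywhere = re.compile('(?:%s)' % alternatives, re.IGNORECASE)
# Matches only where the abbreviation starts a word.
word_start = re.compile(r'\b(?:%s)' % alternatives, re.IGNORECASE)

text = "a summary for march"
print(anywhere.findall(text))    # ['mar', 'mar'] (one from "summary", one from "march")
print(word_start.findall(text))  # ['mar'] (only the one from "march")
```

Note that this still matches the start of longer words such as "market"; adding a trailing \b as well would restrict matches to the bare abbreviations.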
If in doubt, Google for "regular expressions in python" or go to http://docs.python.org/lib/module-re.html
Regards,
Pablo