[Tutor] List of regular expressions

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Thu Jun 23 01:30:35 CEST 2005



On Wed, 22 Jun 2005, Shidan wrote:

> Hi I have a list of regular expression patterns like such:
>
> thelist = ['^594694.*','^689.*','^241.*',
>    '^241(0[3-9]|1[0145]|2[0-9]|3[0-9]|41|5[1-37]|6[138]|75|8[014579]).*']
>
> Now I want to iterate thru each of these like:
>
> for pattern in thelist:
>     regex=re.compile(pattern)
>     if regex.match('24110'):
>         the_pattern = pattern
>         .
>         .
>         sys.exit(0)
>
> but in this case it will pick thelist[2] and not thelist[3] as I wanted
> to. How can I have it pick the pattern that describes it better from the
> list?


Hi Shidan,

Regular expressions don't have a concept of "better match": a regular
expression either matches a string or it doesn't.  It's binary: there's
no concept of the "specificity" of a regular expression match unless you
define one yourself.
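
To see that binary behavior concretely, here is a small sketch that uses
your list and test string and collects every pattern that matches, rather
than stopping at the first one.  The name 'matching' is just something I
made up for illustration:

```python
import re

thelist = ['^594694.*', '^689.*', '^241.*',
           '^241(0[3-9]|1[0145]|2[0-9]|3[0-9]|41|5[1-37]|6[138]|75|8[014579]).*']

# re.match() only answers yes or no for each pattern, so we keep every
# pattern that says yes instead of exiting on the first hit.
matching = [pattern for pattern in thelist if re.match(pattern, '24110')]

for pattern in matching:
    print(pattern)
```

Both thelist[2] and thelist[3] end up in 'matching': the regex engine gives
us no hint about which of the two we should prefer.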

Intuitively, it sounds like you're considering anything that relies on a
wildcard to be less match-worthy than something that uses more constrained
things like a character set.  Does that sound right to you?  If so, then
perhaps we can write a function that calculates the "specificity" of a
regular expression, so that 'thelist[2]' scores lower than 'thelist[3]'.
You can then see which regular expressions match your string, and then
rank them in terms of specificity.
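
As a rough sketch of that idea, and only under the assumption that "more
specific" means "more pattern text left once the '.*' wildcards are thrown
away", something like the following might do.  The functions 'specificity'
and 'best_match' are hypothetical names of my own, and the scoring rule is
a crude guess at what you mean, not a standard measure:

```python
import re

thelist = ['^594694.*', '^689.*', '^241.*',
           '^241(0[3-9]|1[0145]|2[0-9]|3[0-9]|41|5[1-37]|6[138]|75|8[014579]).*']

def specificity(pattern):
    """Crude, assumed scoring rule: the number of characters left in the
    pattern after removing every '.*' wildcard."""
    return len(pattern.replace('.*', ''))

def best_match(patterns, text):
    """Return the matching pattern with the highest specificity score,
    or None if no pattern matches at all."""
    matching = [p for p in patterns if re.match(p, text)]
    if not matching:
        return None
    return max(matching, key=specificity)

the_pattern = best_match(thelist, '24110')
print(the_pattern)
```

With this scoring rule, '24110' picks thelist[3], while a string such as
'24199', which only the plain '^241.*' pattern matches, falls back to
thelist[2].  A different definition of "best" would just mean swapping in a
different 'specificity' function.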

But it's important to realize that what you've asked is actually a
subjective measure of "best match", and so we have to define specifically
what "best" means to us.  (Other people might consider short regular
expressions to be better because they're shorter and easier to read!)


Tell us more about the problem, and we'll do what we can to help.  Best of
wishes!


