Regular expressions

David C. Fox davidcfox at post.harvard.edu
Tue Oct 28 15:47:10 EST 2003


Dennis Reinhardt wrote:

>>That's not what I want.  If I have yipee as "kdasfjh", then I want to
>>compare to all combinations, so that there will be a match when it
> 
> compares
> 
>>it with say when line is "fas" because line has all those letters in
> 
> yipee.
> 
>>But "fass" will not match because yipee does not have two "s"'s.
> 
> 
> Were you to alphabetize both your regex and line, then the question is can
> "afs" or "afss" match the pattern "adfhjks".  By manipulating the lines
> strings "a.*f.*s" or "a.*f.*s.*s", you will see that the first line
> instance matches the regex and the second does not.
> 
> Notice that the regex must be a string (not compiled) *and* this trick runs
> the regex backwards, matching the line (with embedded ".*") against the
> pattern and not the normal direction.  There may be more straightforward
> ways to do this but this is the regex solution which occurs to me.
> 
> On second thought, the regex "\A(a|)(d|)(f|)(h|)(j|)(k|)(s|)\Z" will match
> "fas" but not "fass".

Dennis is correct: alphabetizing both the pattern and the target strings 
is way to do this.  I would use a slightly different regex, though, 
constructed as follows:

import re

def char_counts(s):
     """returns a dictionary indicating how many times a given character
     appears in the string s
     """
     d = {}
     for char in s:
         d[char] = d.get(char, 0) + 1 # current count, or zero, plus one
     return d

def char_subset(s):
     """given a string s, returns a regular expression which matches a
     sorted character string containing a subset of the same characters
     (and with no more occurences of each character than in the original
     string)
     """
     counts = char_counts(s)
     l = []
     chars = counts.keys()
     chars.sort()
     for char in chars:
         r = '%s{0,%d}' % (char, counts[char])
         l.append(r)
         # regex fragment matching count or fewer occurances os that many
     l.append('$') # make sure that we match against the full string
     return ''.join(l)

def sorted_string(s):
     """given a string s, return a new one which all the characters
     sorted
     """
     l = list(s)
     l.sort()
     return ''.join(l)

Then, you can compare a given target string, t, against the string in 
yipee with:

r = char_subset(yipee)
t_sorted = sorted_string(t)
if re.match(r, t_sorted):
     print 'found a match: %s' %t


David





More information about the Python-list mailing list