# Regular expressions

David C. Fox davidcfox at post.harvard.edu
Tue Oct 28 21:47:10 CET 2003

```Dennis Reinhardt wrote:

>>That's not what I want.  If I have yipee as "kdasfjh", then I want to
>>compare to all combinations, so that there will be a match when it
>
> compares
>
>>it with say when line is "fas" because line has all those letters in
>
> yipee.
>
>>But "fass" will not match because yipee does not have two "s"'s.
>
>
> Were you to alphabetize both your regex and line, then the question is can
> "afs" or "afss" match the pattern "adfhjks".  By manipulating the lines
> strings "a.*f.*s" or "a.*f.*s.*s", you will see that the first line
> instance matches the regex and the second does not.
>
> Notice that the regex must be a string (not compiled) *and* this trick runs
> the regex backwards, matching the line (with embedded ".*") against the
> pattern and not the normal direction.  There may be more straightforward
> ways to do this but this is the regex solution which occurs to me.
>
> On second thought, the regex "\A(a|)(d|)(f|)(h|)(j|)(k|)(s|)\Z" will match
> "fas" but not "fass".

Dennis is correct: alphabetizing both the pattern and the target strings
is way to do this.  I would use a slightly different regex, though,
constructed as follows:

import re

def char_counts(s):
"""returns a dictionary indicating how many times a given character
appears in the string s
"""
d = {}
for char in s:
d[char] = d.get(char, 0) + 1 # current count, or zero, plus one
return d

def char_subset(s):
"""given a string s, returns a regular expression which matches a
sorted character string containing a subset of the same characters
(and with no more occurences of each character than in the original
string)
"""
counts = char_counts(s)
l = []
chars = counts.keys()
chars.sort()
for char in chars:
r = '%s{0,%d}' % (char, counts[char])
l.append(r)
# regex fragment matching count or fewer occurances os that many
l.append('\$') # make sure that we match against the full string
return ''.join(l)

def sorted_string(s):
"""given a string s, return a new one which all the characters
sorted
"""
l = list(s)
l.sort()
return ''.join(l)

Then, you can compare a given target string, t, against the string in
yipee with:

r = char_subset(yipee)
t_sorted = sorted_string(t)
if re.match(r, t_sorted):
print 'found a match: %s' %t

David

```