Regular expressions
David C. Fox
davidcfox at post.harvard.edu
Tue Oct 28 15:47:10 EST 2003
Dennis Reinhardt wrote:
>>That's not what I want. If I have yipee as "kdasfjh", then I want to
>>compare to all combinations, so that there will be a match when it
>
> compares
>
>>it with say when line is "fas" because line has all those letters in
>
> yipee.
>
>>But "fass" will not match because yipee does not have two "s"'s.
>
>
> Were you to alphabetize both your regex and line, then the question is can
> "afs" or "afss" match the pattern "adfhjks". By manipulating the lines
> strings "a.*f.*s" or "a.*f.*s.*s", you will see that the first line
> instance matches the regex and the second does not.
>
> Notice that the regex must be a string (not compiled) *and* this trick runs
> the regex backwards, matching the line (with embedded ".*") against the
> pattern and not the normal direction. There may be more straightforward
> ways to do this but this is the regex solution which occurs to me.
>
> On second thought, the regex "\A(a|)(d|)(f|)(h|)(j|)(k|)(s|)\Z" will match
> "fas" but not "fass".
Dennis is correct: alphabetizing both the pattern and the target strings
is way to do this. I would use a slightly different regex, though,
constructed as follows:
import re
def char_counts(s):
"""returns a dictionary indicating how many times a given character
appears in the string s
"""
d = {}
for char in s:
d[char] = d.get(char, 0) + 1 # current count, or zero, plus one
return d
def char_subset(s):
"""given a string s, returns a regular expression which matches a
sorted character string containing a subset of the same characters
(and with no more occurences of each character than in the original
string)
"""
counts = char_counts(s)
l = []
chars = counts.keys()
chars.sort()
for char in chars:
r = '%s{0,%d}' % (char, counts[char])
l.append(r)
# regex fragment matching count or fewer occurances os that many
l.append('$') # make sure that we match against the full string
return ''.join(l)
def sorted_string(s):
"""given a string s, return a new one which all the characters
sorted
"""
l = list(s)
l.sort()
return ''.join(l)
Then, you can compare a given target string, t, against the string in
yipee with:
r = char_subset(yipee)
t_sorted = sorted_string(t)
if re.match(r, t_sorted):
print 'found a match: %s' %t
David
More information about the Python-list
mailing list