Get all strings matching given RegExp

Marc Christiansen usenet at solar-empire.de
Thu Apr 3 07:17:57 EDT 2008


Alex9968 <noname9968 at gmail.com> wrote:
> Can I get sequence of all strings that can match a given regular 
> expression? For example, for expression '(a|b)|(x|y)' it would be ['ax', 
> 'ay', 'bx', 'by']
> 
> It would be useful for example to pass these strings to a search engine 
> not supporting RegExp (therefore adding such support to it). A program 
> can also let user specify sequence of strings using RegExp (filenames to 
> process, etc.). If there are other types of expressions for these 
> purposes, please let me know.
> 
> I know that for some expressions there would be infinite amount of 
> matching strings, but these aren't the cases I'm considering. It'd still 
> be possible if string length is limited (there might be large but finite 
> number of matching strings).

This will give you all (byte-)strings upto a given length which match a
given regular expression. But beware, it can be slow ;)

import re

all_chars = [chr(i) for i in xrange(256)]

def gen_strings(length, alphabet=all_chars):
    if length == 1:
        for c in alphabet:
            yield c
    else:
        for i in alphabet:
            yield c
            for s in gen_strings(length - 1, alphabet):
                yield c + s

def regex_matches(regex, max_length, alphabet=all_chars):
    r = re.compile('(' + regex + r')\Z')
    return (s for s in gen_strings(max_length, alphabet) if r.match(s))

Marc



More information about the Python-list mailing list