Get all strings matching given RegExp
Marc Christiansen
usenet at solar-empire.de
Thu Apr 3 07:17:57 EDT 2008
Alex9968 <noname9968 at gmail.com> wrote:
> Can I get sequence of all strings that can match a given regular
> expression? For example, for expression '(a|b)|(x|y)' it would be ['ax',
> 'ay', 'bx', 'by']
>
> It would be useful for example to pass these strings to a search engine
> not supporting RegExp (therefore adding such support to it). A program
> can also let user specify sequence of strings using RegExp (filenames to
> process, etc.). If there are other types of expressions for these
> purposes, please let me know.
>
> I know that for some expressions there would be infinite amount of
> matching strings, but these aren't the cases I'm considering. It'd still
> be possible if string length is limited (there might be large but finite
> number of matching strings).
This will give you all (byte-)strings upto a given length which match a
given regular expression. But beware, it can be slow ;)
import re
all_chars = [chr(i) for i in xrange(256)]
def gen_strings(length, alphabet=all_chars):
if length == 1:
for c in alphabet:
yield c
else:
for i in alphabet:
yield c
for s in gen_strings(length - 1, alphabet):
yield c + s
def regex_matches(regex, max_length, alphabet=all_chars):
r = re.compile('(' + regex + r')\Z')
return (s for s in gen_strings(max_length, alphabet) if r.match(s))
Marc
More information about the Python-list
mailing list