Looking for a regexp generator based on a set of known string representative of a string set

James Stroud jstroud at mbi.ucla.edu
Sat Sep 9 04:55:07 CEST 2006

vbfoobar at gmail.com wrote:
> Hello
> I am looking for python code that takes as input a list of strings
> (most similar, but not necessarily, and rather short: say not longer
> than 50 chars) and that computes and outputs the python regular
> expression that matches these string values (not necessarily strictly,
> perhaps the code is able to determine patterns, i.e. families of
> strings...).
> Thanks for any idea

I'm not sure what your application is, but genomicists and proteomicists
have found that Hidden Markov Models can be very powerful for developing
pattern models. Perhaps have a look at "Biological Sequence Analysis" by
Durbin et al.
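
To give a feel for what a probabilistic pattern model looks like, here is
a minimal sketch of a position-specific profile. This is only the
simplest relative of a profile HMM (no insert or delete states), not a
real HMM and not how HMMER works. The names build_profile, score and
pseudocount are made up for this illustration, and the sketch assumes all
training strings have the same length.

```python
import math
from collections import Counter


def build_profile(strings, pseudocount=1.0):
    """Return one dict of log-probabilities per column (hypothetical helper).

    Assumes every string has the same length; a real profile HMM would
    also model insertions and deletions.
    """
    if not strings:
        raise ValueError("need at least one training string")
    length = len(strings[0])
    if any(len(s) != length for s in strings):
        raise ValueError("this sketch assumes equal-length strings")

    alphabet = sorted(set("".join(strings)))
    # One extra slot in the denominator is reserved for "any unseen character".
    total = len(strings) + pseudocount * (len(alphabet) + 1)

    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in strings)
        column = {c: math.log((counts[c] + pseudocount) / total) for c in alphabet}
        column[None] = math.log(pseudocount / total)  # fallback for unseen characters
        profile.append(column)
    return profile


def score(profile, candidate):
    """Sum of per-column log-probabilities; higher means a better fit."""
    if len(candidate) != len(profile):
        return float("-inf")
    return sum(col.get(ch, col[None]) for col, ch in zip(profile, candidate))


profile = build_profile(["abc1", "abc2", "abd3"])
print(score(profile, "abc3") > score(profile, "zzzz"))
```

Unlike a regular expression, this gives a graded score rather than a
yes/no match, so you pick a threshold that suits your data.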

Also, a very cool regex-based algorithm was developed at IBM:


But I think HMMs are the way to go. Check out HMMER at WUSTL by Sean 
Eddy and colleagues:


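If you want an actual Python regular expression out of your list, a very
naive starting point is to collapse each character position into a
character class. This is just a sketch under strong assumptions
(equal-length strings, no alignment), and it is neither the IBM algorithm
nor an HMM. The function name column_regex is made up for this example.

```python
import re


def column_regex(strings):
    """Build a regex with one literal or character class per column.

    Hypothetical helper: assumes all strings have the same length.
    """
    if not strings:
        raise ValueError("need at least one string")
    length = len(strings[0])
    if any(len(s) != length for s in strings):
        raise ValueError("this sketch assumes equal-length strings")

    parts = []
    for column in zip(*strings):
        chars = sorted(set(column))
        if len(chars) == 1:
            # Every string agrees here: emit an escaped literal.
            parts.append(re.escape(chars[0]))
        else:
            # Strings differ here: emit a class of the observed characters.
            parts.append("[" + "".join(re.escape(c) for c in chars) + "]")
    return "".join(parts)


samples = ["foo-1", "foo-2", "fob-3"]
pattern = column_regex(samples)
print(pattern)
print(all(re.fullmatch(pattern, s) for s in samples))
```

Note that the result generalizes beyond the input: the pattern built from
these samples also matches "fob-1", which was never in the list. Whether
that is desirable depends on how strict you want the match to be.
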


James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

