Looking for a regexp generator based on a set of known strings representative of a string set
James Stroud
jstroud at mbi.ucla.edu
Fri Sep 8 22:55:07 EDT 2006
vbfoobar at gmail.com wrote:
> Hello
>
> I am looking for Python code that takes as input a list of strings
> (mostly similar, but not necessarily, and rather short: say no longer
> than 50 chars) and that computes and outputs the Python regular
> expression that matches these string values (not necessarily strictly;
> perhaps the code is able to determine patterns, i.e. families of
> strings...).
>
> Thanks for any idea
>
I'm not sure what your application is, but genomicists and proteomicists
have found that hidden Markov models (HMMs) can be very powerful for
developing pattern models. Perhaps have a look at "Biological Sequence
Analysis" by Durbin et al.
Also, a very cool regex-based algorithm was developed at IBM:
http://cbcsrv.watson.ibm.com/Tspd.html
But I think HMMs are the way to go. Check out HMMER at WUSTL by Sean
Eddy and colleagues:
http://hmmer.janelia.org/
http://selab.janelia.org/people/eddys/
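
If an HMM is more machinery than the problem needs, a much cruder baseline
may be enough to start with. The sketch below is not taken from any of the
references above; it is a simple illustration, and the names char_class,
signature and infer_regex are hypothetical. It assumes ASCII input, maps each
character to a coarse class (digit, letter, or the literal character),
collapses runs of the same class, and joins the distinct "families" with
alternation. It needs Python 3.4 or later for re.fullmatch.

```python
import re
import string
from itertools import groupby


def char_class(ch):
    """Map one character to a coarse regex class (an assumption of this sketch)."""
    if ch in string.digits:
        return r"\d"
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    # Anything else is kept as a literal, escaped so it is safe in a pattern.
    return re.escape(ch)


def signature(s):
    """Describe a string as a sequence of classes, collapsing runs with '+'."""
    parts = []
    for cls, run in groupby(char_class(c) for c in s):
        run_length = len(list(run))
        parts.append(cls + ("+" if run_length > 1 else ""))
    return "".join(parts)


def infer_regex(strings):
    """Return a pattern whose alternatives cover every input string."""
    families = sorted({signature(s) for s in strings})
    return "(?:" + "|".join(families) + ")"


if __name__ == "__main__":
    samples = ["foo-12", "bar-7", "baz-345"]
    pattern = infer_regex(samples)
    print(pattern)
    # Every sample matches the pattern it was derived from.
    print(all(re.fullmatch(pattern, s) for s in samples))
```

The result generalizes quite aggressively (any run of letters, any run of
digits), so it will accept strings well outside the original set. That is
the trade-off against a plain alternation of the escaped inputs, which
matches the set strictly but finds no patterns at all.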
James
--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095
http://www.jamesstroud.com/