Regular expression
bearophileHUGS at lycos.com
Wed Jul 16 12:10:44 EDT 2008
On Jul 16, 4:14 pm, Fredrik Lundh <fred... at pythonware.com> wrote:
> Beema shafreen wrote:
> > How do I write a regular expression for this kind of sequences
>
> > >gi|158028609|gb|ABW08583.1| CG8385-PF, isoform F [Drosophila melanogaster]
> > MGNVFANLFKGLFGKKEMRILMVGLDAAGKTTILYKLKLGEIVTTIPTIGFNVETVE
>
> line.split("|") ?
>
> it's a bit hard to come up with a working RE with only a single sample;
> what are the constraints for the different fields? is the last part
> free form text or something else, etc.
>
> have you googled for existing implementations of the format you're using?
That's a FASTA file, so for the header line this is enough:
[part.strip() for part in line.split("|")]
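Spelled out on the sample header from the question, that looks roughly like this. It's only a sketch: parse_header is a name made up for this example, and it assumes the header starts with ">" and that the free-text description at the end contains no "|" of its own.

```python
def parse_header(line):
    """Split a FASTA header line into its "|"-separated fields.

    Hypothetical helper for illustration only; it assumes the description
    field (the last one) does not itself contain a "|" character.
    """
    # Drop the leading ">" marker and surrounding whitespace/newline first.
    line = line.strip().lstrip(">")
    # Split on "|" and trim the whitespace left around each field.
    return [part.strip() for part in line.split("|")]


header = ">gi|158028609|gb|ABW08583.1| CG8385-PF, isoform F [Drosophila melanogaster]"
fields = parse_header(header)
print(fields)
# ['gi', '158028609', 'gb', 'ABW08583.1', 'CG8385-PF, isoform F [Drosophila melanogaster]']
```

Only the header line (the one starting with ">") goes through this; the sequence lines that follow are left alone.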
But it's better to use the Biopython libs, which already handle FASTA
parsing for you.
Bye,
bearophile