Regular expression

bearophileHUGS at bearophileHUGS at
Wed Jul 16 18:10:44 CEST 2008

On Jul 16, 4:14 pm, Fredrik Lundh <fred... at> wrote:
> Beema shafreen wrote:
> > How do I write a regular expression for this kind of sequences
> >  >gi|158028609|gb|ABW08583.1| CG8385-PF, isoform F [Drosophila melanogaster]
> line.split("|") ?
> it's a bit hard to come up with a working RE with only a single sample;
> what are the constraints for the different fields?  is the last part
> free form text or something else, etc.
> have you googled for existing implementations of the format you're using?
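For the regex question itself, here is a minimal sketch. It assumes every header has the same layout as the single sample: ">gi|<number>|<db>|<accession>| <free-form description>". The pattern and the field names (gi, db, accession, description) are illustrative choices, not a standard. Lines that don't follow that layout simply won't match.

```python
import re

# Hypothetical pattern, assuming the layout of the one sample header:
# ">gi|<digits>|<db>|<accession>| <free-form description>"
HEADER_RE = re.compile(
    r"^>gi\|(?P<gi>\d+)\|(?P<db>\w+)\|(?P<accession>[^|]+)\|\s*(?P<description>.*)$"
)

def parse_header(line):
    """Return a dict of the named fields, or None if the line doesn't match."""
    match = HEADER_RE.match(line.strip())  # strip() drops the trailing newline
    return match.groupdict() if match else None

header = ">gi|158028609|gb|ABW08583.1| CG8385-PF, isoform F [Drosophila melanogaster]"
print(parse_header(header))
```

The description is captured as free-form text because, as noted above, there's no telling from one sample what constraints it has.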

That's a FASTA file, so for the header line this is enough:
[part.strip() for part in line.split("|")]
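Applied to the sample header, the one-liner gives five fields. Note that the leading ">" stays attached to the first one. The unpacking step and its variable names are an illustrative assumption that every header has this same five-field layout.

```python
line = ">gi|158028609|gb|ABW08583.1| CG8385-PF, isoform F [Drosophila melanogaster]\n"

# The one-liner: split on "|" and strip surrounding whitespace
# (including the trailing newline) from each field.
parts = [part.strip() for part in line.split("|")]
print(parts)
# ['>gi', '158028609', 'gb', 'ABW08583.1', 'CG8385-PF, isoform F [Drosophila melanogaster]']

# Assumption: fields are positional as in the sample; names are hypothetical.
_, gi_number, database, accession, description = parts
```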
But it's better to use the Biopython libraries, which already do all
such things, and do them better.

