extracting substrings from a file
Tim Chase
python.list at tim.thechases.com
Mon Sep 11 09:48:12 EDT 2006
> I have a file with several entries in the form:
>
> AFFX-BioB-5_at E. coli /GEN=bioB /gb:J04423.1 NOTE=SIF
> corresponding to nucleotides 2032-2305 of /gb:J04423.1 DEF=E.coli
> 7,8-diamino-pelargonic acid (bioA), biotin synthetase (bioB),
> 7-keto-8-amino-pelargonic acid synthetase (bioF), bioC protein, and
> dethiobiotin synthetase (bioD), complete cds.
>
> 1415785_a_at /gb:NM_009840.1 /DB_XREF=gi:6753327 /GEN=Cct8 /FEA=FLmRNA
> /CNT=482 /TID=Mm.17989.1 /TIER=FL+Stack /STK=281 /UG=Mm.17989 /LL=12469
> /DEF=Mus musculus chaperonin subunit 8 (theta) (Cct8), mRNA.
> /PROD=chaperonin subunit 8 (theta) /FL=/gb:NM_009840.1 /gb:BC009007.1
>
> and I would like to create a file that has only the following:
>
> AFFX-BioB-5_at /GEN=bioB /gb:J04423.1
>
> 1415785_a_at /gb:NM_009840.1 /GEN=Cct8
>
> Could anyone please tell me how can I do it?
The following seems to do it for me...
outfile = open('out.txt', 'w')
for line in open('in.txt'):
    # only keep entries that carry both a /GEN and a /gb: field
    if '/GEN' in line and '/gb:' in line:
        newline = []
        for index, item in enumerate(line.split()):
            # the first token is the ID; after that keep /GEN and /gb: items
            if (index == 0 or item.startswith('/GEN')
                    or item.startswith('/gb:')):
                newline.append(item)
        outfile.write('\t'.join(newline))
        outfile.write('\n')
outfile.close()
Some of the conditions are underspecified. I presume that both
/GEN and /gb: have to appear in the line. If only one of them is
required, change the "and" to an "or".
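
Another thing the sample data leaves open: an entry can carry more than one
/gb: item (the second entry ends with /gb:BC009007.1), and the loop above
keeps every one of them, while the desired output shows only the first. A
rough sketch of how both choices could be made switchable, assuming each
entry sits on a single line in the real file. The function name
extract_fields and the require_both / first_only arguments are made up for
this example:

```python
def extract_fields(line, require_both=True, first_only=False):
    """Return the ID plus the /GEN and /gb: items of one entry, tab-separated.

    Returns None when the line does not qualify. The name and both keyword
    arguments are hypothetical, introduced only for this sketch.
    """
    has_gen = '/GEN' in line
    has_gb = '/gb:' in line
    # "and" versus "or", selected by a flag instead of by editing the code
    wanted = (has_gen and has_gb) if require_both else (has_gen or has_gb)
    if not wanted:
        return None
    kept = []
    seen = set()  # prefixes already emitted, used only when first_only is set
    for index, item in enumerate(line.split()):
        if index == 0:
            kept.append(item)  # the first token is the ID
            continue
        for prefix in ('/GEN', '/gb:'):
            if item.startswith(prefix):
                if not (first_only and prefix in seen):
                    seen.add(prefix)
                    kept.append(item)
                break
    return '\t'.join(kept)
```

Calling extract_fields(line, first_only=True) on each line and writing out
the results that are not None should give one /GEN and one /gb: item per
entry. Whether the first /gb: item is really the one wanted is a guess based
on the two examples.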
-tkc