[Tutor] Merging Text Files

Ara Kooser ghashsnaga at gmail.com
Wed Oct 13 22:16:21 CEST 2010

Hello all,

  I am working on merging two text files with fields separated by commas.
The files are in this format:

File ONE:
*Species, Protein ID, E value, Length*
Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,
Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,
Streptomyces sp. AA4, ZP_05482482, 1.08889e-124, 5256,
Streptomyces sp. AA4, ZP_07281899, 2.9253900000000001e-140, 5260,

File TWO:
*Protein ID, Locus Tag, Start/Stop*
ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)
ZP_07281899, SSMG_05939, complement(NZ_GG657746.1:6565974..6581756)

I looked around for other posts about merging text files and came up with this:
one = open("final.txt",'r')
two = open("final_gen.txt",'r')

merge = open("merged.txt",'w')
merge.write("Species,  Locus_Tag,  E_value,  Length, Start/Stop\n")

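# Pair line N of file one with line N of file two (purely by position).
for line in one:
    # Read from file two only once per pass; a second readline() skips a line.
    combined = line.rstrip() + two.readline().strip()
    print(combined)
    merge.write(str([combined]))
merge.close()  # flush to disk before reading the file back

# Strip the bracket and brace characters left over from str([...]).
inc = open("merged.txt","r")
outc = open("final_merge.txt","w")
for line in inc:
    line = line.replace('[','')
    line = line.replace(']','')
    line = line.replace('{','')
    line = line.replace('}','')
    outc.write(line)
outc.close()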


This does merge the files:
Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,ZP_05482482, StAA4_010100030484,
Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,ZP_05477599, StAA4_010100005861, NZ_ACEV01000013.1:86730..102047

But file one has multiple instances of the same Protein ID, such as
ZP_05482482, so the data no longer lines up. I would like the program to
look up each Protein ID from file one, write the matching entry from file
two next to it, and then move on to the next ID.

Example of desired output:
Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484, 2.8293600000000001e-140, 5256, complement(NZ_ACEV01000078.1:25146..40916)
Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484, 8.0333299999999997e-138, 5256, complement(NZ_ACEV01000078.1:25146..40916)

I was thinking about reading the text files into a dictionary, searching
for each ID, and then inserting the content from file TWO wherever the IDs
match. But I am not sure how to start. Is there a more Pythonic way to go
about doing this?
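
For reference, the dictionary idea might look roughly like the sketch below.
It is only a sketch under some assumptions: each Protein ID appears once in
file TWO, the header lines have already been skipped, and an ID missing from
file TWO should simply leave the two extra fields empty. The function names
build_lookup and merge_lines are made up for illustration, and the sample
rows are copied from the files above.

```python
def build_lookup(two_lines):
    """Map each Protein ID in file TWO to its (locus_tag, start_stop) pair."""
    lookup = {}
    for line in two_lines:
        # Split on the first two commas only, so the location stays intact.
        parts = [field.strip() for field in line.strip().split(",", 2)]
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        protein_id, locus_tag, start_stop = parts
        lookup[protein_id] = (locus_tag, start_stop)
    return lookup


def merge_lines(one_lines, lookup):
    """Add the file TWO fields to every file ONE row with a matching ID."""
    merged = []
    for line in one_lines:
        # File ONE rows end with a trailing comma; drop it before splitting.
        fields = [f.strip() for f in line.strip().rstrip(",").split(",")]
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        species, protein_id, e_value, length = fields
        # Assumption: an unknown ID gets empty strings instead of an error.
        locus_tag, start_stop = lookup.get(protein_id, ("", ""))
        merged.append(", ".join(
            [species, protein_id, locus_tag, e_value, length, start_stop]))
    return merged


one_lines = [
    "Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,",
    "Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,",
    "Streptomyces sp. AA4, ZP_07281899, 2.9253900000000001e-140, 5260,",
]
two_lines = [
    "ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)",
    "ZP_07281899, SSMG_05939, complement(NZ_GG657746.1:6565974..6581756)",
]

merged = merge_lines(one_lines, build_lookup(two_lines))
for row in merged:
    print(row)
```

Because the lookup is keyed on the Protein ID, a repeated ID in file ONE gets
the same file TWO entry every time, regardless of line order. Open file
objects could be passed in place of the lists, since they also yield one line
per iteration.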

Thank you for your time and help.


Quis hic locus, quae regio, quae mundi plaga. Ubi sum. Sub ortu solis an sub
cardine glacialis ursae.