[Tutor] Merging Text Files

Wed Oct 13 22:16:21 CEST 2010

Hello all,

  I am working on merging two text files with fields separated by commas.
The files are in this format:

File ONE:
*Species, Protein ID, E value, Length*
Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,
Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,
Streptomyces sp. AA4, ZP_05482482, 1.08889e-124, 5256,
Streptomyces sp. AA4, ZP_07281899, 2.9253900000000001e-140, 5260,

File TWO:
*Protein ID, Locus Tag, Start/Stop*
ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)
ZP_07281899, SSMG_05939, complement(NZ_GG657746.1:6565974..6581756)

I looked around for other posts about merging text files and put together
this program:
one = open("final.txt",'r')
two = open("final_gen.txt",'r')

merge = open("merged.txt",'w')
merge.write("Species,  Locus_Tag,  E_value,  Length, Start/Stop\n")

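for line in one:
    # Read from file two only once per iteration; calling readline() twice
    # (once for print, once for write) skips every other line of file two.
    # Note that this still pairs lines by position, not by Protein ID.
    merged = line.rstrip() + two.readline().strip()
    print(merged)
    merge.write(merged + "\n")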
merge.close()

inc = open("merged.txt","r")
outc = open("final_merge.txt","w")
for line in inc:
    line = line.replace('[','')
    line = line.replace(']','')
    line = line.replace('{','')
    line = line.replace('}','')
    outc.write(line)

inc.close()
outc.close()
one.close()
two.close()

This does merge the files:
Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140,
5256,ZP_05482482, StAA4_010100030484,
complement(NZ_ACEV01000078.1:25146..40916)
Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138,
5256,ZP_05477599, StAA4_010100005861, NZ_ACEV01000013.1:86730..102047

But file one has multiple instances of the same Protein ID such as
ZP_05482482. So the data doesn't line up anymore.  I would like the program
to search for each Protein ID number and write the entry from file 2 in each
place and then move on to the next ID number.

Example of desired output:
Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484,
2.8293600000000001e-140, 5256, complement(NZ_ACEV01000078.1:25146..40916)
Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484,
8.0333299999999997e-138, 5256, complement(NZ_ACEV01000078.1:25146..40916)

I was thinking about reading the text files into a dictionary, searching
for each ID, and then inserting the content from file TWO wherever the IDs
match. But I am not sure how to start. Is there a more Pythonic way to go
about doing this?
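
For illustration, here is a minimal sketch of that dictionary idea. It is not
a definitive solution, and it rests on some assumptions: the function names
load_lookup and merge_records are hypothetical, header lines are assumed to
start with "*" as in the samples above, species names are assumed to contain
no commas, and a Protein ID missing from file TWO is given empty Locus Tag
and Start/Stop fields.

```python
def load_lookup(two_lines):
    """Map each Protein ID in file TWO to its (locus tag, start/stop) pair."""
    lookup = {}
    for line in two_lines:
        line = line.strip()
        if not line or line.startswith("*"):
            continue  # assumption: blank lines and "*..." header lines are skipped
        # Split on the first two commas only, so a location that itself
        # contains commas stays in one piece.
        protein_id, locus_tag, location = [f.strip() for f in line.split(",", 2)]
        lookup[protein_id] = (locus_tag, location)
    return lookup


def merge_records(one_lines, lookup):
    """Return file ONE's records with the matching file TWO fields inserted."""
    merged = []
    for line in one_lines:
        line = line.strip()
        if not line or line.startswith("*"):
            continue  # same header/blank-line assumption as above
        # Drop the trailing comma seen in the sample data before splitting.
        fields = [f.strip() for f in line.rstrip(",").split(",")]
        species, protein_id, e_value, length = fields[:4]
        # Assumption: IDs absent from file TWO get empty strings.
        locus_tag, location = lookup.get(protein_id, ("", ""))
        merged.append(", ".join(
            [species, protein_id, locus_tag, e_value, length, location]))
    return merged
```

Because the lookup is keyed on Protein ID, every repeated ID in file ONE gets
the same entry from file TWO, regardless of line order. To use it on the
files above, one would pass the open final_gen.txt file to load_lookup, pass
the open final.txt file and the resulting dictionary to merge_records, and
write each returned string plus a newline to final_merge.txt.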

Thank you for your time and help.

Regards,
Ara

-- 
Quis hic locus, quae regio, quae mundi plaga. Ubi sum. Sub ortu solis an sub
cardine glacialis ursae.