[Tutor] Merging Text Files
Ara Kooser
ghashsnaga at gmail.com
Wed Oct 13 22:16:21 CEST 2010
Hello all,
I am working on merging two text files with fields separated by commas.
The files are in this format:
File ONE:
*Species, Protein ID, E value, Length*
Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,
Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,
Streptomyces sp. AA4, ZP_05482482, 1.08889e-124, 5256,
Streptomyces sp. AA4, ZP_07281899, 2.9253900000000001e-140, 5260,
File TWO:
*Protein ID, Locus Tag, Start/Stop*
ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)
ZP_07281899, SSMG_05939, complement(NZ_GG657746.1:6565974..6581756)
I looked around for other posts about merging text files and I have this
program:
one = open("final.txt", 'r')
two = open("final_gen.txt", 'r')
merge = open("merged.txt", 'w')
merge.write("Species, Locus_Tag, E_value, Length, Start/Stop\n")

# Pair line N of file one with line N of file two.
for line in one:
    # Read file two only once per iteration; a second readline() call
    # would silently skip a line.
    combined = line.rstrip() + two.readline().strip()
    print(combined)
    merge.write(str([combined]))
    merge.write("\n")
merge.close()

# Second pass: strip the brackets that str([...]) added.
inc = open("merged.txt", "r")
outc = open("final_merge.txt", "w")
for line in inc:
    line = line.replace('[', '')
    line = line.replace(']', '')
    line = line.replace('{', '')
    line = line.replace('}', '')
    outc.write(line)

inc.close()
outc.close()
one.close()
two.close()
This does merge the files.
Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)
Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,ZP_05477599, StAA4_010100005861, NZ_ACEV01000013.1:86730..102047
But file one has multiple instances of the same Protein ID, such as
ZP_05482482, so the data no longer lines up. I would like the program
to look up each Protein ID from file one, write the matching entry from
file two next to it, and then move on to the next ID.
Example of desired output:
Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484, 2.8293600000000001e-140, 5256, complement(NZ_ACEV01000078.1:25146..40916)
Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484, 8.0333299999999997e-138, 5256, complement(NZ_ACEV01000078.1:25146..40916)
I was thinking about reading the text files into a dictionary, then
searching for each ID and inserting the content from file TWO where
the IDs match. But I am not sure how to start. Is there a more Pythonic way
to go about doing this?
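For reference, here is a rough sketch of how the dictionary idea might look. It is only one possible approach and makes several assumptions: the function names load_locus_table and merge_records are made up for illustration, lines starting with "*" are treated as headers and skipped, the species name contains no commas, and an ID missing from file TWO gets empty fields rather than an error. The sample lists stand in for the two files.

```python
def load_locus_table(lines):
    """Build {protein_id: (locus_tag, start_stop)} from File TWO lines."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("*"):
            continue  # assumption: blank lines and "*...*" headers are skipped
        # Split on the first two commas only, so the location field stays whole.
        protein_id, locus_tag, start_stop = [f.strip() for f in line.split(",", 2)]
        table[protein_id] = (locus_tag, start_stop)
    return table


def merge_records(one_lines, table):
    """Return File ONE rows with the matching File TWO fields inserted."""
    merged = []
    for line in one_lines:
        line = line.strip()
        if not line or line.startswith("*"):
            continue
        # File ONE rows end with a trailing comma; drop it before splitting.
        fields = [f.strip() for f in line.rstrip(",").split(",")]
        species, protein_id, e_value, length = fields
        # Assumption: IDs absent from File TWO get empty placeholders.
        locus_tag, start_stop = table.get(protein_id, ("", ""))
        merged.append(", ".join(
            [species, protein_id, locus_tag, e_value, length, start_stop]))
    return merged


# Sample rows copied from the two files described above.
file_one = [
    "Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,",
    "Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,",
    "Streptomyces sp. AA4, ZP_07281899, 2.9253900000000001e-140, 5260,",
]
file_two = [
    "ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)",
    "ZP_07281899, SSMG_05939, complement(NZ_GG657746.1:6565974..6581756)",
]

merged = merge_records(file_one, load_locus_table(file_two))
for row in merged:
    print(row)
```

Because both functions only iterate over lines, open file objects (for example from open("final.txt") and open("final_gen.txt")) could presumably be passed in place of the sample lists. Looking up each ID in the dictionary means repeated Protein IDs in file ONE all receive the same file TWO entry, regardless of line order.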
Thank you for your time and help.
Regards,
Ara
--
Quis hic locus, quae regio, quae mundi plaga. Ubi sum. Sub ortu solis an sub
cardine glacialis ursae.