[Tutor] Merging Text Files

Robert Jackiewicz rob at jackiewicz.ca
Wed Oct 13 22:30:56 CEST 2010


On Wed, 13 Oct 2010 14:16:21 -0600, Ara Kooser wrote:

> Hello all,
> 
>   I am working on merging two text files with fields separated by
>   commas.
> The files are in this format:
> 
> File ONE:
> *Species, Protein ID, E value, Length*
> Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,
> Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,
> Streptomyces sp. AA4, ZP_05482482, 1.08889e-124, 5256,
> Streptomyces sp. AA4, ZP_07281899, 2.9253900000000001e-140, 5260,
> 
> File TWO:
> *Protein ID, Locus Tag, Start/Stop*
> ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)
> ZP_07281899, SSMG_05939, complement(NZ_GG657746.1:6565974..6581756)
> 
> I looked around for other posts about merging text files and I have this
> program:
> one = open("final.txt",'r')
> two = open("final_gen.txt",'r')
> 
> merge = open("merged.txt",'w')
> merge.write("Species,  Locus_Tag,  E_value,  Length, Start/Stop\n")
> 
> for line in one:
>      print(line.rstrip() + two.readline().strip())
>      merge.write(str([line.rstrip() + two.readline().strip()]))
>      merge.write("\n")
> merge.close()
> 
> inc = file("merged.txt","r")
> outc = open("final_merge.txt","w")
> for line in inc:
>     line = line.replace('[','')
>     line = line.replace(']','')
>     line = line.replace('{','')
>     line = line.replace('}','')
>     outc.write(line)
> 
> inc.close()
> outc.close()
> one.close()
> two.close()
> 
> This does merge the files.
> Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)
> Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,ZP_05477599, StAA4_010100005861, NZ_ACEV01000013.1:86730..102047
> 
> But file one has multiple instances of the same Protein ID such as
> ZP_05482482. So the data doesn't line up anymore.  I would like the
> program to search for each Protein ID number and write the entry from
> file 2 in each place and then move on to the next ID number.
> 
> Example of desired output:
> Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484, 2.8293600000000001e-140, 5256, complement(NZ_ACEV01000078.1:25146..40916)
> Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484, 8.0333299999999997e-138, 5256, complement(NZ_ACEV01000078.1:25146..40916)
> 
> I was thinking about writing the text files into a dictionary and then
> searching for each ID and then insert the content from file TWO into
> where the IDs match. But I am not sure how to start. Is there a more
> pythony way to go about doing this?
> 
> Thank you for your time and help.
> 
> Regards,
> Ara

Why don't you try the csv module, which is part of the Python standard
library, to parse your files?  It allows simple and efficient
manipulation of comma-separated value files.
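
Here is a minimal sketch of how that could look, combined with the
dictionary idea you mentioned.  It is untested against your real data and
makes a few assumptions: neither file has a header row, every row in file
ONE has four fields (plus the trailing comma), every row in file TWO has
three, and no field contains a comma itself.  The function name
merge_by_protein_id is just something I made up for illustration.

```python
import csv
import io


def merge_by_protein_id(one, two, out):
    """Join each row of `one` to its matching row of `two` on Protein ID.

    `one`, `two` and `out` are open text files (or file-like objects).
    """
    # Build a lookup table from file TWO: Protein ID -> (locus tag, start/stop).
    lookup = {}
    for row in csv.reader(two, skipinitialspace=True):
        # Strip whitespace and drop empty fields left by trailing commas.
        fields = [f.strip() for f in row if f.strip()]
        if len(fields) == 3:
            protein_id, locus_tag, location = fields
            lookup[protein_id] = (locus_tag, location)

    writer = csv.writer(out)
    for row in csv.reader(one, skipinitialspace=True):
        fields = [f.strip() for f in row if f.strip()]
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        species, protein_id, e_value, length = fields
        # IDs missing from file TWO get empty locus tag and location columns.
        locus_tag, location = lookup.get(protein_id, ("", ""))
        writer.writerow(
            [species, protein_id, locus_tag, e_value, length, location]
        )


# Small in-memory demonstration using two rows from your sample data.
demo_one = io.StringIO(
    "Streptomyces sp. AA4, ZP_05482482, 1.08889e-124, 5256,\n"
    "Streptomyces sp. AA4, ZP_07281899, 2.9253900000000001e-140, 5260,\n"
)
demo_two = io.StringIO(
    "ZP_05482482, StAA4_010100030484, "
    "complement(NZ_ACEV01000078.1:25146..40916)\n"
    "ZP_07281899, SSMG_05939, complement(NZ_GG657746.1:6565974..6581756)\n"
)
demo_out = io.StringIO()
merge_by_protein_id(demo_one, demo_two, demo_out)
print(demo_out.getvalue())
```

With real files you would pass in the objects returned by open() instead
of the StringIO ones.  Because the lookup is keyed on the Protein ID, it
no longer matters how many times an ID repeats in file ONE or what order
the rows are in.  If your files do start with a header line, you could
skip it by calling next() on each reader before looping.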

-Rob Jackiewicz


