[Tutor] Merging Text Files

Adam Lucas ademlookes at gmail.com
Thu Oct 14 20:40:31 CEST 2010


Either way works: nest the for loops and match on protein IDs, or load one
file into a dictionary and write out the lines of the other file that have
a match in the dictionary:

non-python pseudocode:

for every line in TWO:
    get its protein ID (the first ID)
    for every line in ONE:
        if its protein ID (the second ID) is the same as the first:
            perform the string merging and write it to the file
        else:
            pass to the next protein ID in ONE
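
A minimal Python sketch of the nested-loop version, under some assumptions
that are mine and not from the original question: each line is
tab-separated with the protein ID as the first field, and "string merging"
means joining the ID with the remaining fields from both files. The names
split_line and merge_nested are made up for illustration.

```python
def split_line(line):
    """Return (protein_id, rest). Assumes tab-separated fields, ID first."""
    protein_id, _, rest = line.rstrip("\n").partition("\t")
    return protein_id, rest


def merge_nested(lines_one, lines_two):
    """Inner join by brute force: compare every line of TWO with every line of ONE."""
    lines_one = list(lines_one)  # ONE is scanned repeatedly, so keep it in memory
    merged = []
    for line_two in lines_two:
        id_two, rest_two = split_line(line_two)
        for line_one in lines_one:
            id_one, rest_one = split_line(line_one)
            if id_one == id_two:
                # "String merging" here is an assumed format: ID, ONE's data, TWO's data
                merged.append("\t".join([id_two, rest_one, rest_two]))
            # no else branch needed: the loop simply moves on to the next line in ONE
    return merged
```

With real files you would pass the open file objects (or their lines) in
and write each merged string to the output file instead of collecting a
list. The cost grows with len(ONE) * len(TWO), so this is only comfortable
for small files.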

--OR--

for every line in ONE:
    add a dictionary entry with key = the protein ID and value = the rest
    of the line

for every line in TWO:
    if the dictionary has the same protein ID:
        perform the string merging and write to the file
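
The dictionary version could look roughly like this in Python. Same
assumptions as before, and they are mine: tab-separated lines with the
protein ID as the first field, and a merged line made of the ID followed by
the data from ONE and then from TWO. The name merge_with_dict is made up.

```python
def merge_with_dict(lines_one, lines_two):
    """Inner join using a dictionary built from ONE, then a single pass over TWO."""
    lookup = {}
    for line in lines_one:
        protein_id, _, rest = line.rstrip("\n").partition("\t")
        lookup[protein_id] = rest  # a repeated ID in ONE overwrites the earlier entry

    merged = []
    for line in lines_two:
        protein_id, _, rest = line.rstrip("\n").partition("\t")
        if protein_id in lookup:
            merged.append("\t".join([protein_id, lookup[protein_id], rest]))
    return merged
```

Each file is read once, so this scales much better than nested loops. If a
protein ID can appear more than once in ONE, store a list of values per key
instead of a single string.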

I'm assuming an 'inner join' (drop non-matches). For an 'outer/right join'
(keep everything in TWO), initialize a 'matchmade' variable before the
inner loop, set it when a match is found, and if no match was made, write
the protein to the merged file with null values.
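
A sketch of the 'keep everything in TWO' variant with a matchmade flag.
The tab-separated layout, the function name right_join and the "NULL"
placeholder string are all assumptions for illustration; use whatever null
marker your downstream tools expect.

```python
def right_join(lines_one, lines_two, null="NULL"):
    """Keep every line of TWO; fill in a placeholder when ONE has no match."""
    # Pre-split ONE into (id, separator, rest) tuples so it can be rescanned
    rows_one = [line.rstrip("\n").partition("\t") for line in lines_one]
    merged = []
    for line in lines_two:
        id_two, _, rest_two = line.rstrip("\n").partition("\t")
        matchmade = False  # reset for every line of TWO, before the inner loop
        for id_one, _, rest_one in rows_one:
            if id_one == id_two:
                merged.append("\t".join([id_two, rest_one, rest_two]))
                matchmade = True
        if not matchmade:
            # No partner in ONE: still emit the protein, with a null placeholder
            merged.append("\t".join([id_two, null, rest_two]))
    return merged
```

The flag has to be reset once per line of TWO, outside the loop over ONE;
resetting it inside that loop would forget matches found earlier in the
scan.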

If you plan on querying or sharing the newly organized dataset use a
database. If this file is going into a workflow, it probably wants text.
I'd probably do both.
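
If you go the database route, the sqlite3 module in the standard library
is probably the lightest option. A rough sketch, where the table names,
the column names and the function load_and_join are made up, and the rows
are assumed to be already split into (protein_id, info) pairs:

```python
import sqlite3


def load_and_join(rows_one, rows_two):
    """Load both datasets into an in-memory SQLite database and join on protein ID."""
    conn = sqlite3.connect(":memory:")  # use a file path instead to keep the data
    conn.execute("CREATE TABLE one (protein_id TEXT, info TEXT)")
    conn.execute("CREATE TABLE two (protein_id TEXT, info TEXT)")
    conn.executemany("INSERT INTO one VALUES (?, ?)", rows_one)
    conn.executemany("INSERT INTO two VALUES (?, ?)", rows_two)
    # LEFT JOIN from two keeps every row of TWO; missing matches come back as None
    result = conn.execute(
        "SELECT two.protein_id, one.info, two.info "
        "FROM two LEFT JOIN one ON one.protein_id = two.protein_id "
        "ORDER BY two.rowid"
    ).fetchall()
    conn.close()
    return result
```

Swapping LEFT JOIN for JOIN gives the inner join. The result rows can be
written back out as tab-separated text for the workflow side.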