[Tutor] Nested loop of I/O tasks

Wed Nov 25 08:25:26 CET 2009

Bo Li wrote:
> Dear Python
>
> I am new to Python and having questions about its usage. Currently I have to
> read two .csv files INCT and INMRI which are similar to this
>
> INCT
>       NONAME 121.57 34.71 14.81 1.35 0 0 1  Cella 129.25 100.31 27.25 1.35 1
> 1 1  Chiasm 130.3 98.49 26.05 1.35 1 1 1  FMagnum 114.89 144.94 -15.74 1.35
> 1 1 1  Iz 121.57 198.52 30.76 1.35 1 1 1  LEAM 160.53 127.6 -1.14 1.35 1 1 1
> LEAM 55.2 124.66 12.32 1.35 1 1 1  LPAF 180.67 128.26 -9.05 1.35 1 1 1  LTM
> 77.44 124.17 15.95 1.35 1 1 1  Leye 146.77 59.17 -2.63 1.35 1 0 0  Nz 121.57
> 34.71 14.81 1.35 1 1 1  Reye 91.04 57.59 6.98 1.35 0 1 0
> INMRI
>     NONAME 121.57 34.71 14.81 1.35 0 0 1  Cella 129.25 100.31 27.25 1.35 1 1
> 1  Chiasm 130.3 98.49 26.05 1.35 1 1 1  FMagnum 114.89 144.94 -15.74 1.35 1
> 1 1  Iz 121.57 198.52 30.76 1.35 1 1 1  LEAM 160.53 127.6 -1.14 1.35 1 1 1
> LEAM 55.2 124.66 12.32 1.35 1 1 1  LPAF 180.67 128.26 -9.05 1.35 1 1 1  LTM
> 77.44 124.17 15.95 1.35 1 1 1  Leye 146.77 59.17 -2.63 1.35 1 0 0
> My job is to match the name on the two files and combine the first three
> attributes together. So far I tried to read two files. But when I tried to
> match the pattern using nested loop, but Python stops me after 1 iteration.
> Here is what I got so far.
>
> INCT = open(' *.csv')
> INMRI = open(' *.csv')
>
> for row in INCT:
>     name, x, y, z, a, b, c, d = row.split(",")
>     print aaa,
>     for row2 in INMRI:
>         NAME, X, Y, Z, A, B, C, D = row2.split(",")
>         if name == NAME:
>             print aaa
>
>
> The results are shown below
>
> "NONAME" "NONAME" "Cella " "NONAME" "Chiasm" "NONAME" "FMagnum" "NONAME"
> "Inion" "NONAME" "LEAM" "NONAME" "LTM" "NONAME" "Leye" "NONAME" "Nose"
> "NONAME" "Nz" "NONAME" "REAM" "NONAME" "RTM" "NONAME" "Reye" "Cella"
> "Chiasm" "FMagnum" "Iz" "LEAM" "LEAM" "LPAF" "LTM" "Leye" "Nz" "Reye"
>
>
> I was a MATLAB user and am really confused by what happens with me. I wish
> someone could help me with this intro problem and probably indicate a
> convenient way for pattern matching. Thanks!
>
>   
I'm wondering how Christian's quote of your message was formatted so 
much better.  Your csv contents are word-wrapped when I see your email.  
Did you perhaps send it using html mail, instead of text?

The other thing I note (and this is the same with Christian's version of 
your message) is that the code you show wouldn't run, and wouldn't 
produce the output you supplied, so you must have retyped it instead of 
copying and pasting it.  That makes the job harder for anybody trying to help.

Christian's analysis of your problem was spot-on.  A file object can only be 
iterated once (unless you rewind it with seek(0)), so the inner loop finds 
nothing left to read the second time through the outer loop.  However, there 
are two possible fixes that are both closer to what you have, and therefore 
perhaps more desirable.
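
To see the exhaustion in isolation, here is a minimal sketch.  The file 
contents are made up for the demonstration, and the names first_pass, 
second_pass and third_pass are only illustrative.  It is written to run on 
Python 2.6 or later as well as Python 3.

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as f:
    f.write("Cella,1,2,3\nChiasm,4,5,6\n")

infile = open(path)
first_pass = [line for line in infile]    # reads both lines
second_pass = [line for line in infile]   # nothing left, the file is at EOF
infile.seek(0)                            # rewind to the start of the file
third_pass = [line for line in infile]    # both lines are available again
infile.close()
os.remove(path)

print(len(first_pass))
print(len(second_pass))
print(len(third_pass))
```

The second pass comes back empty, which is the same thing that happens to an 
inner loop over an already-consumed file.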

The simplest change is to call readlines() on the second file.  This means 
you need enough memory to hold the whole file, stored as a list of lines.

INCT = open('file1.csv')
INMRIlist = open('file2.csv').readlines()

for row in INCT:
    name, x, y, z, a, b, c, d = row.split(",")
    print name,
    for row2 in INMRIlist:
        NAME, X, Y, Z, A, B, C, D = row2.split(",")
        print NAME,
        if name == NAME:
            print "---matched---"

The other choice, somewhat slower but lighter on memory, is to reopen the 
second file on every pass of the outer loop:

INCT = open('file1.csv')
#INMRI = open('file2.csv')

for row in INCT:
    name, x, y, z, a, b, c, d = row.split(",")
    print name,
    for row2 in open('file2.csv'):
        NAME, X, Y, Z, A, B, C, D = row2.split(",")
        print NAME,
        if name == NAME:
            print "---matched---"

There are many other things I would change (probably eventually going to 
the dictionary that Christian mentioned), but these are the minimum 
changes to let you continue down the path you've envisioned.
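
For reference, the dictionary idea might look roughly like the sketch below.  
This is not Christian's code.  The helper names index_by_name and 
match_by_name are hypothetical, and it assumes comma-separated rows of a 
name followed by numeric fields, of which the first three are the 
coordinates to combine.  Because a name can appear more than once (LEAM 
does in your data), each name maps to a list of rows rather than a single 
row.  The sample rows are copied from your post, with commas assumed as the 
separator.

```python
def index_by_name(lines):
    """Map each name to the list of (x, y, z) triples recorded under it."""
    index = {}
    for line in lines:
        fields = line.strip().split(",")
        name = fields[0].strip()
        xyz = tuple(float(v) for v in fields[1:4])
        index.setdefault(name, []).append(xyz)
    return index


def match_by_name(ct_lines, mri_lines):
    """Return (name, ct_xyz, mri_xyz) for every name present in both inputs."""
    mri_index = index_by_name(mri_lines)   # built once, looked up many times
    matches = []
    for line in ct_lines:
        fields = line.strip().split(",")
        name = fields[0].strip()
        ct_xyz = tuple(float(v) for v in fields[1:4])
        for mri_xyz in mri_index.get(name, []):
            matches.append((name, ct_xyz, mri_xyz))
    return matches


# Sample rows in the layout assumed above (comma-separated).
ct = ["Cella,129.25,100.31,27.25,1.35,1,1,1\n",
      "Nz,121.57,34.71,14.81,1.35,1,1,1\n"]
mri = ["Cella,129.25,100.31,27.25,1.35,1,1,1\n",
       "Leye,146.77,59.17,-2.63,1.35,1,0,0\n"]

matches = match_by_name(ct, mri)
for m in matches:
    print("%s %s %s" % m)
```

Each file is read only once this way, so the single-pass limit on file 
iteration never comes into play.  With real files you would pass the open 
file objects in place of the two lists.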

(All code untested; I just typed it directly into the email, assuming 
Python 2.6.)

DaveA