[Tutor] Nested loop of I/O tasks

Wed Nov 25 07:38:09 CET 2009

Bo Li wrote:
> Dear Python
>
> I am new to Python and having questions about its usage. Currently I 
> have to read two .csv files INCT and INMRI which are similar to this
>
> INCT
> NONAME 	121.57 	34.71 	14.81 	1.35 	0 	0 	1
> Cella 	129.25 	100.31 	27.25 	1.35 	1 	1 	1
> Chiasm 	130.3 	98.49 	26.05 	1.35 	1 	1 	1
> FMagnum 	114.89 	144.94 	-15.74 	1.35 	1 	1 	1
> Iz 	121.57 	198.52 	30.76 	1.35 	1 	1 	1
> LEAM 	160.53 	127.6 	-1.14 	1.35 	1 	1 	1
> LEAM 	55.2 	124.66 	12.32 	1.35 	1 	1 	1
> LPAF 	180.67 	128.26 	-9.05 	1.35 	1 	1 	1
> LTM 	77.44 	124.17 	15.95 	1.35 	1 	1 	1
> Leye 	146.77 	59.17 	-2.63 	1.35 	1 	0 	0
> Nz 	121.57 	34.71 	14.81 	1.35 	1 	1 	1
> Reye 	91.04 	57.59 	6.98 	1.35 	0 	1 	0
>
>
> INMRI
> NONAME 	121.57 	34.71 	14.81 	1.35 	0 	0 	1
> Cella 	129.25 	100.31 	27.25 	1.35 	1 	1 	1
> Chiasm 	130.3 	98.49 	26.05 	1.35 	1 	1 	1
> FMagnum 	114.89 	144.94 	-15.74 	1.35 	1 	1 	1
> Iz 	121.57 	198.52 	30.76 	1.35 	1 	1 	1
> LEAM 	160.53 	127.6 	-1.14 	1.35 	1 	1 	1
> LEAM 	55.2 	124.66 	12.32 	1.35 	1 	1 	1
> LPAF 	180.67 	128.26 	-9.05 	1.35 	1 	1 	1
> LTM 	77.44 	124.17 	15.95 	1.35 	1 	1 	1
> Leye 	146.77 	59.17 	-2.63 	1.35 	1 	0 	0
>
>
> My job is to match the name on the two files and combine the first 
> three attributes together. So far I tried to read two files. But when 
> I tried to match the pattern using nested loop, but Python stops me 
> after 1 iteration. Here is what I got so far.
>
> INCT = open(' *.csv')
> INMRI = open(' *.csv')
>
> for row in INCT:
>     name, x, y, z, a, b, c, d = row.split(",")
>     print aaa,
>     for row2 in INMRI:
>         NAME, X, Y, Z, A, B, C, D = row2.split(",")
>         if name == NAME:
>             print aaa
>
>
> The results are shown below
>
> "NONAME" "NONAME" "Cella " "NONAME" "Chiasm" "NONAME" "FMagnum" 
> "NONAME" "Inion" "NONAME" "LEAM" "NONAME" "LTM" "NONAME" "Leye" 
> "NONAME" "Nose" "NONAME" "Nz" "NONAME" "REAM" "NONAME" "RTM" "NONAME" 
> "Reye" "Cella" "Chiasm" "FMagnum" "Iz" "LEAM" "LEAM" "LPAF" "LTM" 
> "Leye" "Nz" "Reye"
>
>
> I was a MATLAB user and am really confused by what happens with me. I 
> wish someone could help me with this intro problem and probably 
> indicate a convenient way for pattern matching. Thanks!
>   
What's happening is that you iterate over the first file and, on its 
first line, start iterating over the second file.  A file object is read 
once from start to end, so after that first pass the second file is 
exhausted (positioned at end-of-file) and the inner loop has nothing 
left to read on your later iterations over file 1.
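
To see the effect in isolation, here is a small sketch that counts how many 
lines the inner loop gets on each outer iteration, with and without 
rewinding the file via seek(0).  The helper name inner_loop_counts, the 
temporary file and its three sample rows are made up for illustration; 
they are not from the original script.

```python
import os
import tempfile


def inner_loop_counts(outer_items, inner_path, rewind):
    """Return how many inner-file lines each outer iteration sees."""
    counts = []
    with open(inner_path) as inner:
        for _item in outer_items:
            if rewind:
                inner.seek(0)  # go back to the start of the file
            counts.append(sum(1 for _line in inner))
    return counts


# Hypothetical three-line file standing in for INMRI.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as handle:
    handle.write("NONAME,1\nCella,2\nChiasm,3\n")

without_rewind = inner_loop_counts(["a", "b", "c"], path, rewind=False)
with_rewind = inner_loop_counts(["a", "b", "c"], path, rewind=True)
os.remove(path)

print(without_rewind)  # [3, 0, 0]
print(with_rewind)     # [3, 3, 3]
```

Rewinding makes the nested loop work, but it rereads the second file once 
per line of the first, so it is only reasonable for small files.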

If your input is sorted so that you know each name, such as NONAME, will 
be on the same line in both files, what you can do is

INCT = open('something.csv', 'r')
INMRI = open('something_else.csv', 'r')

# Read the first line of each file.
rec_INCT = INCT.readline()
rec_INMRI = INMRI.readline()

# readline() returns '' at end-of-file, which ends the loop.
while rec_INCT and rec_INMRI:
    name, x, y, z, a, b, c, d = rec_INCT.split(',')
    NAME, X, Y, Z, A, B, C, D = rec_INMRI.split(',')

    if name == NAME:
        print('Matches')

    # Advance both files by one line.
    rec_INCT = INCT.readline()
    rec_INMRI = INMRI.readline()

INCT.close()
INMRI.close()

This opens both files, reads the first line of each and then enters the 
while loop.  The loop only runs while both INCT and INMRI still have a 
line to read; readline() returns an empty string at end-of-file, so as 
soon as one file runs out the loop exits.  Inside the loop it splits 
both lines and checks whether the names match, at which point you can do 
your further processing, and then reads the next line from each file.
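
The same lockstep idea can also be written with the built-in zip(), which 
pairs up the lines by position and stops at the shorter input.  This is 
only a sketch: matching_names is a made-up helper, and the io.StringIO 
objects hold a few abbreviated sample rows in place of the real files.

```python
import io


def matching_names(lines_a, lines_b):
    """Pair up lines by position and return the names that agree."""
    matches = []
    # zip() stops when the shorter input runs out, like `while a and b`.
    for rec_a, rec_b in zip(lines_a, lines_b):
        name_a = rec_a.split(",")[0]
        name_b = rec_b.split(",")[0]
        if name_a == name_b:
            matches.append(name_a)
    return matches


# Abbreviated stand-ins for the two files (name, x, y only).
inct = io.StringIO("NONAME,121.57,34.71\nCella,129.25,100.31\nChiasm,130.3,98.49\n")
inmri = io.StringIO("NONAME,121.57,34.71\nCella,129.25,100.31\n")

result = matching_names(inct, inmri)
print(result)  # ['NONAME', 'Cella']
```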

Of course, if the files are not sorted you would have to process them 
a little differently.  If the files are small you can use one of them 
to build a dictionary, with the `name` as key and the rest of the row 
as value, and then iterate over the second file checking whether each 
name is in the dictionary.  That would also work for this scenario of 
perfectly aligned data.
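
A minimal sketch of the dictionary approach, assuming comma-separated rows 
with the name in the first column followed by x, y and z.  The function 
name combine_by_name and the in-memory sample rows are for illustration 
only, and I have used the csv module here instead of split(',').

```python
import csv
import io


def combine_by_name(ct_lines, mri_lines):
    """Join rows sharing a name; return (name, ct_xyz, mri_xyz) tuples."""
    # Index the second file by name.  A repeated name (LEAM appears twice
    # in the sample data) overwrites the earlier row, so only the last
    # occurrence is kept.
    mri_by_name = {}
    for row in csv.reader(mri_lines):
        mri_by_name[row[0]] = row[1:4]

    combined = []
    for row in csv.reader(ct_lines):
        name = row[0]
        if name in mri_by_name:
            combined.append((name, row[1:4], mri_by_name[name]))
    return combined


# Abbreviated stand-ins for the two files, deliberately in different order.
ct = io.StringIO(
    "Chiasm,130.3,98.49,26.05\n"
    "Cella,129.25,100.31,27.25\n"
    "Reye,91.04,57.59,6.98\n"
)
mri = io.StringIO(
    "Cella,129.25,100.31,27.25\n"
    "Chiasm,130.3,98.49,26.05\n"
)

combined = combine_by_name(ct, mri)
for name, ct_xyz, mri_xyz in combined:
    print(name, ct_xyz, mri_xyz)
```

Reye is skipped because it only occurs in the first input.  If duplicate 
names matter, the dictionary values would need to be lists of rows rather 
than a single row.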

Hope that helps.  

-- 
Kind Regards,
Christian Witts