Nested loop not working

Alf P. Steinbach /Usenet alf.p.steinbach+usenet at gmail.com
Fri Jul 16 17:14:41 EDT 2010


* Johann Spies, on 16.07.2010 16:34:
> I am overlooking something stupid.
>
> I have two files: one with keywords and another with data (one record per line).
>
> I want to determine for each keyword which lines in the second file
> contain that keyword.
>
> The following code is not working.  It loops through the second file
> but only uses the first keyword in the first file.
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
>
> import re
>
> keywords = open("sleutelwoorde",'r')
> data = open("sarua_marine_sleutelwoorde.csv",'r')
>
> remove_quotes = re.compile('"')
>
>
> for sw in keywords:
>      for r in data:
>          swc = remove_quotes.sub('',sw)[:-1]
>          if swc in r.lower():
>                  print swc + ' --->  ' + r
>                  print swc
>
> What am I missing?

For the inner loop, 'data' is an object that represents a file and keeps track
of a current read position in the file. The first execution of the loop moves
that read position all the way to the end of the file (EOF). The second time
this loop is attempted, which would be for the second keyword, the 'data'
object's read position is already at EOF, and thus nothing is done.
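
A minimal sketch of that effect, written for Python 3. It uses io.StringIO as a
stand-in for a real file (an assumption made only to keep the example
self-contained); an object returned by open() behaves the same way here.

```python
import io

# io.StringIO stands in for an open text file; it also keeps a read position.
data = io.StringIO("alpha\nbeta\ngamma\n")

first_pass = [line for line in data]   # reads all the way to EOF
second_pass = [line for line in data]  # read position is already at EOF

print(len(first_pass))   # 3
print(len(second_pass))  # 0

data.seek(0)                           # rewind to the start of the "file"
third_pass = [line for line in data]
print(len(third_pass))   # 3
```

As the last three lines suggest, calling data.seek(0) before the inner loop is
another way to start reading from the top again.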

One way to just make it work is to open and close the data file within the outer
loop. Actually with CPython it's automatically closed, as far as I can recall,
so you only need to reopen it, but this (if true) is less than completely
documented. This way is inefficient for a small data set (which could simply be
read into memory once), but it works.
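
A rough sketch of the reopen-in-the-outer-loop approach, written for Python 3.
The function name find_keyword_lines, the file names and the sample contents are
all made up for illustration; strip() and replace() are used in place of the
original regex and [:-1], which is a small deviation from the posted code.

```python
import os
import tempfile

def find_keyword_lines(keywords_path, data_path):
    """Return (keyword, line) pairs; the data file is reopened per keyword."""
    matches = []
    with open(keywords_path) as keywords:
        for sw in keywords:
            swc = sw.strip().replace('"', '')  # drop newline and quotes
            if not swc:
                continue  # skip blank keyword lines
            # Reopen inside the outer loop so reading starts at the top again.
            with open(data_path) as data:
                for r in data:
                    if swc in r.lower():
                        matches.append((swc, r.rstrip("\n")))
    return matches

# Small made-up files, only to exercise the function.
tmpdir = tempfile.mkdtemp()
kw_path = os.path.join(tmpdir, "keywords.txt")
data_path = os.path.join(tmpdir, "data.csv")
with open(kw_path, "w") as f:
    f.write('"fish"\n"coral"\n')
with open(data_path, "w") as f:
    f.write("1,Fish stocks\n2,Coral reefs\n3,Coral and fish\n")

for swc, r in find_keyword_lines(kw_path, data_path):
    print(swc + " ---> " + r)
```

The with statement closes each file explicitly, so nothing depends on CPython's
automatic closing. For a small data file, reading it once with
list(open(...)) or readlines() and looping over that list avoids the repeated
reopening.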

In order to get a better handle on the general problem -- not the Python 
technicalities -- google up "KWIC", KeyWord In Context. It's a common exercise
problem given to first or second-year students. So I think there should be an 
abundance of answers and discussion, although I haven't googled.


Cheers & hth.,

- Alf

-- 
blog at <url: http://alfps.wordpress.com>


