[Tutor] a quick Q: how to use for loop to read a series of files with .doc end

Wed Oct 5 04:45:49 CEST 2011

On 10/04/2011 10:22 PM, lina wrote:
> On Wed, Oct 5, 2011 at 1:30 AM, Prasad, Ramit<ramit.prasad at jpmorgan.com>wrote:
>
>>> But I still don't know how to get the
>>> statistic result of each column,
>> Thanks.
>> try:
>>     cols = len( text[0] ) # Find out how many columns there are (assuming
>> each row has the same number of columns)
>> except IndexError:
>>     raise #  This will make sure you can see the error while developing;
>>
> This part:
>
> It's showed up:
>
>      except IndexError:
Best guess I can make is that the line "each row has..." needs a # in
front of it.  Or maybe your code looks like the following, which has no
try block at all.

The except clause has to be the first line at the same indentation as 
the try line it's protecting.
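
A minimal sketch of that layout follows. The empty list `text` is made up
for illustration, and the fallback value of 0 is an assumption; the quoted
suggestion re-raises instead.

```python
text = []  # hypothetical empty input, so text[0] fails

try:
    cols = len(text[0])  # raises IndexError when text is empty
except IndexError:       # same indentation as the try it protects
    cols = 0             # illustrative fallback, not part of the original

print(cols)
```

Nothing may sit between the end of the try body and the except line at
that indentation level, which is why a wrapped comment that lost its #
breaks the block.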

>           ^
> SyntaxError: invalid syntax
>
> for fileName in os.listdir("."):
>      if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
>          filedata = open(fileName)
>          text=filedata.readlines()
>          cols = len(text[0])
>          except IndexError:
>              print ("Index Error.")
>          result=[]
>          for idx in xrange(cols):
>              results.append(0)
>          for line in text:
>              for col_idx, field in enumerate(line):
>                  if token in field:
>                      results[col_idx]+=1
>              for index in col_idx:
>                  print results[index]
>
> it showed up:
>
>      print results[]
>                  ^
> SyntaxError: invalid syntax
>
> Sorry, I am still lack deep understanding about something basic. Thanks for
> your patience.
>
>
Simplest answer here is you might have accidentally run this under
Python 3.x.  That would explain the syntax error on the print
statement, since print is a function in 3.x and needs parentheses.
Pick a single version and stick to it.  In fact, you might even put a
version test at the beginning of the code to give an immediate error.
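
One way such a version test might look is sketched below. The names
`REQUIRED_MAJOR` and `check_version` are hypothetical, and the value 3 is
an assumption; set it to 2 for a script that targets Python 2.

```python
import sys

REQUIRED_MAJOR = 3  # assumption: change to 2 if the script targets Python 2


def check_version(version_info=sys.version_info, required=REQUIRED_MAJOR):
    """Return True if the interpreter's major version matches `required`."""
    return version_info[0] == required


if not check_version():
    # Stop right away with a readable message instead of a later failure
    sys.exit("This script requires Python %d.x" % REQUIRED_MAJOR)
```

Keep in mind that the check only runs if the whole file parses.  A
Python 2 print statement elsewhere in the same file would still raise
SyntaxError under 3.x before this test is reached.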

But you do have many other problems with the code.  One is that this no
longer does anything useful with multiple tokens.  (See my last email
for an approach that handles multiple tokens.)  Another is that you mix
result and results.  They're entirely distinct names, so pick one
spelling and stick to it.  Another is that the "for index" loop is
indented wrong, and uses the wrong limit value.  As it stands, it's
trying to iterate over an integer.  You probably want to replace the
whole phrase with something like
    for item in results: print item
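
Putting those fixes together, a single-token version might look like the
sketch below.  The rows in `text` and the value of `token` are made up,
each character is treated as one column as in the quoted code, and print
is written with parentheses so it runs on either version.

```python
token = "E"                 # hypothetical token to count
text = ["EBE\n", "AEE\n"]   # made-up rows standing in for readlines()

cols = len(text[0].rstrip("\n"))
results = [0] * cols        # one counter per column, one spelling throughout

for line in text:
    for col_idx, field in enumerate(line.rstrip("\n")):
        if token in field:
            results[col_idx] += 1

# Report once, after all lines are counted, by iterating over the list
for item in results:
    print(item)
```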

This example illustrates one reason why it's a mistake to write all the 
code at top level.  This code should probably be at least 4 functions, 
with each one handling one abstraction.
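
As a rough sketch of one possible split (the function names, the default
token "E", and the one-character-per-column layout are all assumptions,
not the only reasonable design):

```python
import os


def find_files(directory, extension):
    """Return paths of regular files in `directory` ending in `extension`."""
    paths = [os.path.join(directory, name)
             for name in sorted(os.listdir(directory))]
    return [p for p in paths
            if os.path.isfile(p) and os.path.splitext(p)[1] == extension]


def read_lines(path):
    """Return the file's lines without their trailing newlines."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle]


def count_token(lines, token):
    """Count, per column, how many lines contain `token` in that column."""
    if not lines:
        return []           # empty file: nothing to count, no IndexError
    results = [0] * len(lines[0])
    for line in lines:
        # Ignore any characters beyond the width of the first line
        for col_idx, field in enumerate(line[:len(results)]):
            if token in field:
                results[col_idx] += 1
    return results


def report(path, results):
    """Print the file name followed by one count per line."""
    print(path)
    for item in results:
        print(item)


def main(directory=".", extension=".xpm", token="E"):
    for path in find_files(directory, extension):
        report(path, count_token(read_lines(path), token))
```

Calling main() processes the current directory.  Each helper can also be
tried on its own from the interpreter, which is the main benefit of the
split.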

Further, while you're developing, you should probably put the test data
into a literal (probably a multiline literal using triple quotes), so you
can experiment easily with changes to the data, and see how the results
change.
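
For instance, a triple-quoted literal can stand in for a file during
development.  The contents of `SAMPLE` below are made up, and wrapping it
in io.StringIO is one option among several; the point is only that the
rest of the code sees the same thing open() would give it.

```python
import io

# Made-up sample standing in for the contents of one data file
SAMPLE = """\
EBE
AEE
BBE
"""

filedata = io.StringIO(SAMPLE)   # behaves like an open text file
text = filedata.readlines()      # same call the real code uses

print(len(text))                 # number of rows in the sample
print(repr(text[0]))             # first row, trailing newline included
```

Editing a row in SAMPLE and rerunning is then enough to see how the
counts respond, with no files on disk involved.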

-- 

DaveA