[Tutor] a quick Q: how to use for loop to read a series of files with .doc end

Wed Oct 5 08:51:06 CEST 2011

On Wed, Oct 5, 2011 at 1:42 PM, Dave Angel <d at davea.name> wrote:

> On 10/04/2011 11:13 PM, lina wrote:
>
>> On Wed, Oct 5, 2011 at 10:45 AM, Dave Angel<d at davea.name>  wrote:
>>
>>  On 10/04/2011 10:22 PM, lina wrote:
>>>
>>>  On Wed, Oct 5, 2011 at 1:30 AM, Prasad, Ramit <ramit.prasad at jpmorgan.com> wrote:
>>>>>
>>>> <SNIP>
>>>
>>>> SyntaxError: invalid syntax
>>>>
>>>> for fileName in os.listdir("."):
>>>>     if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
>>>>         filedata = open(fileName)
>>>>         text=filedata.readlines()
>>>>         cols = len(text[0])
>>>>         except IndexError:
>>>>             print ("Index Error.")
>>>>         result=[]
>>>>         for idx in xrange(cols):
>>>>             results.append(0)
>>>>         for line in text:
>>>>             for col_idx, field in enumerate(line):
>>>>                 if token in field:
>>>>                     results[col_idx]+=1
>>>>             for index in col_idx:
>>>>                 print results[index]
>>>>
>>>> it showed up:
>>>>
>>>>     print results[]
>>>>                 ^
>>>> SyntaxError: invalid syntax
>>>>
>>>> Sorry, I still lack a deep understanding of some basics. Thanks for
>>>> your patience.
>>>>
>>>>
>>>  Simplest answer here is you might have accidentally run this under Python
>>> 3.x.  That would explain the syntax error on the print function.   Pick a
>>> single version and stick to it.  In fact, you might even put a version test
>>> at the beginning of the code to give an immediate error.
>>>
>>>  choose python3.
>>
>>  Then change that last print to use parentheses.  print() is a function
> call in Python 3.x, while it was a statement in earlier Python versions.
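A minimal sketch of both suggestions, assuming Python 3 is the chosen version. The results list holds made-up counts for illustration only. Note that a version test can only fire if the file parses under both versions, which is why the print call is written with parentheses.

```python
import sys

# Fail immediately, with a clear message, if run under the wrong major version.
if sys.version_info[0] < 3:
    raise SystemExit("This script needs Python 3; run it with python3.")

results = [3, 0, 2]  # hypothetical counts, one per column

# In Python 3, print is a function, so its argument goes in parentheses.
for index in range(len(results)):
    print(results[index])
```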
>
>  <SNIP>
>>
>>  This example illustrates one reason why it's a mistake to write all the
>>> code at top level.  This code should probably be at least 4 functions,
>>> with
>>> each one handling one abstraction.
>>>
>>>  It's frustrating. Seriously. (I think I need to read some good,
>> relevant code first.)
>>
>>  Is Python your first programming language?  It was approximately my 30th.
>
Not exactly. Ha ... I didn't know there were so many languages out there.

>
> I learned "programming" from a Fortran book in 1967.  I had no access to a
> computer, though there was at least one in the state, at the Yale campus.  I
> saw it in a field trip by the (advanced) students that were taking
> programming.  They weren't allowed to take it till finishing 2nd year
> calculus, which I didn't do till I got to college.  However, when I went to
> college the following year, I ran across another student who knew how to
> access the mainframe (via punch-cards), and could tell me how to do it.
>  (Security was very light).  For a few months, I hacked daily, and learned a
> lot.  Then the following year, I actually took an electrical engineering
> class that introduced the concepts of programming, and I spent my time doing
> experiments that barely resembled the assignments.  I ended up with an
> incomplete in the course, which I made up by writing a linear circuit
> analysis program.  Punched card input, graphical output to a line printer
> using rows of asterisks.
>
Where to start? I learned C 10 years ago, but over the whole semester I never
wrote a serious program, though I did attend every lecture.
At that time I was addicted to literature. But I later realized that lots
of writers (especially the ones I like) ended up committing suicide,
something too heavy to handle, so I changed to something like physics. I
noticed that lots of people doing physics live really long and happy lives
(long live the physicist). Then came four years of (applied) physics, three
years of (theoretical) physics, then (bio-)physics in the following years.
(It's a joke.)
During those years I used Maple, MATLAB and some basic awk and bash, but all
at a very basic level. A shame... I never did anything serious.

>
> Point is, it takes a lot of time, and usually a one-on-one mentor to get
> the concepts nailed down.  Seldom did anyone tell me "write these lines
> down, and it'll solve the problem."  instead they told me where my problem
> was, and where in those manuals (chained to tables in the lab) to find more
> information.
>
> It wasn't till my fourth language that I found out about local variables,
> and how a function should encapsulate one concept.  The first three didn't
> have such things.
>
>
>
>>>  Further, while you're developing, you should probably put the test data
>>> into a literal (probably a multiline literal using triple quotes), so you
>>> can experiment easily with changes to the data, and see how the results
>>> change.
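As a rough sketch of that idea: the SAMPLE text below is invented stand-in data, not taken from a real .xpm file, and the counting loop mirrors the one in the script under discussion.

```python
# Hypothetical sample data standing in for the body of an .xpm file.
SAMPLE = """\
BBE~
EBB~
~EBE
"""

TOKENS = "BE"

text = SAMPLE.splitlines()
numcolumns = len(text[0])
results = {ch: [0] * numcolumns for ch in TOKENS}
for line in text:
    for col, ch in enumerate(line):
        if ch in TOKENS:
            results[ch][col] += 1

# Show the per-column counts for each token.
for ch in TOKENS:
    print(ch, results[ch])
```

Editing SAMPLE and rerunning shows right away how the counts respond, with no files involved.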
>>>
>>>
>>  #!/bin/python
>>
>> import os.path
>>
>> tokens=['B','E']
>>
>> for fileName in os.listdir("."):
>>     if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
>>         filedata = open(fileName)
>>         text=filedata.readlines()
>>         results={}
>>         numcolumns=len(text.strip())
>>         for ch in tokens:
>>             results[ch]=[0]*numcolumns
>>         for line in text:
>>             for col, ch in enumerate(line):
>>                 if ch in tokens:
>>                     results[ch][col]+=1
>>         for item in results:
>>                 print item
>>
>> $ python3 counter-vertically.py
>>   File "counter-vertically.py", line 20
>>     print item
>>              ^
>> SyntaxError: invalid syntax
>>
>>  As I said above, Python 3 needs parentheses around print's argument list.
>
> As for splitting into functions, consider:
>
>

> import os
>
> # these four are capitalized because they're intended to be constant
> TOKENS = "BE"
> LINESTOSKIP = 43
> INFILEEXT = ".xpm"
> OUTFILEEXT = ".txt"
>
> def dofiles(topdirectory):
>    for filename in os.listdir(topdirectory):
>        processfile(filename)
>
> def processfile(infilename):
>    base, ext = os.path.splitext(infilename)
>    if ext == INFILEEXT:
>        text = fetchonefiledata(infilename)
>        numcolumns = len(text[0])
>        results = {}
>        for ch in TOKENS:
>
>            results[ch] = [0] * numcolumns
>        for line in text:
>            line = line.strip()
>
>            for col, ch in enumerate(line):
>                if ch in TOKENS:
>                    results[ch][col] += 1
>        writeonefiledata(base + OUTFILEEXT, results)
>
> def fetchonefiledata(inname):
>    infile = open(inname)
>    text = infile.readlines()
>    return text[LINESTOSKIP:]
>
> def writeonefiledata(outname, results):
>    outfile = open(outname, "w")
>    ...process the results as appropriate...
>    ....(since you didn't tell us how multiple tokens were to be displayed)
>
> if __name__ == "__main__":
>    dofiles(".")     # or get the top directory from the sys.argv variable,
>                     # which is set from command line.
>
>
So you split the earlier version you suggested into 4 functions.
A small question: why is the variable called ext? And why does splitext also
have ext in its name?
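On the ext question: os.path.splitext splits a file name at its last dot and returns a (root, extension) pair, so "ext" is just short for "extension" in both the function name and the variable. A small illustration, using try.xpm as an example name:

```python
import os.path

# splitext splits at the last dot and returns a (root, extension) pair.
base, ext = os.path.splitext("try.xpm")
print(base)  # try
print(ext)   # .xpm

# The extension keeps its leading dot, so it compares directly with ".xpm".
INFILEEXT = ".xpm"
OUTFILEEXT = ".txt"
if ext == INFILEEXT:
    print(base + OUTFILEEXT)  # try.txt
```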

> Now this is totally untested.  I just typed it without even trying any of
> it.

import os.path

TOKENS="E"
LINESTOSKIP=0
INFILEEXT=".xpm"
OUTFILEEXT=".txt"

def dofiles(topdirectory):
    for filename in os.listdir(topdirectory):
        processfile(filename)

def processfile(infilename):
    base, ext =os.path.splitext(infilename)
    if ext == INFILEEXT:
        text = fetchonefiledata(infilename)
        numcolumns=len(text[0])
        results={}
        for ch in TOKENS:

            results[ch] = [0]*numcolumns
        for line in text:
            line = line.strip()

            for col, ch in enumerate(line):
                if ch in TOKENS:
                    results[ch][col]+=1
        writeonefiledata(base+OUTFILEEXT,results)

def fetchonefiledata(inname):
    infile = open(inname)
    text = infile.readlines()
    return text[LINESTOSKIP:]

def writeonefiledata(outname,results):
    outfile = open(outname,"w")
    for item in results:
        return outfile.write(item)

if __name__=="__main__":
    dofiles(".")

Only the result is a bit unexpected.

 $ more try.txt
E

I might have made a mistake in the part of writeonefiledata that you left open.
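The lone "E" is consistent with how writeonefiledata is written: looping over a dictionary yields its keys, and the return inside the loop leaves the function after the first one, so only the token itself reaches the file and the counts never do. A possible fix is sketched below. The output layout (one line per token, followed by its per-column counts) is an assumption, since the thread never settles how the results should be displayed.

```python
def writeonefiledata(outname, results):
    # Assumed output format: one line per token, followed by its
    # per-column counts separated by spaces.
    with open(outname, "w") as outfile:
        for ch, counts in sorted(results.items()):
            outfile.write(ch + " " + " ".join(str(n) for n in counts) + "\n")
```

The with block also closes the file when the function finishes, which the original version leaves to chance.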

> But it gives you a simple refactoring that splits the logic so each piece
> can be visualized (and tested) independently.  I'd also split up
> processfile(), once I realized how big it was.
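One way processfile() might be split further, as a sketch only: the counting moves into its own function that takes lines and returns the counts, so it can be tested without touching any files. The name counttokens is made up for this example, and the column count here is taken from the stripped first line.

```python
def counttokens(lines, tokens):
    """Return {token: [count per column]} for the given lines."""
    if not lines:
        return {ch: [] for ch in tokens}
    numcolumns = len(lines[0].strip())
    results = {ch: [0] * numcolumns for ch in tokens}
    for line in lines:
        for col, ch in enumerate(line.strip()):
            # Ignore characters beyond the width of the first line.
            if ch in tokens and col < numcolumns:
                results[ch][col] += 1
    return results
```

processfile() would then only check the extension, fetch the lines, call counttokens(), and hand the result to writeonefiledata().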
>
> There are many shortcuts that can be applied. Some of them probably use
> language features you're not comfortable with, like perhaps generators.  And
> if  efficiency is important, there are optimizations to do, like using
> islice directly on the infile object.  That one would eliminate having to
> have the whole file stored in memory at one time.
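A sketch of the islice idea, assuming the same LINESTOSKIP constant as above. This version of fetchonefiledata is a generator, so it hands out one line at a time instead of a list; a caller that relies on text[0] would need to be adapted, for example by reading the first line separately.

```python
from itertools import islice

LINESTOSKIP = 43

def fetchonefiledata(inname):
    # Yield lines lazily, skipping the first LINESTOSKIP header lines,
    # so the whole file never has to sit in memory at once.
    with open(inname) as infile:
        for line in islice(infile, LINESTOSKIP, None):
            yield line
```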
>
> Likewise there are further things that could be done to decouple the
> functions even more.
>
> But there's nothing in the above code which uses very advanced topics, so
> you should be able to understand it and fix whatever typos I've undoubtedly
> got.
>
> What are you using for debugging aids?  Besides this group, I mean.  print
> statements?  An IDE ?  which one?
>
Debugging aids?
I just run python3 script.py,
and it prints some hints.
In between, I sometimes try print.

Thanks for your time,

>  --
>
> DaveA
>
>

-- 
Best Regards,

lina