[Tutor] a quick Q: how to use for loop to read a series of files with .doc end
lina
lina.lastname at gmail.com
Wed Oct 5 08:51:06 CEST 2011
On Wed, Oct 5, 2011 at 1:42 PM, Dave Angel <d at davea.name> wrote:
> On 10/04/2011 11:13 PM, lina wrote:
>
>> On Wed, Oct 5, 2011 at 10:45 AM, Dave Angel<d at davea.name> wrote:
>>
>> On 10/04/2011 10:22 PM, lina wrote:
>>>
>>> On Wed, Oct 5, 2011 at 1:30 AM, Prasad, Ramit <ramit.prasad at jpmorgan.com> wrote:
>>>>
>>>> <SNIP>
>>>
>>>> SyntaxError: invalid syntax
>>>>
>>>> for fileName in os.listdir("."):
>>>> if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
>>>> filedata = open(fileName)
>>>> text=filedata.readlines()
>>>> cols = len(text[0])
>>>> except IndexError:
>>>> print ("Index Error.")
>>>> result=[]
>>>> for idx in xrange(cols):
>>>> results.append(0)
>>>> for line in text:
>>>> for col_idx, field in enumerate(line):
>>>> if token in field:
>>>> results[col_idx]+=1
>>>> for index in col_idx:
>>>> print results[index]
>>>>
>>>> it showed up:
>>>>
>>>> print results[]
>>>> ^
>>>> SyntaxError: invalid syntax
>>>>
>>>> Sorry, I still lack a deep understanding of some basics. Thanks for
>>>> your patience.
>>>>
>>>>
>>> Simplest answer here is you might have accidentally run this under
>>> Python 3.x. That would explain the syntax error on the print function.
>>> Pick a single version and stick to it. In fact, you might even put a
>>> version test at the beginning of the code to give an immediate error.
>>>
>>> I choose Python 3.
>>
>> Then change that last print to use parentheses. print() is a function
> call in Python 3.x, while it was a statement in earlier Python versions.
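
A minimal sketch of the version test and the parenthesized print mentioned above, assuming the script targets Python 3 only. The helper name require_python3 and the results list are made up for illustration:

```python
import sys

def require_python3():
    """Stop immediately with a clear message when run under Python 2."""
    # Assumption: this script is meant for Python 3 only.
    if sys.version_info < (3,):
        raise SystemExit("This script needs Python 3.x")

require_python3()

results = [3, 0, 1]  # made-up counts, only for illustration
for index in range(len(results)):
    print(results[index])  # print() is a function call in Python 3
```

Note that range() replaces the xrange() used in the earlier code, since xrange() does not exist in Python 3.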
>
> <SNIP>
>>
>> This example illustrates one reason why it's a mistake to write all the
>>> code at top level. This code should probably be at least 4 functions,
>>> with
>>> each one handling one abstraction.
>>>
>> It's frustrating. Seriously. (I think I need to read some good,
>> relevant code first.)
>>
>> Is Python your first programming language? It was approximately my 30th.
>
Not exactly. Ha ... I didn't know there were so many languages out there.
>
> I learned "programming" from a Fortran book in 1967. I had no access to a
> computer, though there was at least one in the state, at the Yale campus. I
> saw it in a field trip by the (advanced) students that were taking
> programming. They weren't allowed to take it till finishing 2nd year
> calculus, which I didn't do till I got to college. However, when I went to
> college the following year, I ran across another student who knew how to
> access the mainframe (via punch-cards), and could tell me how to do it.
> (Security was very light). For a few months, I hacked daily, and learned a
> lot. Then the following year, I actually took an electrical engineering
> class that introduced the concepts of programming, and I spent my time doing
> experiments that barely resembled the assignments. I ended up with an
> incomplete in the course, which I made up by writing a linear circuit
> analysis program. Punched card input, graphical output to a line printer
> using rows of asterisks.
>
How to start? I learned C 10 years ago, but for the whole semester I never
wrote a serious program, though I did attend every lecture.
At that time, I was addicted to literature stuff. But later I realized that
lots of writers (especially the ones I like) ended up committing suicide,
something too heavy to handle, so I changed to something like physics. I
noticed that lots of people doing physics live really long and happy lives
(long live the physicist). Then came four years of (applied) physics, three
years of (theoretical) physics, then (bio-)physics in the following years.
(It's a joke.)
During those years I used Maple, MATLAB, and some basic awk and bash, but all
of it was very basic. A shame... I never did anything serious.
>
> Point is, it takes a lot of time, and usually a one-on-one mentor to get
> the concepts nailed down. Seldom did anyone tell me "write these lines
> down, and it'll solve the problem." instead they told me where my problem
> was, and where in those manuals (chained to tables in the lab) to find more
> information.
>
> It wasn't till my fourth language that I found out about local variables,
> and how a function should encapsulate one concept. The first three didn't
> have such things.
>
>
>
> Further, while you're developing, you should probably put the test data
>>> into a literal (probably a multiline literal using triplequotes), so you
>>> can
>>> experiment easily with changes to the data, and see how it results.
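
One way the test-data-in-a-literal idea might look, as a sketch only. The SAMPLE rows are made up (they are not real .xpm content), and count_columns is a hypothetical name for the counting logic discussed in this thread:

```python
# Made-up sample rows standing in for the body of an .xpm file.
SAMPLE = """\
BEEB
EEBB
BBEE
"""

TOKENS = "BE"

def count_columns(lines, tokens=TOKENS):
    """Count, per column, how often each token character appears."""
    lines = [line.strip() for line in lines if line.strip()]
    numcolumns = len(lines[0])
    results = {ch: [0] * numcolumns for ch in tokens}
    for line in lines:
        for col, ch in enumerate(line):
            if ch in tokens:
                results[ch][col] += 1
    return results

counts = count_columns(SAMPLE.splitlines())
for ch in TOKENS:
    print(ch, counts[ch])
```

Editing SAMPLE and rerunning shows at once how the counts change, without touching any files on disk.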
>>>
>>>
>> #!/bin/python
>>
>> import os.path
>>
>> tokens=['B','E']
>>
>> for fileName in os.listdir("."):
>>     if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
>>         filedata = open(fileName)
>>         text=filedata.readlines()
>>         results={}
>>         numcolumns=len(text.strip())
>>         for ch in tokens:
>>             results[ch]=[0]*numcolumns
>>         for line in text:
>>             for col, ch in enumerate(line):
>>                 if ch in tokens:
>>                     results[ch][col]+=1
>>         for item in results:
>>             print item
>>
>> $ python3 counter-vertically.py
>> File "counter-vertically.py", line 20
>> print item
>> ^
>> SyntaxError: invalid syntax
>>
>> As I said above, Python 3 needs parentheses around print's argument list.
>
> As for splitting into functions, consider:
>
>
> import os
>
> #these are capitalized because they're intended to be constant
> TOKENS = "BE"
> LINESTOSKIP = 43
> INFILEEXT = ".xpm"
> OUTFILEEXT = ".txt"
>
> def dofiles(topdirectory):
>     for filename in os.listdir(topdirectory):
>         processfile(filename)
>
> def processfile(infilename):
>     base, ext = os.path.splitext(infilename)
>     if ext == INFILEEXT:
>         text = fetchonefiledata(infilename)
>         numcolumns = len(text[0])
>         results = {}
>         for ch in TOKENS:
>             results[ch] = [0] * numcolumns
>         for line in text:
>             line = line.strip()
>             for col, ch in enumerate(line):
>                 if ch in TOKENS:
>                     results[ch][col] += 1
>         writeonefiledata(base+OUTFILEEXT, results)
>
> def fetchonefiledata(inname):
>     infile = open(inname)
>     text = infile.readlines()
>     return text[LINESTOSKIP:]
>
> def writeonefiledata(outname, results):
>     outfile = open(outname, "w")
>     # ...process the results as appropriate...
>     # (since you didn't tell us how multiple tokens were to be displayed)
>     outfile.close()
>
> if __name__ == "__main__":
>     dofiles(".")  #or get the top directory from the sys.argv variable,
>                   #which is set from command line.
>
>
You split the earlier version you suggested into 4 functions.
A small question: why the name ext? Why does splitext also end in "ext" here?
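
On that question, a small sketch of what os.path.splitext returns may help. The file name "try.xpm" is only an example, and the variable names base and ext are ordinary local names, presumably short for "extension":

```python
import os.path

# splitext splits a file name into (everything before the extension, extension)
base, ext = os.path.splitext("try.xpm")
print(base)  # try
print(ext)   # .xpm

# The base can then be combined with a different extension.
print(base + ".txt")  # try.txt
```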
> Now this is totally untested. I just typed it without even trying any of
> it.
import os.path

TOKENS="E"
LINESTOSKIP=0
INFILEEXT=".xpm"
OUTFILEEXT=".txt"

def dofiles(topdirectory):
    for filename in os.listdir(topdirectory):
        processfile(filename)

def processfile(infilename):
    base, ext =os.path.splitext(infilename)
    if ext == INFILEEXT:
        text = fetchonefiledata(infilename)
        numcolumns=len(text[0])
        results={}
        for ch in TOKENS:
            results[ch] = [0]*numcolumns
        for line in text:
            line = line.strip()
            for col, ch in enumerate(line):
                if ch in TOKENS:
                    results[ch][col]+=1
        writeonefiledata(base+OUTFILEEXT,results)

def fetchonefiledata(inname):
    infile = open(inname)
    text = infile.readlines()
    return text[LINESTOSKIP:]

def writeonefiledata(outname,results):
    outfile = open(outname,"w")
    for item in results:
        return outfile.write(item)

if __name__=="__main__":
    dofiles(".")
The result is just a bit unexpected:
$ more try.txt
E
I might have made a mistake in the part of writeonefiledata that you left open.
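
In the version above, iterating over results yields only the dictionary keys, and the return inside the loop leaves the function after the first write, which would explain a file containing just E. A possible sketch of writeonefiledata follows. The output layout (token, then its per-column counts on one line) is an assumption, since the thread never settles on a format, and the temporary file is used only for the demonstration:

```python
import os
import tempfile

def writeonefiledata(outname, results):
    """Write one line per token: the token, then its per-column counts."""
    with open(outname, "w") as outfile:
        for token, counts in sorted(results.items()):
            outfile.write(token + " " + " ".join(str(n) for n in counts) + "\n")

# Demonstration with made-up counts, written to a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "try.txt")
writeonefiledata(demo_path, {"E": [0, 2, 1]})
with open(demo_path) as f:
    written = f.read()
print(written)
```

The with statement also closes the file, which the original version never did.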
> But it gives you a simple refactoring that splits the logic so each can be
> visualized (and tested) independently. I'd also split up processfile(),
> once I realized how big it was.
>
> There are many shortcuts that can be applied. Some of them probably use
> language features you're not comfortable with, like perhaps generators. And
> if efficiency is important, there are optimizations to do, like using
> islice directly on the infile object. That one would eliminate having to
> have the whole file stored in memory at one time.
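
A rough sketch of the islice idea, under the assumption that the caller can work with lines one at a time. This version of fetchonefiledata is a generator, so code that indexes the result (such as text[0]) would need adapting. The demo file and its contents are made up:

```python
import os
import tempfile
from itertools import islice

LINESTOSKIP = 43  # value taken from the code earlier in the thread

def fetchonefiledata(inname, skip=LINESTOSKIP):
    """Yield the lines of a file one at a time, skipping the first `skip`."""
    with open(inname) as infile:
        # islice reads lazily, so the whole file is never held in memory.
        for line in islice(infile, skip, None):
            yield line

# Demonstration with a small made-up file: two header lines, two data lines.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.xpm")
with open(demo_path, "w") as f:
    f.write("header1\nheader2\nBEEB\nEEBB\n")

remaining = [line.strip() for line in fetchonefiledata(demo_path, skip=2)]
print(remaining)
```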
>
> Likewise there are further things that could be done to decouple the
> functions even more.
>
> But there's nothing in the above code which uses very advanced topics, so
> you should be able to understand it and fix whatever typos I've undoubtedly
> got.
>
> What are you using for debugging aids? Besides this group, I mean. print
> statements? An IDE ? which one?
>
Debugging aids?
I just run python3 script.py;
it pops up some hints,
and in between I probably try print.
Thanks for your time,
> --
>
> DaveA
>
>
--
Best Regards,
lina