[Tutor] Can't process all my files (need to close?)
aeneas24 at priest.com
Mon Sep 20 17:19:29 CEST 2010
My Python script needs to process 45,000 files, but it seems to blow up after about 10,000. Note that I'm outputting bazillions of rows to a csv, so that may be part of the issue.
Here's the error I get (I'm running it through IDLE on Windows 7):
Microsoft Visual C++ Runtime Library
Runtime Error!
Program: C:\Python26\pythonw.exe
This application has requested the Runtime to terminate it in an unusual way.
I think this might be because I don't specifically close the files I'm reading. Except that I'm not quite sure where to put the close. I have 3 places where I would think it might work but I'm not sure which one works or how exactly to do the closing (what it is I append ".close()" to).
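On the question of what ".close()" gets appended to: it is a method of the file object returned by open(), not of the string that .read() returns. Below is a minimal sketch of the two usual forms. The helper names read_text and read_text_with are hypothetical and not part of my script, and io.open is used here as an assumption because it behaves the same on Python 2.6 and later versions.

```python
import io


def read_text(filename):
    # Hypothetical helper: keep a reference to the file object so that
    # there is something to call .close() on.
    f = io.open(filename, "r", encoding="utf8")
    try:
        return f.read()
    finally:
        # Runs even if read() raises, so the handle is always released.
        f.close()


def read_text_with(filename):
    # Same effect using a with block (available from Python 2.6):
    # the file is closed automatically when the block exits.
    with io.open(filename, "r", encoding="utf8") as f:
        return f.read()
```

Either helper returns the decoded text, so a line such as self.string = read_text(filename) would keep the rest of a class unchanged while releasing each file handle promptly.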
1) During the self.string here:
class ReviewFile:
    # In our movie corpus, each movie is one text file. That means that
    # each text file has some "info" about the movie (genre, director,
    # name, etc), followed by a bunch of reviews. This class extracts the
    # relevant information about the movie, which is then attached to
    # review-specific information.
    def __init__(self, filename):
        self.filename = filename
        self.string = codecs.open(filename, "r", "utf8").read()
        self.info = self.get_fields(self.get_field(self.string, "info")[0])
        review_strings = self.get_field(self.string, "review")
        review_dicts = map(self.get_fields, review_strings)
        self.reviews = map(Review, review_dicts)
2) Maybe here?
def reviewFile ( file, args):
    for file in glob.iglob("*.txt"):
        print "  Reviewing...." + file
        rf = ReviewFile(file)
3) Or maybe here?
def reviewDirectory ( args, dirname, filenames ):
    print 'Directory',dirname
    for fileName in filenames:
        reviewFile( dirname+'/'+fileName, args )

def main(top_level_dir,csv_out_file_name):
    csv_out_file = open(str(csv_out_file_name), "wb")
    writer = csv.writer(csv_out_file, delimiter=',')
    os.path.walk(top_level_dir, reviewDirectory, writer )

main(".","output.csv")
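One thing worth noting about the code above: reviewFile receives a path but then loops over glob.iglob("*.txt"), so every call may re-read all the .txt files in the current directory, which could multiply the work and the number of open handles. The sketch below is one possible restructuring, not a confirmed fix for the crash. It assumes Python 3 (os.walk instead of os.path.walk, and text mode with newline="" for the csv file; under Python 2.6 the "wb" mode above would stay). The name iter_text_files and the placeholder row are hypothetical.

```python
import csv
import io
import os


def iter_text_files(top_level_dir):
    # Hypothetical helper: yield each .txt path under the tree exactly once.
    for dirname, _dirnames, filenames in os.walk(top_level_dir):
        for name in sorted(filenames):
            if name.endswith(".txt"):
                yield os.path.join(dirname, name)


def main(top_level_dir, csv_out_file_name):
    # The with block closes (and flushes) the csv file when main finishes.
    with open(csv_out_file_name, "w", newline="") as csv_out_file:
        writer = csv.writer(csv_out_file, delimiter=",")
        count = 0
        for path in iter_text_files(top_level_dir):
            # Each input file is closed before the next one is opened.
            with io.open(path, "r", encoding="utf8") as f:
                text = f.read()
            # Placeholder row; the real script would write review fields.
            writer.writerow([path, len(text)])
            count += 1
    return count
```

With this shape only one input file is open at any moment, regardless of how many files the directory tree contains.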
Thanks very much for any help!
Tyler