[Tutor] Can't process all my files (need to close?)

aeneas24 at priest.com
Mon Sep 20 17:19:29 CEST 2010


My Python script needs to process 45,000 files, but it seems to crash after about 10,000. Note that I'm also writing a very large number of rows to a CSV file, so that may be part of the issue.

Here's the error I get (I'm running it through IDLE on Windows 7):

Microsoft Visual C++ Runtime Library
Runtime Error!
Program: C:\Python26\pythonw.exe
This application has requested the Runtime to terminate it in an unusual way.

I think this might be because I never explicitly close the files I'm reading, but I'm not sure where to put the close. I can see 3 places where it might go, but I don't know which one is right or exactly how to do the closing (that is, what I should append ".close()" to).

1) Where self.string is assigned here:

class ReviewFile:
    # In our movie corpus, each movie is one text file. That means that each
    # text file has some "info" about the movie (genre, director, name, etc.),
    # followed by a bunch of reviews. This class extracts the relevant
    # information about the movie, which is then attached to review-specific
    # information.
    def __init__(self, filename):
        self.filename = filename
        self.string = codecs.open(filename, "r", "utf8").read()
        self.info = self.get_fields(self.get_field(self.string, "info")[0])
        review_strings = self.get_field(self.string, "review")
        review_dicts = map(self.get_fields, review_strings)
        self.reviews = map(Review, review_dicts)
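
For option 1, the thing to call ".close()" on is the object returned by codecs.open, which the line above never keeps a name for. A minimal sketch of one way that might look, assuming the read is moved into a small helper (read_text is a hypothetical name, not part of the original script):

```python
import codecs


def read_text(filename, encoding="utf8"):
    """Return the whole file as text and close the handle before returning.

    read_text is a hypothetical helper used only for illustration.
    """
    # Keep a name for the file object so there is something to close.
    handle = codecs.open(filename, "r", encoding)
    try:
        return handle.read()
    finally:
        # Runs whether or not read() raised, so the handle is always released.
        handle.close()
```

With a helper like this, the assignment in __init__ could become self.string = read_text(filename). A "with" block around codecs.open would be an equivalent way to get the same guarantee.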

2) Maybe here?
def reviewFile(file, args):
    for file in glob.iglob("*.txt"):
        print "  Reviewing...." + file
        rf = ReviewFile(file)

3) Or maybe here?

def reviewDirectory(args, dirname, filenames):
    print 'Directory', dirname
    for fileName in filenames:
        reviewFile(dirname + '/' + fileName, args)

def main(top_level_dir, csv_out_file_name):
    csv_out_file = open(str(csv_out_file_name), "wb")
    writer = csv.writer(csv_out_file, delimiter=',')
    os.path.walk(top_level_dir, reviewDirectory, writer)

main(".", "output.csv")
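
The CSV output file opened in main is also never closed. Below is a rough sketch of closing it once all rows are written, under the assumption that the rows are available as an iterable. The names write_rows and open_csv_for_writing are hypothetical, and the version check is only there so the sketch runs on both Python 2 (as used above) and Python 3:

```python
import csv
import sys


def open_csv_for_writing(name):
    """Open a file in the mode the csv module expects (hypothetical helper)."""
    if sys.version_info[0] < 3:
        # Python 2, as in the script above: binary mode.
        return open(name, "wb")
    # Python 3: text mode with newline translation turned off.
    return open(name, "w", newline="")


def write_rows(csv_out_file_name, rows):
    """Write every row, then close the output file even if a write fails."""
    csv_out_file = open_csv_for_writing(csv_out_file_name)
    try:
        writer = csv.writer(csv_out_file, delimiter=',')
        for row in rows:
            writer.writerow(row)
    finally:
        # Closing flushes any buffered rows to disk and frees the handle.
        csv_out_file.close()
```

In the original main, the same try/finally could wrap the os.path.walk call so that csv_out_file.close() runs after the walk finishes.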

Thanks very much for any help!

Tyler

