Threads not Improving Performance in Program

Ryan Rosario uclamathguy at
Thu Mar 19 17:50:51 CET 2009

I have a parser that needs to process 7 million files. After running
for 2 days, it had only processed 1.5 million. I want this script to
parse several files at once by using multiple threads: one for each
file currently being analyzed.
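At that rate (about 750,000 files a day) the full 7 million would take
more than nine days. The rough shape of what I have in mind is below;
parse_file and the list of paths are just placeholders, not my real
code:

import threading

def parse_file(path):
    #placeholder for the real per-file parsing work
    pass

paths = ["a.xml", "b.xml"]      #placeholder batch of files
workers = []
for path in paths:
    t = threading.Thread(target=parse_file, args=(path,))
    t.start()                   #start parsing this file in its own thread
    workers.append(t)
for t in workers:
    t.join()                    #wait for every file in the batch to finish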

My code iterates through all of the directories within a top-level
directory, and in each directory it iterates through every file. I
structured my code something like this, though I think I might be
misunderstanding how to use threads:

mythreads = []
for directory in dirList:
    #some processing...
    for file in fileList:
        p = Process(currDir,directory,file)    #Process is a class that extends threading.Thread
        p.start()                              #kick off parsing of this file
        mythreads.append(p)

for thread in mythreads:
    thread.join()                              #wait for the thread to finish
    del thread
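dirList and fileList above are just directory listings -- roughly like
this, with the real path handling stripped out (rootDir is a
placeholder):

import os

rootDir = "/path/to/dumps"                      #placeholder top-level directory
dirList = os.listdir(rootDir)                   #subdirectories to walk
for directory in dirList:
    currDir = os.path.join(rootDir, directory)  #directory currently being processed
    fileList = os.listdir(currDir)              #files in that directory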

The actual class that extends threading.Thread is below:

import threading, os, re
from xml.dom import minidom

class Process(threading.Thread):
        vlock = threading.Lock()        #lock shared by all Process threads
        def __init__(self,currDir,directory,file):      #thread constructor
                threading.Thread.__init__(self)
                self.currDir = currDir
                self.directory = directory
                self.file = file
        def run(self):
                redirect = re.compile(r'#REDIRECT',re.I)
                xmldoc = minidom.parse(os.path.join(self.currDir,self.file))
                try:
                        markup =
                except:
                        #An error occurred
                        BAD = open("bad.log","a")
                        BAD.writelines(self.file + "\n")
                        print "Error."
                #if successful, do more processing...

I experimented with a variety of thread counts and saw no performance
gain: the code takes the same amount of time to process 1000 files as
it does without threads. Any ideas on what I am doing wrong?
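For reference, the timing comparison was roughly this; run_batch is a
placeholder for processing the same 1000 files with the given number of
threads:

import time

def run_batch(num_threads):
    #placeholder: process the same 1000 files using num_threads threads
    pass

for n in (1, 2, 4, 8):
    start = time.time()
    run_batch(n)
    print n, "threads:", time.time() - start, "seconds"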
