Code For Five Threads To Process Multiple Files?
tdahsu at gmail.com
Thu May 22 14:03:48 EDT 2008
On May 21, 11:41 am, tda... at gmail.com wrote:
> On May 21, 11:13 am, "A.T.Hofkamp" <h... at se-162.se.wtb.tue.nl> wrote:
>
> > On 2008-05-21, tda... at gmail.com <tda... at gmail.com> wrote:
>
> > > I'd appreciate any help. I've got a list of files in a directory, and
> > > I'd like to iterate through that list and process each one. Rather
> > > than do that serially, I was thinking I should start five threads and
> > > process five files at a time.
>
> > > Is this a good idea? I picked the number five at random... I was
>
> > Depends what you are doing.
> > If you are mainly reading/writing files, there is not much to gain, since one
> > process will already push the disk I/O system to its limit. If you do a lot of
> > processing, then more threads than the number of processors is not much use. If
> > you have more 'bursty' behavior (first do a lot of reading, then a lot of
> > processing, then reading again, etc.), then the system may be able to do some
> > scheduling and keep both the processors and the file system busy.
>
> > I cannot really give you advice on threading, I have never done that. You may
> > want to consider an alternative, namely multi-tasking at OS level. If you can
> > easily split the files over a number of OS processes (written in Python), you
> > can make the Python program really simple, and let the OS handle the
> > task-switching between the programs.
>
> > Sincerely,
> > Albert
>
> Albert,
>
> Thanks for your response - I appreciate your time!
>
> I am mainly reading and writing files, so it seems like it might not
> be a good idea. What if I read the whole file into memory first, and
> operate on it there? They are not large files...
>
> Either way, I'd hope that someone might respond with an example, as
> then I could test and see which is faster!
>
> Thanks again.
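For reference, a fixed-size pool of five worker threads is simpler than managing Thread objects by hand. Below is a minimal sketch in current Python 3, using `concurrent.futures` (added in Python 3.2, well after this thread). It assumes a hypothetical `process_file` job that reads each file fully into memory and only counts its lines. In standard CPython, threads overlap mainly while they wait on I/O, because the global interpreter lock lets only one thread run Python bytecode at a time. That fits Albert's point about I/O-bound work.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def process_file(path):
    """Hypothetical per-file job: read the whole file into memory, count its lines."""
    with open(path) as f:
        lines = f.readlines()  # the whole file is held in memory at once
    return path, len(lines)


def process_all(paths, workers=5):
    """Run process_file over every path with at most `workers` threads at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as `paths`, and leaving the
        # with-block waits for all submitted jobs to finish.
        return list(pool.map(process_file, paths))
```

Calling `process_all(paths)` returns `(path, line_count)` pairs in input order. A free worker picks up the next file right away, so no thread sits idle waiting for a slow batch to finish. If the per-file work turns out to be CPU-heavy, swapping in `ProcessPoolExecutor` follows Albert's suggestion of separate OS processes. The job function must then be defined at module level so it can be pickled.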
Ah, well, I didn't get any other responses, but here's what I've done:
    loopCount = 0
    for l in range(len(self.filesToProcess)):
        threads = []
        try:
            threads.append(threading.Thread(target=self.processFiles(self.filesToProcess[loopCount + l])))
            threads.append(threading.Thread(target=self.processFiles(self.filesToProcess[loopCount + 2])))
            threads.append(threading.Thread(target=self.processFiles(self.filesToProcess[loopCount + 3])))
            threads.append(threading.Thread(target=self.processFiles(self.filesToProcess[loopCount + 4])))
            threads.append(threading.Thread(target=self.processFiles(self.filesToProcess[loopCount + 5])))
            msg = "Processing file...\n"
            for thread in threads:
                wx.CallAfter(self.textctrl03.write(msg),
                             thread.start())
            for thread in threads:
                thread.join()
            loopCount += 5
        except IndexError:
            pass
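The loop has a subtle problem. `target=self.processFiles(...)` calls processFiles immediately, in the calling thread, and hands its return value (presumably None) to `threading.Thread`. The "threads" therefore have nothing to run, and the files are actually processed one after another. In addition, the hand-written offsets (`+l` through `+5`) do not line up with indices `loopCount` to `loopCount + 4`, so some files are skipped. Relying on IndexError to detect the end of the list is also fragile. Below is a minimal sketch of the same five-at-a-time idea. It assumes a hypothetical `process_file` function in place of `self.processFiles` and uses plain names instead of the wx window.

```python
import threading


def process_file(name, results, lock):
    """Hypothetical stand-in for self.processFiles: just records the file name."""
    with lock:  # the lock guards the shared results list
        results.append(name)


def process_in_batches(files, batch_size=5):
    """Process `files` batch_size at a time, one thread per file in each batch."""
    results = []
    lock = threading.Lock()
    # Slicing in steps of batch_size covers every file exactly once; the last
    # slice may be shorter, so no IndexError handling is needed.
    for start in range(0, len(files), batch_size):
        batch = files[start:start + batch_size]
        # Pass the function object as target and its arguments via args=.
        # target=process_file(name, ...) would run the job right here instead.
        threads = [
            threading.Thread(target=process_file, args=(name, results, lock))
            for name in batch
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # wait for the whole batch before starting the next one
    return results
```

For example, `process_in_batches([f"file{i}" for i in range(12)])` runs batches of 5, 5 and 2. One drawback of fixed batches is that each batch waits for its slowest file. A worker pool avoids this by handing the next file to whichever thread frees up first.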
It works, and it works well. It starts five threads, and processes
five files at a time. (In the "self.processFiles" I read the whole
file into memory using readlines(), which works well.)
Of course, now the wx.CallAfter function doesn't work... I get
"TypeError: 'NoneType' object is not callable" every time it is
run...
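The TypeError follows the same pattern as the thread problem. `self.textctrl03.write(msg)` is evaluated before `wx.CallAfter` runs, so CallAfter receives write's return value (None) instead of something it can call. `wx.CallAfter` takes the callable and its arguments separately, so the call would presumably be `wx.CallAfter(self.textctrl03.write, msg)`, followed by a plain `thread.start()`. The mechanism can be shown without wx. The sketch below uses a hypothetical `call_after` with the same calling convention, and a hypothetical `write` that stands in for the text control's method.

```python
def call_after(func, *args):
    """Hypothetical stand-in with wx.CallAfter's calling convention:
    it takes a callable plus its arguments (wx would run it later; this runs it now)."""
    return func(*args)


log = []


def write(msg):
    """Stands in for self.textctrl03.write; like most methods it returns None."""
    log.append(msg)


# Wrong: write(...) runs first, and its return value None is what gets passed in.
try:
    call_after(write("Processing file...\n"))
except TypeError as exc:
    print("error:", exc)  # error: 'NoneType' object is not callable

# Right: pass the method itself and its argument separately.
call_after(write, "Processing file...\n")
```

Note also that calling `thread.join()` inside a GUI event handler blocks the wx event loop. Messages queued with CallAfter will likely not appear until the handler returns.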