Multiprocessing.Queue - I want to end.
kyrie at uh.cu
Mon May 4 09:52:14 EDT 2009
On Monday 04 May 2009 04:01:23 am Hendrik van Rooyen wrote:
> This will form a virtual (or real if you have different machines)
> systolic array with producers feeding consumers that feed
> the summary process, all running concurrently.
Nah, I can't do that. The summary process is expensive, but not nearly as
expensive as the consuming (10 minutes vs. a few hours), and it can't be
started before the consumers are done anyway.
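A minimal sketch of that ordering with multiprocessing.Queue, under stated assumptions: the parent plays the producer, puts one sentinel per consumer so every consumer loop ends, drains the (small) results, joins the consumers, and only then runs the summary. The word counting and the names consume/run_pipeline are hypothetical stand-ins for the real indexing, and the "fork" start method is forced, so the sketch assumes a POSIX system.

```python
import multiprocessing as mp

SENTINEL = None  # one per consumer tells it that no more work is coming


def consume(tasks, results):
    """Hypothetical consumer: handle documents until the sentinel arrives."""
    words = 0
    for doc in iter(tasks.get, SENTINEL):  # calls tasks.get() until it returns None
        words += len(doc.split())  # stand-in for the expensive indexing work
    results.put(words)  # only a small result travels back


def run_pipeline(docs, n_consumers=2):
    # Assumes POSIX: the "fork" start method is not available on Windows.
    ctx = mp.get_context("fork")
    tasks, results = ctx.Queue(), ctx.Queue()
    workers = [ctx.Process(target=consume, args=(tasks, results))
               for _ in range(n_consumers)]
    for w in workers:
        w.start()
    for doc in docs:      # the parent plays the producer here
        tasks.put(doc)
    for _ in workers:     # one sentinel per consumer ends every loop
        tasks.put(SENTINEL)
    partials = [results.get() for _ in workers]  # drain results before join()
    for w in workers:
        w.join()
    # The summary starts only after every consumer has finished.
    return sum(partials)


if __name__ == "__main__":
    print(run_pipeline(["a b c", "d e", "f"], n_consumers=2))  # prints 6
```

Draining the results queue before join() is deliberate: the multiprocessing programming guidelines warn that a process which has put items on a queue will not terminate until those buffered items have been flushed to the pipe, so joining first can deadlock.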
> You only need to keep the output of the consumers in files if
> you need access to it later for some reason. In your case it sounds
> as if you are only interested in the output of the summary.
Or if the summarizing process requires that the data be stored in files. Or if
the consumers naturally store their data in files. The consumers "produce"
several gigabytes of data: not enough to make the job intractable, but enough
that I don't want to load it into RAM or transmit it back.
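When a summary step has to read several gigabytes of partial output without loading it into RAM, one generic option is a lazy k-way merge over the files. This is a sketch under a hypothetical assumption (each consumer wrote a text file already sorted line by line), not the Xapian merge itself; merge_sorted_files is an illustrative name. heapq.merge holds roughly one pending line per input file at a time.

```python
import heapq
import os
import tempfile
from contextlib import ExitStack


def merge_sorted_files(part_paths, out_path):
    """Merge text files that are each sorted line by line into one sorted file.

    heapq.merge consumes the inputs lazily, holding roughly one pending line
    per file, so the partial outputs never have to fit in RAM together.
    """
    with ExitStack() as stack:
        # Open every partial file; ExitStack closes them all on exit.
        inputs = [stack.enter_context(open(p, encoding="utf-8")) for p in part_paths]
        with open(out_path, "w", encoding="utf-8") as out:
            for line in heapq.merge(*inputs):
                # Guard against a final line that lacks its newline.
                out.write(line if line.endswith("\n") else line + "\n")


if __name__ == "__main__":
    # Hypothetical demo: two consumers' sorted outputs in a scratch directory.
    with tempfile.TemporaryDirectory() as d:
        parts = []
        for i, lines in enumerate([["apple\n", "pear\n"], ["banana\n", "zucchini\n"]]):
            path = os.path.join(d, f"part-{i}.txt")
            with open(path, "w", encoding="utf-8") as f:
                f.writelines(lines)
            parts.append(path)
        merged = os.path.join(d, "merged.txt")
        merge_sorted_files(parts, merged)
        with open(merged, encoding="utf-8") as f:
            print(f.read().split())  # ['apple', 'banana', 'pear', 'zucchini']
```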
In case you are wondering what the job is: I'm indexing a lot of documents
with Xapian. The producer reads the [compressed] documents from the hard
disk, and the consumers process them and index them, each into its own Xapian
database. When they are finished, I merge the databases (the summarizing) and
delete the partial DBs. I don't need the DBs to be in memory at any time, and
Xapian works with files anyway. Even if I were to use different machines (not
worth it for a process that will not run very frequently, except during
development), it would still be cheaper to scp the files.
Now, if I only had a third core available to consume a bit faster ...
Luis Zarrabeitia (aka Kyrie)
Fac. de Matemática y Computación, UH.