Persistent variable in subprocess using multiprocessing?

mheavner miheavner at gmail.com
Thu Jul 16 15:18:43 CEST 2009


On Jul 16, 8:39 am, Piet van Oostrum <p... at cs.uu.nl> wrote:
> >>>>> mheavner <miheav... at gmail.com> (m) wrote:
> >m> I'm using multiprocessing to spawn several subprocesses, each of which
> >m> uses a very large data structure (making it impractical to pass it via
> >m> pipes / pickling). I need to allocate this structure once when the
> >m> process is created and have it remain in memory for the duration of
> >m> the process. The way the multiprocessing module is set up, only the
> >m> 'run' method runs within the subprocess - so creating a wrapper class
> >m> with a constructor that allocates the structure in __init__ will not
> >m> work, as far as I know, as this will still be within the parent
> >m> process.
> >m> If I were working in C/C++, I would declare the variable "static"
> >m> within the function body - is there any way with the multiprocessing
> >m> module to have persistent data members within subprocesses?
> >m> Any ideas??
>
> Your post is not entirely clear. Is `the process' the same as `the
> subprocess'?
>
> Assuming it is, what is the problem? You can create the data structure
> first thing in the run method, can't you?
>
> Like this:
>
> from multiprocessing import Process
> from time import sleep
> from random import random
>
> class MyProcess(Process):
>
>     def __init__(self, number):
>         self.number = number
>         Process.__init__(self)
>
>     def run(self):
>         print "Process %s started" % self.number
>         self.data = range(self.number * 100000, (self.number + 1) * 100000)
>         self.doit()
>
>     def doit(self):
>         for i in range(5):
>             sleep(3 * random())
>             self.data[i] += i
>             print self.data[i]
>
> processes = []
> for k in range(10):
>     p = MyProcess(k)
>     p.start()
>     processes.append(p)
>
> for p in processes:
>     p.join()
>
> --
> Piet van Oostrum <p... at cs.uu.nl>
> URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
> Private email: p... at vanoostrum.org

'The process' refers to the subprocess. I could do as you say and load
the data structure each time, but the problem is that the load takes a
considerable amount of time compared to the actual computation on the
data it contains. I'm using these processes within a loop as follows:

         # Don't recreate processes or Queues
         pop1 = Queue()
         pop2 = Queue()
         pop_out = Queue()
         p1 = CudaProcess(0, args=(costf,pop1,pop_out))
         p2 = CudaProcess(1, args=(costf,pop2,pop_out))

         # Main loop
         for i in range(maxiter):
                 print 'ITERATION: '+str(i)
                 if log != None:
                         l = open(log,'a')
                         l.write('Iteration: '+str(i)+'\n')
                         l.close()

                 # Split population in two
                 pop1.putmany(pop[0:len(pop)/2])
                 pop2.putmany(pop[len(pop)/2:len(pop)])

                 # Start two processes
                 if not p1.isAlive():
                         p1.start()
                         print 'started %s'%str(p1.getPid())
                 else:
                         p1.run()
                 if not p2.isAlive():
                         p2.start()
                         print 'started %s'%str(p2.getPid())
                 else:
                         p2.run()
                 .
                 .
                 .

So I'd like to load that data into memory once and keep it there as
long as the subprocess is alive (ideally when the subprocess is
created, storing some sort of reference to it), rather than reloading
it each time run is called for a process within the loop. Could be my
CudaProcess class - I'll check out what Diez suggested and post back.
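One way to get that behaviour is to pay the load cost once at the top of
run() and then have the process loop over a task queue until a sentinel
arrives, instead of restarting or re-invoking run() each iteration. Here
is a minimal sketch of that pattern - not my CudaProcess; names like
expensive_load and the None sentinel are just placeholders for the real
allocation and shutdown convention:

```python
from multiprocessing import Process, Queue

def expensive_load(n):
    # Stand-in for the costly one-time allocation.
    return list(range(n))

class Worker(Process):
    def __init__(self, tasks, results, size):
        Process.__init__(self)
        self.tasks = tasks
        self.results = results
        self.size = size

    def run(self):
        # Paid once per process; stays resident for its lifetime.
        data = expensive_load(self.size)
        while True:
            task = self.tasks.get()
            if task is None:          # sentinel: shut down cleanly
                break
            # Reuse the resident data on every iteration.
            self.results.put(data[task])

if __name__ == '__main__':
    tasks, results = Queue(), Queue()
    w = Worker(tasks, results, 1000)
    w.start()
    for i in (1, 2, 3):               # many iterations, one load
        tasks.put(i)
    print(sorted(results.get() for _ in range(3)))   # prints [1, 2, 3]
    tasks.put(None)
    w.join()
```

The main loop then just feeds the queues each iteration; the structure
never leaves the subprocess, and nothing large crosses a pipe.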



More information about the Python-list mailing list