Persistent variable in subprocess using multiprocessing?

mheavner miheavner at gmail.com
Thu Jul 16 09:32:47 EDT 2009


On Jul 16, 9:18 am, mheavner <miheav... at gmail.com> wrote:
> On Jul 16, 8:39 am, Piet van Oostrum <p... at cs.uu.nl> wrote:
>
>
>
> > >>>>> mheavner <miheav... at gmail.com> (m) wrote:
> > >m> I'm using multiprocessing to spawn several subprocesses, each of which
> > >m> uses a very large data structure (making it impractical to pass it via
> > >m> pipes / pickling). I need to allocate this structure once when the
> > >m> process is created and have it remain in memory for the duration of
> > >m> the process. The way the multiprocessing module is set up, only the
> > >m> 'run' method runs within the subprocess - so creating a wrapper class
> > >m> with a constructor that allocates the structure in __init__ will not
> > >m> work, as far as I know, as this will still be within the parent
> > >m> process.
> > >m> If I were working in C/C++, I would declare the variable "static"
> > >m> within the function body - is there any way with the multiprocessing
> > >m> module to have persistent data members within subprocesses?
> > >m> Any ideas??
>
> > Your post is not entirely clear. Is `the process' the same as `the
> > subprocess'?
>
> > Assuming it is, what is the problem? You can create the datastructure
> > first thing in the run method can't you?
>
> > Like this:
>
> > from multiprocessing import Process
> > from time import sleep
> > from random import random
>
> > class MyProcess(Process):
>
> >     def __init__(self, number):
> >         self.number = number
> >         Process.__init__(self)
>
> >     def run(self):
> >         print "Process %s started" % self.number
> >         self.data = range(self.number * 100000, (self.number + 1) * 100000)
> >         self.doit()
>
> >     def doit(self):
> >         for i in range(5):
> >             sleep(3 * random())
> >             self.data[i] += i
> >             print self.data[i]
>
> > processes = []
> > for k in range(10):
> >     p = MyProcess(k)
> >     p.start()
> >     processes.append(p)
>
> > for p in processes:
> >     p.join()
>
> > --
> > Piet van Oostrum <p... at cs.uu.nl>
> > URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
> > Private email: p... at vanoostrum.org
>
> 'The process' refers to the subprocess. I could do as you say and load
> the data structure each time, but the problem is that the load takes a
> considerable amount of time compared to the actual computation on the
> data it contains. I'm using these processes within a loop as follows:
>
>          # Don't recreate processes or Queues
>          pop1 = Queue()
>          pop2 = Queue()
>          pop_out = Queue()
>          p1 = CudaProcess(0, args=(costf,pop1,pop_out))
>          p2 = CudaProcess(1, args=(costf,pop2,pop_out))
>
>          # Main loop
>          for i in range(maxiter):
>                  print 'ITERATION: '+str(i)
>                  if log is not None:
>                          l = open(log,'a')
>                          l.write('Iteration: '+str(i)+'\n')
>                          l.close()
>
>                  # Split population in two
>                  pop1.putmany(pop[0:len(pop)/2])
>                  pop2.putmany(pop[len(pop)/2:len(pop)])
>
>                  # Start two processes
>                  if not p1.is_alive():
>                          p1.start()
>                          print 'started %s'%str(p1.pid)
>                  else:
>                          p1.run()
>                  if not p2.is_alive():
>                          p2.start()
>                          print 'started %s'%str(p2.pid)
>                  else:
>                          p2.run()
>                  .
>                  .
>                  .
>
> So I'd like to load that data into memory once and keep it there as
> long as the process is alive (ideally when the subprocess is created,
> storing some sort of pointer to it), rather than loading it each time
> run is called for a process within the loop. Could be my CudaProcess
> class - I'll check out what Diez suggested and post back.

Essentially, I'd like to "sneak" that allocation in somewhere after the
fork (done in start()), in the context of the subprocess, holding a
pointer to that structure, but before all of the run() calls are made.
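One way to get that effect without re-running start() or calling run()
in the parent: keep each worker process alive and feed it work through a
Queue, so the structure allocated at the top of run() (i.e. in the child,
right after the fork) persists across all tasks. A minimal sketch of that
pattern follows; load_big_structure() is a hypothetical stand-in for the
expensive one-time allocation, and a None sentinel shuts the worker down.

```python
from multiprocessing import Process, Queue

def load_big_structure():
    # Stand-in for the expensive one-time load (e.g. reading a big file
    # or building a large lookup table).
    return {i: i * i for i in range(1000)}

class Worker(Process):
    def __init__(self, tasks, results):
        Process.__init__(self)
        self.tasks = tasks
        self.results = results

    def run(self):
        # Executes in the child process, after the fork done by start().
        # Allocate the structure once here; it stays in this process's
        # memory for every task handled in the loop below.
        data = load_big_structure()
        while True:
            item = self.tasks.get()
            if item is None:        # sentinel: time to exit
                break
            self.results.put(data[item])

if __name__ == '__main__':
    tasks, results = Queue(), Queue()
    w = Worker(tasks, results)
    w.start()
    for item in (2, 3, 5):
        tasks.put(item)
    print(sorted(results.get() for _ in range(3)))
    tasks.put(None)                 # shut the worker down
    w.join()
```

The key point is that run() is entered exactly once per process, so the
allocation happens once; the loop inside run() replaces the repeated
start()/run() calls in the outer iteration loop.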
