Shared memory python between two separate shell-launched processes

Jean-Paul Calderone calderone.jeanpaul at gmail.com
Fri Feb 11 08:24:36 EST 2011


On Feb 11, 5:52 am, "Charles Fox (Sheffield)" <charles.... at gmail.com>
wrote:
> On Feb 10, 6:22 pm, Jean-Paul Calderone <calderone.jeanp... at gmail.com>
> wrote:
>
> > On Feb 10, 12:21 pm, "Charles Fox (Sheffield)" <charles.... at gmail.com>
> > wrote:
>
> > > On Feb 10, 3:43 pm, Jean-Paul Calderone <calderone.jeanp... at gmail.com>
> > > wrote:
>
> > > > On Feb 10, 9:30 am, "Charles Fox (Sheffield)" <charles.... at gmail.com>
> > > > wrote:
>
> > > > > Hi guys,
> > > > > I'm working on debugging a large python simulation which begins by
> > > > > preloading a huge cache of data.  I want to step through code on many
> > > > > runs to do the debugging.   Problem is that it takes 20 seconds to
> > > > > load the cache at each launch.  (Cache is a dict in a 200Mb cPickle
> > > > > binary file).
>
> > > > > So to speed up the compile-test cycle I'm thinking about running a
> > > > > completely separate process (not a fork, but a process launched
> > > > > from a different terminal)
>
> > > > Why _not_ fork?  Load up your data, then go into a loop forking and
> > > > loading/running the rest of your code in the child.  This should be
> > > > really easy to implement compared to doing something with shared
> > > > memory, and solves the problem you're trying to solve of long startup
> > > > time just as well.  It also protects you from possible bugs where the
> > > > data gets corrupted by the code that operates on it, since there's
> > > > only one copy shared amongst all your tests.  Is there some other
> > > > benefit that the shared memory approach gives you?
>
> > > > Of course, adding unit tests that exercise your code on a smaller
> > > > data set might also be a way to speed up development.
>
> > > > Jean-Paul
>
> > > Thanks Jean-Paul, I'll have a think about this.  I'm not sure if it
> > > will get me exactly what I want though, as I would need to keep
> > > unloading my development module and reloading it, all within the
> > > forked process, and I don't see how my debugger (and emacs pdb
> > > tracking) will keep up with that to let me step through the code.
>
> > Not really.  Don't load your code at all in the parent.  Then there's
> > nothing to unload in each child process, just some code to load for
> > the very first time ever (as far as that process is concerned).
>
> > Jean-Paul
>
> Jean, sorry, I'm still not sure what you mean.  Could you give a couple
> of lines of pseudocode to illustrate it, and explain how my emacs
> pdbtrack would still be able to pick it up?
> thanks,
> charles

    import os
    import loader

    # Pay the 20 second unpickling cost exactly once, in the parent.
    data = loader.preload()

    while True:
        pid = os.fork()
        if pid == 0:
            # Child: import the code under development fresh each run so
            # edits are picked up, then exit so the child doesn't fall
            # back into the fork loop itself.
            import program
            program.main(data)
            os._exit(0)
        else:
            # Parent: wait for the child to finish, then fork again.
            os.waitpid(pid, 0)
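
For completeness, loader.preload above is just whatever gets the cache into
memory once; since you described it as a dict in a 200Mb cPickle file, it
could be little more than this (module and file names made up for
illustration):

    # hypothetical loader.py
    import cPickle

    def preload(path='cache.pkl'):
        # Unpickle the whole dict once in the parent; forked children
        # then see it without reloading it.
        with open(path, 'rb') as f:
            return cPickle.load(f)

The important bit is that the unpickling happens before the fork loop, so
every child starts with the data already loaded.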

But I won't actually try to predict how this is going to interact with
emacs pdbtrack.

Jean-Paul


