fw at cygnus.stuttgart.netsurf.de
Mon Jun 7 20:38:53 CEST 1999
Arne Mueller <a.mueller at icrf.icnet.uk> writes:
> All that works fine now (after I spent a long time fiddling with the pipe
> and select stuff to synchronize I/O). However the forking is a waste of
> time and memory because all the children get their own large dictionary!
> Is there a way to use shared memory in python, I mean how can different
> processes access (for read only) one object without passing the keys of
> the dictionary object through a pipe (the children need _quick_ access
> to that dictionary)?
If you've got a modern operating system, fork() doesn't copy the
whole address space at once; the memory stays shared between parent
and child as long as it is not modified (copy-on-write). Unfortunately,
even if you only fetch an object from a dictionary, Python writes to
previously shared memory behind the scenes, because the object's
reference count is updated on every access. This defeats the operating
system's memory-sharing mechanism.
I hope others will have better suggestions than I do. First of all, you
should be sure that you cannot solve this problem by throwing more RAM
at it. RAM is cheap these days, so it may not be worth spending hours
optimizing your program. (This obviously depends on the purpose of your
program.)
My second, more serious suggestion: put the data contained in the
objects stored in the dictionary into a C struct, write a simple C
implementation of a hash table which stores those structs, and a lookup
function which creates suitable Python objects on the fly. This way
there are no reference counts being perpetually updated to break the
copy-on-write scheme, so the main data structure can be shared
efficiently among the processes.
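For illustration, here is the same idea sketched in Python using the struct module rather than C: pack all record data into one flat bytes buffer, so the bulk data carries no reference counts, and build Python objects only at lookup time. The record layout and all names here are hypothetical; in the real thing the index would also live in the C hash table, and the buffer would sit in memory inherited across fork().

```python
import struct

# Hypothetical fixed-size record: 16-byte name, a double, and an int.
RECORD = struct.Struct("16s d i")

def build_table(records):
    """Pack (name, score, count) tuples into one contiguous buffer.

    The buffer contains no Python objects, hence no reference counts;
    only the small index dict (standing in for the C hash table) does.
    """
    index = {}
    chunks = []
    for i, (name, score, count) in enumerate(records):
        index[name] = i * RECORD.size
        # struct pads the name with NUL bytes to 16 bytes.
        chunks.append(RECORD.pack(name.encode(), score, count))
    return index, b"".join(chunks)

def lookup(index, buf, name):
    """Create Python objects on the fly from the packed buffer."""
    name_b, score, count = RECORD.unpack_from(buf, index[name])
    return name_b.rstrip(b"\0").decode(), score, count

index, buf = build_table([("alpha", 0.5, 3), ("beta", 1.25, 7)])
print(lookup(index, buf, "beta"))  # -> ('beta', 1.25, 7)
```

The temporary objects created by lookup() still get reference counts, but they are short-lived and private to each child; the large shared buffer is never written to, so copy-on-write keeps it in a single physical copy.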
Another solution is probably a Python version using the Boehm garbage
collector, which gets rid of the reference counts as well. There are
some patches floating around, but I don't know whether they are suited
to recent Python releases or to production use.