Getting references to objects without incrementing reference counters
calderone.jeanpaul at gmail.com
Sun Nov 14 22:04:17 CET 2010
On Nov 14, 11:08 am, Artur Siekielski <artur.siekiel... at gmail.com> wrote:
> I'm using CPython 2.7 and Linux. To run parallel computations on a
> large list of objects I want to use multiple processes (via the
> multiprocessing module). In the first step I fill the list with
> objects, and then I fork() my worker processes that do the job.
> This should be optimal in terms of memory usage because Linux
> implements copy-on-write in forked processes. So I should have only
> one physical copy of the list of objects (the worker processes don't
> change the objects in the list). The problem is that after a short
> time the child processes use more and more memory (they don't create
> new objects - they only read objects from the list and write
> computation results to the database).
> After investigation I concluded that the cause must be the
> incrementing of the reference counter when getting an object from the
> list. It changes only one int, but the OS must copy the whole memory
> page into the child process. I reimplemented the function for getting
> an element (from the file listobject.c), omitting the Py_INCREF call,
> and it solved my problem with increasing memory.
> The question is: are there any better ways to have a truly read-only
> list (in terms of the memory representation of objects)? My solution
> is, of course, not safe. I thought about weakrefs, but it seems they
> cannot be used here because getting a real reference from a weakref
> increments a reference counter. Maybe another option would be to store
> reference counters not in the objects but in a separate array, to
> minimize the number of memory pages they occupy...
It might be interesting to try with Jython or PyPy. Neither of these
Python runtimes uses reference counting at all.
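
For anyone who wants to see the effect described in the quoted message,
here is a minimal sketch (CPython-specific; the names Item and
refcount_delta_on_read are made up for illustration and are not from the
original post). It only shows that holding an element fetched from a
list raises that element's reference count, which is a write to the
object's header. It does not measure memory pages or fork() anything.

```python
import sys


class Item(object):
    """Hypothetical stand-in for one of the objects stored in the big list."""

    def __init__(self, value):
        self.value = value


def refcount_delta_on_read(items, index):
    """Return how much an element's reference count grows while a local
    name holds the element that was fetched from the list.

    Both measurements use the same expression, so whatever temporary
    references sys.getrefcount() itself sees cancel out.
    """
    before = sys.getrefcount(items[index])
    held = items[index]  # the list lookup hands back a new strong reference
    after = sys.getrefcount(items[index])
    del held  # dropping the name writes to the object header once more
    return after - before


if __name__ == "__main__":
    data = [Item(i) for i in range(3)]
    print(refcount_delta_on_read(data, 1))
```

Each such increment and decrement modifies the object itself, so in a
forked child the page holding that object stops being shared even
though the program only "reads" the list.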