[PYTHONMAC-SIG] memory usage

Jack Jansen Jack.Jansen@cwi.nl
Mon, 05 Jan 1998 16:39:00 +0100


> Hi everybody,
> 
> I'm just trying to figure out how Python uses memory when loading
> dictionaries.
> [...]

> The problem, however, is with the scalability of the solution.
> The 3 files I upload at startup are:
> -10000 recs (350K)
> -800 recs (80K)
> -35000 recs (3500K)
>
> This requires a huge 23M of RAM occupied after the upload!

The memory allocator in MacPython is of the fast-but-wasteful variety. Blocks 
of over 16Kb are allocated directly through NewPtr/DisposePtr. For smaller 
blocks we always allocate the next biggest power-of-2 size. For each power of 
2 we keep a pool of 16Kb blocks that are used for allocations of that size 
only. These pool blocks are never DisposePtr-ed again and never reused for 
other block sizes. This is a design decision that may need to be reconsidered 
in the light of the huge allocations you mention, but checking all blocks in a 
pool for being completely empty is rather a lot of work.
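The rounding scheme described above can be sketched in a few lines (a toy illustration of the size classes, not MacPython's actual allocator code; the 16Kb threshold is the one mentioned above):

```python
# Toy sketch of the power-of-2 size classes described above
# (an illustration, not MacPython's allocator).

POOL_BLOCK = 16 * 1024   # requests above this go straight to NewPtr

def alloc_size(requested):
    """Round a request up to the next power of 2, or pass it through
    to the system allocator if it exceeds the pool block size."""
    if requested > POOL_BLOCK:
        return requested          # NewPtr/DisposePtr directly
    size = 1
    while size < requested:
        size *= 2
    return size

# A 67-byte request occupies a 128-byte slot: almost half is wasted.
print(alloc_size(67))       # -> 128
print(alloc_size(20000))    # -> 20000 (allocated directly)
```

This is where the "wasteful" part comes from: on average a quarter of every pool slot is padding, and a freed slot is only ever reused for requests of the same size class.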

Let's have a look at your figures.
> -10000 recs (350K)

So that's 35-byte objects. Add 24 bytes for Python object overhead and 8 for 
malloc overhead and we're at 64-byte mallocs, for a total of 640Kb for the 
objects. For a list we would add another ~64K for the list object. For a 
dictionary object, assuming you invent keys on the fly and they're not 
included in the 350K, you'll have another 160Kb of key objects, and the dict 
object would be about 100K. All in all this'll fit easily in 1Mb.
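The back-of-envelope arithmetic above can be replayed in a few lines (the per-object figures are the ones quoted in this message, not measured values):

```python
# Replay of the estimate above for the 10000-record file.
recs = 10000
payload = 35        # bytes per record: 350K / 10000
overhead = 24 + 8   # Python object header + malloc overhead, per the message
per_object = 64     # next power-of-2 malloc size used above

print(recs * per_object)   # -> 640000, the ~640Kb quoted for the objects
print(recs * 16)           # -> 160000, the ~160Kb estimated for key objects
```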

> -800 recs (80K)

Nothing to worry about, 200Kb at the very most.

> -35000 recs (3500K)

Hmm, let's assume the worst case: 256 bytes per object, giving 9Mb for the 
objects, 500Kb for the keys and 350Kb for the dict object. 10Mb total.

All in all we end up with less than 12Mb, about half of your 23Mb. However, 
I'm assuming that the 100-byte objects of the last dict are 100-byte strings. 
If they're, say, 25 4-byte integers in a tuple, then the per-object malloced 
size goes to 25*32 bytes (for the ints) + 256 bytes (for the tuple), giving 
about 1Kb for each object, or 35Mb for all the objects....
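That tuple-of-ints blowup is easy to tabulate (again using the malloc sizes assumed in this message):

```python
# Per-record cost if each 100-byte record is really a tuple of 25 small
# integers, using the malloc sizes assumed above.
ints_per_rec = 25
int_malloc = 32        # malloc size per int object, as assumed above
tuple_malloc = 256     # malloc size for the 25-slot tuple

per_record = ints_per_rec * int_malloc + tuple_malloc
print(per_record)      # -> 1056 bytes, the "about 1Kb" per record

total_mb = 35000 * per_record / (1024 * 1024)
print(round(total_mb, 1))   # -> 35.2, the ~35Mb quoted above
```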

Solutions? Hmm. The first that comes to mind: see if you can use a gdbm 
database for the 35000-record dictionary. The second: check your data 
structure (for the 9Mb/35Mb problem sketched above). If you're storing large 
numbers of integers the "array" module may help: it stores C-style arrays, so 
an integer takes only 4 bytes. The data structures of NumPy also store large 
quantities of numbers in space comparable to what C would use.
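A quick sketch of the "array" suggestion, applied to the 35000-record case (the 4-byte item size is typical, not guaranteed):

```python
# The "array" module stores C-style values compactly: one machine word
# per integer instead of a full Python object per integer.
from array import array

a = array('i', range(35000))   # 35000 C ints; 'i' is typically 4 bytes
print(a.itemsize)              # usually 4
print(a.itemsize * len(a))     # ~140000 bytes of payload, versus the
                               # per-object malloc cost sketched above
```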

And I'll put checking whether I can make malloc() return free space to the 
real freelist on my list of things to do at some point. However, you said 
nothing about the objects here being freed again, so I doubt whether this 
would help you at all.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@cwi.nl      | ++++ if you agree copy these lines to your sig ++++
http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 



_______________
PYTHONMAC-SIG  - SIG on Python for the Apple Macintosh

send messages to: pythonmac-sig@python.org
administrivia to: pythonmac-sig-request@python.org
_______________