[Python-Dev] Re: Python-Dev Digest, Vol 15, Issue 46

Evan Jones ejones at uwaterloo.ca
Tue Oct 19 15:18:56 CEST 2004


On Oct 19, 2004, at 8:32, Luis P Caamano wrote:
> This is such a big problem for us that we had to rewrite some of our
> daemons to fork request handlers so that the memory would be freed.
> That's the only way we've found to deal with it, and it seems, that's
> the preferred python way of doing things, using processes, IPC, fork,
> etc. instead of threads.

Phew! I'm glad I'm not the only one who is frustrated by this. I am 
willing to put the time in to fix this issue, as long as the Python 
community doesn't think this is a dumb idea.

> In order to be able to release memory, the interpreter has to allocate
> memory in chunks bigger than the minimum that can be returned to the
> OS, e.g., in Linux that'd be 256 bytes (iirc), so that libc's malloc
> would use mmap to allocate that chunk.  Otherwise, if the memory was
> obtained with brk, then in virtually all OSes and malloc
> implementations, it won't be returned to the OS even if the
> interpreter frees the memory.

Absolutely correct. Luckily, this is not a problem for Python, as its 
current behaviour is this:

a) If the allocation is larger than 256 bytes, call the system malloc.
b) If the allocation is 256 bytes or smaller, use its own malloc 
implementation (pymalloc), which allocates memory in 256 kB chunks and 
never releases it.
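
As a rough illustration of the two cases above, here is a toy model in 
Python. It is not CPython's actual code: the names choose_allocator and 
arenas_needed are made up for this sketch, treating a request of exactly 
256 bytes as "small" is an assumption, and the arena count ignores all 
bookkeeping overhead.

```python
# Toy model of the size-based routing described above; not CPython's real code.
SMALL_REQUEST_THRESHOLD = 256   # bytes, the cutoff mentioned in the text
ARENA_SIZE = 256 * 1024         # bytes, the 256 kB chunk size mentioned in the text


def choose_allocator(nbytes):
    """Return which allocator would serve a request (hypothetical helper)."""
    if nbytes <= 0:
        raise ValueError("request size must be positive")
    if nbytes > SMALL_REQUEST_THRESHOLD:
        return "system malloc"
    # Assumption: a request of exactly 256 bytes counts as small.
    return "pymalloc"


def arenas_needed(nbytes, count):
    """Lower bound on 256 kB chunks for `count` small objects, ignoring overhead."""
    if choose_allocator(nbytes) != "pymalloc":
        return 0
    total = nbytes * count
    return -(-total // ARENA_SIZE)  # ceiling division


if __name__ == "__main__":
    print(choose_allocator(16), choose_allocator(4096))
    print(arenas_needed(16, 1000000))
```

Under these assumptions, a million 16-byte objects need at least 62 
chunks, none of which would currently go back to the operating system.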

I want to change the behaviour of (b). If I call free() on these 256 kB 
chunks, they *do* in fact get released to the operating system.
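
To make the proposed change concrete, here is a minimal sketch, assuming 
a simple bump allocator that counts live blocks per chunk. 
ToyArenaAllocator and its release_empty flag are hypothetical and say 
nothing about how pymalloc is really structured. With 
release_empty=False the toy also never reuses freed space, which is 
cruder than the real allocator.

```python
class ToyArenaAllocator:
    """Bump-allocates small blocks from fixed-size arenas (toy model only)."""

    def __init__(self, arena_size=256 * 1024, release_empty=True):
        self.arena_size = arena_size
        self.release_empty = release_empty
        self.arenas = {}       # arena id -> {"used": bytes handed out, "live": live blocks}
        self._current = None   # id of the arena currently being filled
        self._next_id = 0

    def alloc(self, nbytes):
        """Reserve nbytes and return the id of the arena that holds the block."""
        if not 0 < nbytes <= self.arena_size:
            raise ValueError("block must fit in one arena")
        arena = self.arenas.get(self._current)
        if arena is None or arena["used"] + nbytes > self.arena_size:
            # Current arena is missing or full: start a new one.
            self._current = self._next_id
            self._next_id += 1
            arena = self.arenas[self._current] = {"used": 0, "live": 0}
        arena["used"] += nbytes
        arena["live"] += 1
        return self._current

    def free(self, arena_id):
        """Free one block; drop the whole arena once nothing in it is live."""
        arena = self.arenas[arena_id]
        arena["live"] -= 1
        if arena["live"] == 0 and self.release_empty:
            del self.arenas[arena_id]   # stands in for calling free() on the chunk
            if self._current == arena_id:
                self._current = None

    def held_bytes(self):
        """Bytes still held from the (imaginary) operating system."""
        return len(self.arenas) * self.arena_size


if __name__ == "__main__":
    for release in (False, True):
        alloc = ToyArenaAllocator(arena_size=1024, release_empty=release)
        handles = [alloc.alloc(256) for _ in range(8)]
        for handle in handles:
            alloc.free(handle)
        print(release, alloc.held_bytes())
```

The only difference between the two runs is whether an empty chunk is 
handed back, which is the behaviour change being proposed for (b).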

> That run will create a lot of little integer objects and the virtual
> memory size of the interpreter will quickly grow to 155MB and then
> drop to 117MB.  The 117MB left are all those little integer objects
> that are not in use any more that the interpreter would reuse as
> needed.

Oops, you are right: I was incorrect in my previous post. Python does 
not limit the number of free integers, etc. that it keeps around.

This is part of the problem: Each of Python's own native types 
maintains its own freelist. If you create a huge list of ints, 
deallocate it, then create a huge list of floats, Python will allocate 
more memory for the floats even though the memory freed by the ints is 
sitting unused. If the pymalloc allocator were used instead, it would 
recycle the memory that was used by the integers. At the moment, even 
if I change the behaviour of pymalloc, it still won't solve this 
problem, as each type keeps its own freelist. This is also on my TODO 
list: benchmark the performance of using the pymalloc allocator for 
these objects.
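
The int-then-float scenario can be sketched with a small simulation. It 
rests on two loud assumptions: every object occupies one equally sized 
"slot", and each phase frees all of its objects before the next phase 
starts. Both function names are invented for this example.

```python
def slots_with_per_type_freelists(phases):
    """Count slots requested from the system when each type caches its own.

    `phases` is a list of (type_name, count) pairs; each phase allocates
    `count` objects of that type and then frees all of them.
    """
    cached = {}        # type name -> slots sitting on that type's freelist
    total_slots = 0
    for type_name, count in phases:
        reusable = cached.get(type_name, 0)
        total_slots += max(0, count - reusable)   # only the shortfall is new memory
        cached[type_name] = max(reusable, count)  # everything returns to this type
    return total_slots


def slots_with_shared_pool(phases):
    """Same workload, but freed slots go to one pool that any type can reuse."""
    pool = 0
    total_slots = 0
    for _type_name, count in phases:
        total_slots += max(0, count - pool)
        pool = max(pool, count)
    return total_slots


if __name__ == "__main__":
    workload = [("int", 1000), ("float", 1000)]
    print(slots_with_per_type_freelists(workload))
    print(slots_with_shared_pool(workload))
```

In this toy, the per-type version asks for twice as many slots for the 
int-then-float workload, because the floats cannot reuse anything the 
ints left behind.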

> In our application, paging to swap is extremely bad because sometimes
> we're running the OS booted from the net without swap.  The daemon has
> to loop over list of 20 to 40 thousand items at a time and it quickly
> grows to 60mb on the first run and then continues to grow from there.
> When something else needs memory, it tries to swap and then crashes.

This is another good point: Some systems do not have swap. The changes 
I am proposing would not make Python less memory hungry, but they would 
mean that

> In the example above, the difference between 155MB and 117MB is 38MB,
> which I assume is the size of the list object returned by 'range()'
> which contains the references to the integers.  The list goes away
> when the interpreter finishes running the loop and because it was
> already known how big it was going to be, it was allocated as a big
> chunk using mmap (my speculation).  As a result, that memory was given
> back to the OS and the virtual memory size of the interpreter went
> down from 155MB to 117MB.

That is correct. If you look at the implementation for lists, it keeps 
a maximum of 80 freed list objects around for reuse, and immediately 
frees the memory for the array that holds each list's items. Again, 
this seems sub-optimal to me: If a program uses a lot of lists, 80 may 
not be enough. For others, 80 may be too many. It seems to me that a 
more dynamic allocation policy could be more efficient.
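
For readers who have not looked at that code, a fixed-cap freelist can 
be sketched roughly as below. This is only an illustration of the 
policy, not the list implementation itself: BoundedFreeList and its 
methods are hypothetical, and the cap of 80 is taken from the paragraph 
above.

```python
class BoundedFreeList:
    """Cache of reusable objects with a fixed upper bound (toy model)."""

    def __init__(self, max_free=80):
        self.max_free = max_free
        self._free = []      # objects kept around for reuse
        self.released = 0    # objects dropped because the cache was full

    def put(self, obj):
        """Offer a dead object to the cache; return True if it was kept."""
        if len(self._free) < self.max_free:
            self._free.append(obj)
            return True
        self.released += 1   # cache is full, so this one is really freed
        return False

    def get(self, factory):
        """Reuse a cached object if possible, otherwise build a new one."""
        if self._free:
            return self._free.pop()
        return factory()

    def __len__(self):
        return len(self._free)


if __name__ == "__main__":
    cache = BoundedFreeList()
    for _ in range(100):
        cache.put([])
    print(len(cache), cache.released)
```

A fixed bound like this is cheap to implement, but it cannot adapt: a 
program that churns through thousands of lists gets little benefit from 
it, while a program that rarely uses lists keeps the cache for nothing.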

Evan Jones

--
Evan Jones: http://evanjones.ca/
"Computers are useless. They can only give answers" - Pablo Picasso


