why python cache the string > 256?

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Tue Oct 27 02:39:26 EDT 2009


On Mon, 26 Oct 2009 23:42:54 -0300, s7v7nislands <s7v7nislands at gmail.com>
wrote:

> On Oct 27, 4:03 am, "Diez B. Roggisch" <de... at nospam.web.de> wrote:
>> s7v7nislands schrieb:
>>
>> > test.py
>> > #!/usr/bin/python
>> > a = []
>> > for i in xrange(1000000):
>> >     a.append('a'*500)
>>
>> > $ python -i test.py     # virt mem 514m in top output
>> > >>> del a               # virt mem 510m
>>
>> > Why does Python cache these strings?
>> > In the source, I see that when the object size > SMALL_REQUEST_THRESHOLD,
>> > Python uses malloc,
>> > and also calls free() when the string's refcount == 0.
>>
>> > Am I missing something?
>>
>> http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-d...
>
> Thanks, but that doesn't explain why strings bigger than
> SMALL_REQUEST_THRESHOLD are cached.
> In the source, it seems objects are only allocated from the cache pool when
> their size is < SMALL_REQUEST_THRESHOLD (256);
> when the size is > SMALL_REQUEST_THRESHOLD, malloc is used directly.
> See void *PyObject_Malloc(size_t nbytes) in Objects/obmalloc.c
>
> I know about the memory use of range & xrange; the int and str objects have
> their own memory managers on top of Python's object allocator.
> But why doesn't Python free the memory of big strings, whose size is >
> SMALL_REQUEST_THRESHOLD?

It does. It isn't Python that keeps those memory blocks, but the C stdlib.
Apparently free() doesn't return the memory blocks to the OS, but keeps
them internally as free blocks.
If you repeat the cycle, you should see that memory usage doesn't
grow beyond the previous value; when those big strings are malloc()ed
again, the C stdlib fulfills the memory requests using the free blocks it
originally got from the OS.
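
A rough way to repeat that cycle from a script is sketched below. It is only
an illustration, written for Python 3 rather than the 2.x interpreter used in
this thread, and it relies on the Unix-only `resource` module, so it will not
run on Windows. The function names (`peak_rss`, `build_and_drop`,
`run_cycles`) are made up for this example. It records the peak resident set
size after each build/delete cycle; if the allocator reuses its free blocks,
the peak should stay roughly flat after the first cycle.

```python
import gc
import resource  # Unix-only; not available on Windows


def peak_rss():
    """Peak resident set size of this process.

    The unit is platform dependent (kilobytes on Linux, bytes on macOS).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def build_and_drop(count, size):
    """Create `count` distinct strings of length `size`, then release them."""
    # `size` is a runtime value, so each 'a' * size builds a new string object
    # instead of reusing one constant folded at compile time.
    a = ['a' * size for _ in range(count)]
    del a
    gc.collect()


def run_cycles(cycles=4, count=100_000, size=500):
    """Repeat the build/delete cycle and return the peak RSS after each one."""
    peaks = []
    for _ in range(cycles):
        build_and_drop(count, size)
        peaks.append(peak_rss())
    return peaks


if __name__ == "__main__":
    for i, peak in enumerate(run_cycles(), 1):
        print(f"cycle {i}: peak RSS = {peak}")
```

Peak RSS can only show that usage stops growing; it cannot show memory being
given back, so a tool such as top or pslist is still needed to watch the
current values.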

This is what I see on Windows (using the pslist utility from sysinternals):

# memory usage right when Python starts:
D:\USERDATA\Gabriel>pslist -m python

PsList 1.23 - Process Information Lister
Copyright (C) 1999-2002 Mark Russinovich
Sysinternals - www.sysinternals.com

Process memory detail for LEPTON:

Name          Pid      VM      WS   WS Pk    Priv   Faults NonP Page PageFile
python       3672   32916    4732    4732    2828     1212    2   55     2828

# after creating the big list of strings
python       3672  622892  533820  533820  531724   141737    2   55   531724

# after `del a`
python       3672  618712    6656  533828    4212   141747    2   55     4212

# after re-creating the list
python       3672  622960  533844  533844  531732   281037    2   55   531732

# after `del a` again, repeating a few times
python       3672  618712    6708  533844    4244   281037    2   55     4244
python       3672  618712    6672  533848    4220   420317    2   55     4220
python       3672  618712    6696  533848    4252   559608    2   55     4252
python       3672  618712    6680  533848    4228   698891    2   55     4228

Note the 'VM' and 'Priv' columns (virtual address space and private bytes,
respectively): after each `del a`, Priv falls back to about 4200 while VM
stays at 618712, close to its peak.
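
To make the comparison between the two columns explicit, here is a small
sketch that just does the arithmetic on the figures above. The VM and Priv
values are copied from the pslist output; the names `SAMPLES` and `deltas`
are made up for this example.

```python
# (label, VM, Priv) copied from the pslist output above
SAMPLES = [
    ("start", 32916, 2828),
    ("after creating the list", 622892, 531724),
    ("after del a", 618712, 4212),
    ("after re-creating the list", 622960, 531732),
    ("after del a again", 618712, 4244),
]


def deltas(samples):
    """Return (label, VM change, Priv change) relative to the previous sample."""
    out = []
    for (_, vm0, priv0), (label, vm1, priv1) in zip(samples, samples[1:]):
        out.append((label, vm1 - vm0, priv1 - priv0))
    return out


if __name__ == "__main__":
    for label, d_vm, d_priv in deltas(SAMPLES):
        print(f"{label:28s} VM {d_vm:+8d}   Priv {d_priv:+8d}")
```

The first `del a` changes VM by only -4180 but Priv by -527512, and the
second build/delete cycle moves both columns by nearly the same amounts in
each direction.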

-- 
Gabriel Genellina



