Re: [Python-Dev] refleaks and caches

At 05:05 PM 1/26/2008 -0800, Neal Norwitz wrote:
Expose an API to clear the cache, and clear it at shutdown? It should probably be part of interpreter shutdown anyway.

Phillip J. Eby wrote:
Expose an API to clear the cache, and clear it at shutdown? It should probably be part of interpreter shutdown anyway.
Good point. I've implemented PyType_ClearCache and exposed it via sys._cleartypecache(). The function is called during finalization, too. Can somebody please double check the change? The results are promising and I'm sure I've implemented it correctly but you never know ;) Christian
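Why a cache matters for refleak hunting: a cache that holds strong references keeps objects alive after the code that created them is gone. Reference-count comparisons between test runs therefore report the cached objects as leaks until the cache is emptied. The sketch below is a toy model only. It assumes a fixed-size table indexed by a hash of (type, name). ToyTypeAttrCache and its methods are hypothetical names, and CPython's real cache in Objects/typeobject.c is written in C and does not work exactly this way.

```python
import gc
import weakref


class ToyTypeAttrCache:
    """Hypothetical fixed-size attribute cache keyed by (type, name).

    Entries hold strong references, so anything cached stays alive
    until clear() is called.
    """

    def __init__(self, size_bits=8):
        self.mask = (1 << size_bits) - 1
        self.entries = [None] * (self.mask + 1)  # each slot: (type, name, value) or None

    def _slot(self, tp, name):
        return (id(tp) ^ hash(name)) & self.mask

    def lookup(self, tp, name):
        i = self._slot(tp, name)
        entry = self.entries[i]
        if entry is not None and entry[0] is tp and entry[1] == name:
            return entry[2]  # cache hit
        for base in tp.__mro__:  # cache miss: walk the MRO
            if name in base.__dict__:
                value = base.__dict__[name]
                self.entries[i] = (tp, name, value)  # overwrite whatever was in the slot
                return value
        raise AttributeError(name)

    def clear(self):
        """Drop every entry and return how many slots were occupied."""
        occupied = sum(entry is not None for entry in self.entries)
        self.entries = [None] * (self.mask + 1)
        return occupied


cache = ToyTypeAttrCache()

class Spam:
    def eggs(self):
        return 42

cache.lookup(Spam, "eggs")
ref = weakref.ref(Spam)
del Spam
gc.collect()
print(ref() is not None)  # True: the cache entry still holds Spam
print(cache.clear())      # 1
gc.collect()
print(ref() is None)      # True: clearing released the last strong reference
```

Calling the clear function during finalization, as described above, plays the same role as cache.clear() here: it drops references the cache would otherwise keep until the process exits.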

On Jan 27, 2008 3:37 PM, Christian Heimes <lists@cheimes.de> wrote:
I'm not sure we should expose an API to clear the cache, but I don't have strong opinions either way. If we keep the ability to clear the cache, should we also consider some control over the int/float free lists? These are worse than the tuple/frame free lists since the int/float lists are unbounded. I suspect the method free lists in Objects/methodobject.c and Objects/classobject.c don't have that many entries that could be removed. There may not be a lot we can do for the int/float caches, and I'm not sure it's worth it. We can't move objects once created, although we could scan the blocks and free any block that has no live objects. That would presumably help with this case: list(range(some_big_number)). I don't know how important these things are in practice.
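The idea of scanning the blocks and freeing any block with no live objects can be illustrated with a toy allocator. This is a hypothetical Python model, not the C code in Objects/intobject.c or Objects/floatobject.c. Each block is a list of in-use flags, freed slots go onto a free list, and compact() releases blocks whose slots are all free. Blocks are never moved, matching the constraint that objects cannot be relocated once created.

```python
class ToyBlockAllocator:
    """Hypothetical model of a per-type free list carved out of fixed-size blocks."""

    def __init__(self, slots_per_block=4):
        self.slots_per_block = slots_per_block
        self.blocks = {}       # block id -> list of flags; True means the slot holds a live object
        self.free_list = []    # stack of (block id, slot index) handles ready for reuse
        self._next_block = 0

    def _new_block(self):
        bid = self._next_block
        self._next_block += 1
        self.blocks[bid] = [False] * self.slots_per_block
        # Push slots in reverse so slot 0 is handed out first.
        for slot in reversed(range(self.slots_per_block)):
            self.free_list.append((bid, slot))

    def alloc(self):
        if not self.free_list:
            self._new_block()
        bid, slot = self.free_list.pop()
        self.blocks[bid][slot] = True
        return (bid, slot)

    def free(self, handle):
        bid, slot = handle
        self.blocks[bid][slot] = False
        # The slot goes back on the free list; its block is never released here.
        self.free_list.append(handle)

    def compact(self):
        """Release every block with no live slots and return how many were released."""
        empty = {bid for bid, used in self.blocks.items() if not any(used)}
        for bid in empty:
            del self.blocks[bid]
        # Drop free-list handles that point into released blocks.
        self.free_list = [h for h in self.free_list if h[0] not in empty]
        return len(empty)


# Roughly the list(range(some_big_number)) case: allocate many, then drop most.
heap = ToyBlockAllocator(slots_per_block=4)
handles = [heap.alloc() for _ in range(10)]   # needs 3 blocks
for h in handles[1:]:                          # keep a single object alive
    heap.free(h)
print(len(heap.blocks), heap.compact(), len(heap.blocks))  # 3 2 1
```

A single surviving object pins its whole block, which is why compaction helps most when a large batch of numbers is freed together.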
Can somebody please double check the change? The results are promising and I'm sure I've implemented it correctly but you never know ;)
The biggest problem I have with the patch is the attribute name. I would prefer underscores, i.e. _clear_type_cache instead of _cleartypecache. Attributes in sys are currently inconsistent, but it seems that most of the newer names have underscores. (Aside: if we are going to move attrs out of sys for py3k, we should consider renaming them to be consistent too. Regardless of moving them, should we rename them?) n

Neal Norwitz wrote:
Do the int/float free lists cause any trouble or can they eat lots of memory? And what about the string intern list?
The attribute name is the least problem. It's easy to fix. Brett came up with a nice idea, too. He suggested the gc module as the place for the function. Christian

On [20080128 03:13], Christian Heimes (lists@cheimes.de) wrote:
Do the int/float free lists cause any trouble or can they eat lots of memory?
I hope I am interpreting it correctly, but it seems the explanation at http://evanjones.ca/memoryallocator/ still applies: "The worst offenders are integers and floats. These two object types allocate their own blocks of memory of approximately 1kB, which are allocated with the system malloc(). These blocks are used as arrays of integers and float objects, which avoids waste from pymalloc rounding the object size up to the nearest multiple of 8. These objects are then linked on to a simple free list. When an object is needed, one is taken from the list or a new block is allocated. When an object is freed, it is returned to the free list. This scheme is very simple and very fast, however, it exhibits a significant problem: the memory that is allocated to integers can never be used for anything else. That means if you write a program which goes and allocates 1000000 integers, then frees them and allocates 1000000 floats, Python will hold on to enough memory for 2000000 numerical objects. The solution is to apply a similar approach as was described above. Pools could be requested from pymalloc, so they are properly aligned. When freeing an integer or a float, the object would be put on a free list for its specific pool. When the pool was no longer needed, it could be returned to pymalloc. The challenge is that these types of objects are used frequently, so care is required to ensure good performance. Dictionaries and lists use a different scheme. Python always keeps a maximum of 80 free lists and dictionaries, any extra are freed. This is not optimal because some applications would perform better with a larger list, while others need less. It is possible that self-tuning the list size could be more efficient."
-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/
We have met the enemy and they are ours...
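The capped scheme the quoted text describes for dictionaries and lists (keep a fixed number of spare objects, free anything beyond that) can be sketched as follows. BoundedFreeList is a hypothetical helper; the default cap of 80 comes from the quote, and the real free lists are C code inside the dict and list implementations, not this class.

```python
class BoundedFreeList:
    """Hypothetical helper: keep at most max_free spare objects for reuse."""

    def __init__(self, factory, max_free=80):
        self._factory = factory
        self._max_free = max_free
        self._free = []

    def acquire(self):
        # Reuse a spare object if one is cached, otherwise build a new one.
        if self._free:
            return self._free.pop()
        return self._factory()

    def release(self, obj):
        obj.clear()  # reset contents so a reused object starts empty
        if len(self._free) < self._max_free:
            self._free.append(obj)
        # Otherwise drop our reference and let normal deallocation reclaim it.


pool = BoundedFreeList(dict, max_free=2)
dicts = [pool.acquire() for _ in range(5)]
for d in dicts:
    d["x"] = 1
    pool.release(d)
print(len(pool._free))  # 2
```

The fixed cap bounds the memory held in reserve, which is exactly the trade-off the quote points out: some programs would benefit from a larger cap and others from a smaller one.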

Jeroen Ruigrok van der Werven wrote:
[snip] Yes, the explanation still applies. It took me a while to understand how the free lists work. Some genius came up with the idea to (ab)use the ob_type field of the PyObject struct to link the freed objects. I kept wondering why the code assigns something completely different from a type object to the type field. In patch http://bugs.python.org/issue1953 I've moved the compaction code from the PyFloat_Fini/PyInt_Fini functions into two new functions and exposed them as a single Python function, gc.compact_freelists(). It doesn't solve the problem described in the text, but at least it gives users a chance to free some memory manually. Christian
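Linking freed objects through their type field is an intrusive free list: a dead object no longer needs its type pointer, so that slot can temporarily hold the next free object. The Python model below assumes a FakeObject with ob_refcnt, ob_type and value slots standing in for the C struct. Both class names are hypothetical, and the model ignores memory layout entirely.

```python
class FakeObject:
    """Hypothetical stand-in for a C object header plus payload."""
    __slots__ = ("ob_refcnt", "ob_type", "value")


class IntrusiveFreeList:
    """Free list threaded through the ob_type slot of dead objects."""

    def __init__(self, real_type):
        self.real_type = real_type
        self.head = None  # most recently freed object, or None when empty

    def dealloc(self, obj):
        # The object is dead, so its type slot is free to hold the "next" link.
        obj.ob_type = self.head
        self.head = obj

    def alloc(self, value):
        if self.head is None:
            obj = FakeObject()
        else:
            obj = self.head
            self.head = obj.ob_type  # follow the link stored in the type slot
        obj.ob_type = self.real_type  # restore a real type before handing it out
        obj.ob_refcnt = 1
        obj.value = value
        return obj


floats = IntrusiveFreeList(float)
a, b = floats.alloc(1.0), floats.alloc(2.0)
floats.dealloc(a)
floats.dealloc(b)
print(floats.head is b, b.ob_type is a)  # True True
c = floats.alloc(3.0)
print(c is b, c.ob_type is float, floats.head is a)  # True True True
```

The appeal of the trick is that the free list needs no extra memory: the link lives inside storage the dead object was not using anyway.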

On Jan 27, 2008 6:08 PM, Christian Heimes <lists@cheimes.de> wrote:
Does it? The gc module is specific to the cyclic-gc system. I don't see that this method is. If cyclic-gc is unavailable, should this function be unavailable too?
-- 
Thomas Wouters <thomas@python.org>
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

participants (6)
- Brett Cannon
- Christian Heimes
- Jeroen Ruigrok van der Werven
- Neal Norwitz
- Phillip J. Eby
- Thomas Wouters