There are a lot of implicit caches in the stdlib. Some of them can grow without bound. It would be good to have an official way to clear caches. I propose to add two functions (in some existing lightweight module, or in a new module):

clear_caches(level=0)
register_cache(level, clear_func)

clear_caches() calls the cache-clearing functions for the specified level and larger. register_cache() registers a cache-clearing function for the specified level. All modules that use an implicit cache should register a clearing function. functools.lru_cache() should register the wrapper's cache-clearing function. Maybe gc.collect should be registered too. Third-party libraries and user code will be able to register their own clearing functions.
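A rough sketch of what such a registry could look like (the names and the "this level and larger" semantics follow the description above; everything else is an assumption, since the proposal deliberately leaves details open):

```python
# Hypothetical sketch of the proposed registry module.  Nothing here is a
# settled design; it only illustrates the register/clear mechanics.
_registry = []

def register_cache(level, clear_func):
    """Register a cache-clearing function at the given level."""
    _registry.append((level, clear_func))

def clear_caches(level=0):
    """Call every registered clearing function at `level` or larger."""
    for cache_level, clear_func in _registry:
        if cache_level >= level:
            clear_func()
```

A module like linecache would then call register_cache(0, clearcache) at import time, and functools.lru_cache() could register each wrapper's cache_clear.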
On 1 April 2015 at 13:28, Serhiy Storchaka wrote:
I proposed to add two functions (in some existing lightweight module or add a new module):
clear_caches(level=0) register_cache(level, clear_func)
clear_caches() calls cache clearing functions for specified level and larger.
I'm not sure I understand how "level" would be used. Presumably anything registering a cache has to decide what level it wants to be at, but how should it decide? Maybe there should be some standard levels defined? Is there actually a need for a level parameter at all?
Paul
On 01.04.15 16:27, Paul Moore wrote:
On 1 April 2015 at 13:28, Serhiy Storchaka wrote:
I proposed to add two functions (in some existing lightweight module or add a new module):
clear_caches(level=0) register_cache(level, clear_func)
clear_caches() calls cache clearing functions for specified level and larger.
I'm not sure I understand how "level" would be used. Presumably anything registering a cache has to decide what level it wants to be at, but how should it decide? Maybe there should be some standard levels defined? Is there actually a need for a level parameter at all?
All this can be discussed here. It is just an idea; I'm not sure about any details. Right now I don't need the level parameter, but maybe other developers will. Maybe the behavior should be the opposite: clear caches of the specified level and *lower*. Then by default only level 0 would be cleared, and users would be able to use higher levels for long-living caches.
On Apr 1, 2015, at 06:41, Serhiy Storchaka wrote:
I proposed to add two functions (in some existing lightweight module or add a new module): clear_caches(level=0) register_cache(level, clear_func). clear_caches() calls cache clearing functions for specified level and larger.
I can see this being useful as just a way of standardizing an API for third-party modules that have a cache-clearing function. If there are a lot of them, having most of them do it the same way would make them easier to discover.
I'm not sure I understand how "level" would be used. Presumably anything registering a cache has to decide what level it wants to be at, but how should it decide? Maybe there should be some standard levels defined? Is there actually a need for a level parameter at all?
All this can be discussed here. It is just an idea, I'm not sure about any details. Right now the level parameter is not needed to me, but may be other developers will need it.
May be the behavior should be opposite: clear caches of specified level and *lower*. Then by default only level 0 will be cleared and users will be able to use higher levels for long-living caches.
Higher vs. lower doesn't really matter that much; you can just define short, medium, and long as 0, -1, -2 instead of 0, 1, 2, right? (I think that, even if you don't define standard levels by name as Paul Moore suggests, you're still going to be doing so implicitly by the levels you choose in the stdlib...)

Anyway, why do you want this? Is there some cache that's using too much memory in an app of yours? Or is it more about getting a semi-clean start on the interactive interpreter? Or something different? And which of the implicit caches all over the stdlib made you want this (since the stdlib was your motivating example)?

Also, this seems like something that other platforms (special-purpose math languages, interactive SQL interpreters, etc.) might have. If so, have you looked around to see how they do it?
Why not just take a dbapi approach to this: define an API, build things to that API in the standard library, but leave it to third parties to use it. "Does your module have a cache? Define these methods somewhere, and anything that manages caches will know how to deal with it." No global registry, no global manager in the standard library, just an API.

On 4/1/2015 10:37, Andrew Barnert wrote:
The elephant in the room is: what are those caches?
I'd suggest starting without the level.
On Apr 1, 2015 7:59 AM, "Alexander Walters" wrote:
Why not just take a dbapi approach to this; define an api, build things to that api in the standard library, but leave it to third parties to use it. "Does your module have a cache? define these methods somewhere, and anything that manages cache will know how to deal with it". No global registry, not global manager in the standard library, just an api.
On Wed, Apr 1, 2015 at 8:58 AM, Alexander Walters wrote:
Why not just take a dbapi approach to this; define an api, build things to that api in the standard library, but leave it to third parties to use it. "Does your module have a cache? define these methods somewhere, and anything that manages cache will know how to deal with it". No global registry, not global manager in the standard library, just an api.
What Serhiy is proposing is a mechanism for clearing a bunch of caches at once, so you need a registry of caches that are included in that operation. With just a protocol, Serhiy's goal would have to be accomplished by somehow scanning through all live objects for those that implement it. An explicit opt-in mechanism is both more efficient and friendlier.
-eric
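The cost difference is easy to see in a sketch. With only a protocol and no registry, a clear-everything operation has to trawl gc.get_objects() for anything that looks like a cache. Here the "protocol" is the real pair of methods functools.lru_cache wrappers expose (cache_clear and cache_info); the scanning helper itself is the assumed part:

```python
import functools
import gc

def clear_discoverable_caches():
    # Protocol-only approach: scan every GC-tracked object for something
    # that looks like an lru_cache wrapper.  Workable, but it touches the
    # whole heap on every call -- an explicit registry avoids that.
    for obj in gc.get_objects():
        try:
            clear = getattr(obj, "cache_clear", None)
            info = getattr(obj, "cache_info", None)
        except Exception:
            continue  # some objects misbehave under getattr; skip them
        if callable(clear) and callable(info):
            clear()

@functools.lru_cache(maxsize=None)
def square(x):
    return x * x
```

Note that this also indiscriminately clears every other live lru_cache wrapper in the process, which is exactly the kind of bluntness an opt-in registry with levels would let callers avoid.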
On Wed, Apr 1, 2015 at 8:37 AM, Andrew Barnert wrote:
Anyway, why do you want this? Is there some cache that's using too much memory in an app of yours? Or is it more about getting a semi-clean start on the interactive interpreter? Or something different?
Presumably http://bugs.python.org/issue23839. -eric
On 1 April 2015 at 16:03, Guido van Rossum wrote:
The elephant in the room is: what are those caches?
A quick scan of the regrtest code mentioned in http://bugs.python.org/issue23839 suggests that the following would be a start:

copyreg dispatch table
sys.path_importer_cache
zipimport._zip_directory_cache
ABC registry cache(s)
sys._clear_type_cache()
distutils._dir_util._path_created.clear()
re.purge()
_strptime._regex_cache.clear()
urllib.parse.clear_cache()
urllib.request.urlcleanup()
linecache.clearcache()
mimetypes._default_mime_types()
filecmp._cache.clear()
struct._clearcache()
ctypes._reset_cache()

That's actually quite a lot of caches...
Paul
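Several of the entries in that list already have callable hooks today, so a manual stand-in for the proposed clear_caches(), restricted to the public or semi-public hooks, is just a sequence of calls (the helper name is made up; all the calls are real, existing functions):

```python
import linecache
import re
import struct
import sys
import urllib.parse

def clear_known_caches():
    # Manually invoke the clearing hooks that already exist for a few of
    # the caches listed above; there is no single entry point today.
    sys._clear_type_cache()
    re.purge()
    linecache.clearcache()
    struct._clearcache()
    urllib.parse.clear_cache()
```

The underscore-prefixed names underline Serhiy's point: some of these hooks aren't even public interfaces.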
The other elephant might be... If your caches can grow without bound so that programmers need to know about them (so they can clear them, in testing or other situations), then maybe they aren't well-designed. If cache growth is an issue, I'd rather see an abstract API for LRU (and other) cache types which intelligently cap cache size. I suspect that long before you hit the inevitable MemoryError, unbounded caches probably destroy program performance by causing large-scale swapping.
Skip
On 01.04.15 17:37, Andrew Barnert wrote:
I can see this being useful as just a way of standardizing an API for third-party modules that have a cache-clearing function. If there are a lot of them, having most of them do it the same way would make them easier to discover.
How will you discover them? Do you want to enumerate all loaded modules and all functions in these modules to check whether they look like cache-clearing functions? What about caches in class scope and dynamically created caches?
May be the behavior should be opposite: clear caches of specified level and *lower*. Then by default only level 0 will be cleared and users will be able to use higher levels for long-living caches.
Higher vs. lower doesn't really matter that much; you can just define short, medium, and long as 0, -1, -2 instead of 0, 1, 2, right?
Right. But 0, 1, 2 can look less strange than 0, -1, -2. If all actually used values turn out to be negative, then we should reverse the scale.
Anyway, why do you want this? Is there some cache that's using too much memory in an app of yours? Or is it more about getting a semi-clean start on the interactive interpreter? Or something different?
This idea was inspired by a series of MemoryErrors on some buildbots. Some of these errors are provoked by overfilled caches (in particular linecache). I don't know how large other caches can be, but linecache can grow over 23 MB only on the stdlib and standard tests. See also issues #23838 and #23839. http://bugs.python.org/issue23838 http://bugs.python.org/issue23839
And which of the implicit caches all over the stdlib made you want this (since the stdlib was your motivating example)?
First of all it's linecache. But some caches can grow even more. For example, the re cache grows without bound if you generate patterns dynamically from user data and use the module-level functions.
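The re behaviour is easy to demonstrate: each distinct pattern string passed to a module-level function is compiled and cached internally, so dynamically generated patterns keep adding entries, and re.purge() is the existing (if little-known) way to drop that cache:

```python
import re

# Each distinct pattern string used via a module-level re function gets
# compiled and stored in re's internal pattern cache.
for i in range(100):
    re.fullmatch(rf"item-{i}", f"item-{i}")

re.purge()  # drop the internal pattern cache
```

Code that compiles its patterns once with re.compile() and reuses the pattern objects sidesteps the problem entirely.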
Also, this seems like something that other platforms (special-purpose math languages, interactive SQL interpreters, etc.) might have. If so, have you looked around to see how they do it?
I don't see how this idea, implemented in a particular library, can help solve the general problem without support in the stdlib.
On Apr 1, 2015, at 08:18, Eric Snow wrote:
On Wed, Apr 1, 2015 at 8:37 AM, Andrew Barnert wrote:
Anyway, why do you want this? Is there some cache that's using too much memory in an app of yours? Or is it more about getting a semi-clean start on the interactive interpreter? Or something different?
Presumably http://bugs.python.org/issue23839.
Ah, for running a large test suite (like the stdlib's). That makes sense. From the initial report ("in particular linecache") it seems at least possible that there's really just one cache that's a real issue, in which case there's a pretty obvious fix. (And making it possible to use linecache in a bounded way, instead of all-or-nothing clearing, might be useful for other purposes besides this...) But I don't know if that really is the case; did anyone check whether the 500MB were 90%+ in linecache or anything like that?
Actually several of these "caches" aren't caches at all -- deleting data from them will break APIs. This is true at least for copyreg, the ABC registry (though the things with cache in their names in ABC *are* proper caches), and the mime types registry. Also, urllib.request.urlcleanup() removes the opener set by the install_opener() API.
On Wed, Apr 1, 2015 at 8:29 AM, Skip Montanaro wrote:
The other elephant might be... If your caches can grow without bound so that programmers need to know about them, then maybe they aren't well-designed.
-- --Guido van Rossum (python.org/~guido)
On Wed, Apr 1, 2015 at 10:35 AM, Guido van Rossum wrote:
Actually several of these "caches" aren't caches at all -- deleting data from them will break APIs.
Then, like you say, they aren't caches. If we split them into two groups (fake and real), then the real caches (like linecache) would probably benefit from some actual cache-like trimming. The others should be fixed in some other way.

Will these problems only be evident during testing, or are there plausible scenarios where a real application could stress the system? Or could such unbounded registries/caches be potential vectors for DoS attacks?

A cache API still makes some sense to me, even if only to have available for application writers. I see that functools has an lru_cache decorator. Perhaps linecache could use it?
Skip
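A bounded linecache along those lines is easy to sketch. This is a hypothetical replacement, not the real linecache API: functools.lru_cache caps how many files' lines are retained, instead of linecache's unbounded module-level dict.

```python
import functools

# Hypothetical sketch: retain at most 128 files' worth of lines.  Callers
# share the cached list, so it must be treated as read-only.
@functools.lru_cache(maxsize=128)
def getlines(filename):
    with open(filename) as f:
        return f.readlines()

def getline(filename, lineno):
    """Like linecache.getline(): '' for unreadable files or bad line numbers."""
    try:
        lines = getlines(filename)
    except OSError:
        return ''
    if 1 <= lineno <= len(lines):
        return lines[lineno - 1]
    return ''
```

One behavioral difference worth noting: the real linecache also consults module loaders via __loader__.get_source(), which a naive lru_cache wrapper like this doesn't attempt.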
Regardless of whether we come up with a general solution (seems of doubtful use to me), linecache really is last-century technology. :-(

On Wed, Apr 1, 2015 at 8:32 AM, Andrew Barnert wrote:
-- --Guido van Rossum (python.org/~guido)
On 01.04.15 18:03, Guido van Rossum wrote:
The elephant in the room is: what are those caches?
More than a dozen caches are mentioned in Lib/test/regrtest.py. Some of these caches don't even have a public interface. And perhaps this list is not complete.
I'd suggest starting without the level.
Well, then the API will be simpler. Next question: where should these functions live? The module should be lightweight, so that any module with a cache can import it without adding weight.
On 2 April 2015 at 04:32, Andrew Barnert wrote:
Ah, for running a large test suite (like the stdlib's). That makes sense.
From the initial report ("in particular linecache") it seems at least possible that there's really just one cache that's a real issue, in which case there's a pretty obvious fix. (And making it possible to use linecache in a bounded way, instead of all-or-nothing clearing, might be useful for other purposes besides this...) But I don't know if that really is the case; did anyone check whether the 500MB were 90%+ in linecache or anything like that?
We can probably tune the new traceback code to avoid linecache getting populated at all on a passing test run. That would be better than wiping it out on every test, which, if it's being hit, will just trade memory for CPU - slower test runs.
-Rob
--
Robert Collins
Hi,
2015-04-01 14:28 GMT+02:00 Serhiy Storchaka
There are a lot of implicit caches in the stdlib. Some of them can grow unlimitedly. It would be good to have an official way to clear caches.
I proposed to add two functions (in some existing lightweight module or add a new module):
I proposed a similar idea 2 years ago: https://mail.python.org/pipermail/python-dev/2013-October/129218.html
Victor
participants (9)
- Alexander Walters
- Andrew Barnert
- Eric Snow
- Guido van Rossum
- Paul Moore
- Robert Collins
- Serhiy Storchaka
- Skip Montanaro
- Victor Stinner