
On Fri, 2016-05-27 at 22:51 +0200, Lion Krischer wrote:
Hi all,
I was told to take this to the mailing list. Relevant pull request: https://github.com/numpy/numpy/pull/7686
NumPy's FFT implementation caches some form of execution plan for each encountered input data length. This is currently implemented as a simple dictionary which can grow without bound. Calculating lots of different FFTs thus causes a memory leak from the user's perspective. We encountered a real-world situation where this is an issue.
The PR linked above proposes to replace the simple dictionary with an LRU (least recently used) cache. It removes the least recently used entries once it grows beyond a specified size (currently an arbitrary limit of 100 MB per cache). Thus almost all users will still benefit from the caches, but their total memory footprint is now bounded.
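To make the proposal concrete, here is a minimal sketch of such a byte-limited LRU cache. This is illustrative only, not the code from the PR; the class and method names are made up:

import collections
import numpy as np

class LRUBytesCache:
    # Sketch of an LRU cache that evicts by total stored bytes rather
    # than by entry count. Illustrative only; not the PR implementation.
    def __init__(self, max_bytes=100 * 1024 * 1024):  # arbitrary 100 MB limit
        self._max_bytes = max_bytes
        self._data = collections.OrderedDict()  # insertion order = LRU order

    def put(self, key, value):
        # Insert or refresh an entry, then evict least recently used
        # entries until the cached arrays fit under the byte limit.
        self._data.pop(key, None)
        self._data[key] = value
        total = sum(v.nbytes for v in self._data.values())
        while total > self._max_bytes and len(self._data) > 1:
            _, evicted = self._data.popitem(last=False)  # oldest first
            total -= evicted.nbytes

    def get(self, key):
        # Re-insert the hit so it becomes the most recently used entry.
        value = self._data.pop(key)  # raises KeyError on a miss
        self._data[key] = value
        return value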
Things to consider:
* This adds quite some additional complexity, but I could not find a simple way to achieve the same result.
* What is a good limit on the cache size? I used 100 MB because it works for my use cases.
I am +1 in principle, since I don't like that the cache might just grow forever; personally I see that as a bug right now. One that rarely hits, maybe, but a bug.

If you have the time, the ideal thing would be to measure how much of a difference the cache actually makes. The cache mostly holds a working array, I think, so does it even help for large arrays, or is the time spent on allocation negligible by then anyway? We also have a small-array cache in numpy nowadays (not sure how small "small" is here). Maybe that already achieves everything the fftcache was designed for, and we could even just disable it by default?

The complexity addition is a bit annoying, I must admit. On Python 3, functools.lru_cache could be another option, but only there.

- Sebastian
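For what it's worth, a rough sketch of the functools.lru_cache route Sebastian mentions might look like the following (Python 3 only; _fft_work_array and its body are hypothetical stand-ins for whatever per-length setup fftpack actually caches):

import functools
import numpy as np

@functools.lru_cache(maxsize=32)  # bounds the number of entries, not bytes
def _fft_work_array(n):
    # Hypothetical stand-in: allocate the per-length setup data that
    # would otherwise be recomputed on every call. The real cached
    # arrays grow with n, which is why the PR bounds total bytes instead.
    return np.empty(4 * n + 15)

One caveat: lru_cache's maxsize counts entries rather than bytes, so a few entries for very long transforms could still hold a lot of memory.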
Cheers!
Lion