Re: [Python-Dev] int/float freelists vs pymalloc
May I chime in? :-)

Gents, the current implementation of Python's memory management is really fine, and most problems it used to have in the past have been fixed in recent years (at least the ones I know of). Probably the only one left, indeed, is the potential unbounded growth of the int/float objects' freelists (beyond the shared cache of small ints). Yet this seems to be a questionable problem, because any change in the freelists' management will invariably slow down their allocation. By how much, I don't know, but whether you fall back to pymalloc above a certain threshold or use something else, the change will have a generic perf hit. The explanation is simple: you can't beat the free list scheme's performance when you have frequent short bursts of allocations and deallocations, which is the typical Python pattern observed on New() & DECREF() calls.

BTW, if you have 2 AMD combos showing speedups no one can explain in an obvious way, then it's a cache effect.

Optimizing pymalloc for non-standard byte sizes is a no-go, and although I've seen suggestions for further optimizations tailored to ints and floats, I haven't seen anyone spell out what that optimization of pymalloc would consist of. MAL's suggestion to preallocate a pymalloc pool cache for certain object sizes is something I actually implemented in the earliest versions of pymalloc, but eventually dropped in favor of the current dynamic, on-request allocation, because pre-allocating pools (and initializing the free lists) costs time and in general leads to degraded performance in the average case. I can't perf this now to prove it, but that was very clear at the time when I wrote the original stuff and applied the regression test on it.

So I would kindly suggest getting down to the only problem (if any), which is the uncontrolled growth of the specialized freelists, the shared int cache notwithstanding. If you can limit degenerative growth without a noticeable generic perf sacrifice, that would sound like an improvement to me, but so far I haven't seen any convincing arguments. And, of course, if the int/float freelist scheme were a real issue, we would probably have heard of it by now in a very sound way.

Vladimir
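The threshold idea under discussion - keep a free list for speed, but fall back to the general allocator once it grows past a cap - can be sketched roughly like this (an illustrative Python model only; CPython's actual int/float freelists are C linked lists threaded through dead objects, and the class and names here are made up):

```python
class BoundedFreeList:
    """Toy model of a free list with a growth cap (hypothetical sketch)."""

    def __init__(self, factory, max_size=100):
        self.factory = factory    # fallback allocator (stand-in for pymalloc)
        self.max_size = max_size  # cap that prevents unbounded growth
        self.free = []            # cached dead objects awaiting reuse

    def alloc(self):
        # Fast path: reuse a cached object; slow path: call the allocator.
        if self.free:
            return self.free.pop()
        return self.factory()

    def dealloc(self, obj):
        # Cache the object for reuse only while under the cap; beyond it,
        # simply drop the reference (here the GC reclaims it; in CPython
        # the object's memory would be returned to pymalloc).
        if len(self.free) < self.max_size:
            self.free.append(obj)


fl = BoundedFreeList(dict, max_size=2)
obj = fl.alloc()
fl.dealloc(obj)
reused = fl.alloc()        # the same object comes back off the free list
```

Note where the "generic perf hit" comes from: the cap check on every deallocation is extra work on the hot path that today's unbounded freelists simply do not do.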
Vladimir Marangozov wrote:
May I chime in? :-)
Please do!
Gents, the current implementation of Python's memory management is really fine and most problems it used to have in the past have been fixed in recent years (at least the ones I know of).
Probably the only one left, indeed, is the potential unbounded growth of the int/float objects' freelists (beyond the shared cache of small ints). Yet this seems to be a questionable problem, because any change in the freelists' management will invariably slow down their allocation. By how much, I don't know, but whether you fall back to pymalloc above a certain threshold or use something else, the change will have a generic perf hit.
The performance hit is there, but my testing indicates it's mostly not dramatic (and when it is dramatic, it only shows in microbenchmarks...). For me, "dramatic" is more than a 10% gain/loss in non-trivial benchmarks.
The explanation is simple: you can't beat the free list scheme performance when you have frequent short bursts of allocations and deallocations, which is the typical Python pattern observed on New() & DECREF() calls.
I never expected that it could be beaten. I set out to find out how close simple alloc/dealloc via PyMalloc was to the freelists, and my testing suggests it's pretty close - to the point that only the int & tuple freelists have, IMO, a clear-cut claim to retention.
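Not Andrew's actual harness, but the kind of burst-allocation microbenchmark being compared here can be approximated like this (illustrative only; the loop body forces a fresh float object on every iteration, which is the alloc/dealloc pattern the freelists are tuned for):

```python
import timeit

# Illustrative microbenchmark (not the poster's test harness): short bursts
# of float allocation/deallocation. Each `i + 0.5` creates a new float that
# dies immediately when `x` is rebound on the next iteration.
burst = "for i in range(1000):\n    x = i + 0.5"

elapsed = timeit.timeit(burst, number=1000)
print("1,000,000 float alloc/dealloc cycles: %.3fs" % elapsed)
```

Running this on a build with the freelists compiled out versus a stock build would give a rough feel for the gap; absolute numbers are meaningless across machines, which is why cache effects muddy cross-CPU comparisons.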
BTW, if you have 2 AMD combos showing speedups no one can explain in an obvious way, then it's a cache effect.
Figured as much. No one with an Intel CPU (or a different compiler) has as yet offered any equivalent test results - I would very much like to see them, but I'm not prepared ATM to buy another box to find out...
Optimizing pymalloc for non-standard byte sizes is a no-go, and although I've seen suggestions for further optimizations tailored to ints and floats, I haven't seen anyone spell out what that optimization of pymalloc would consist of.
I figured Tim Peters would have done pretty much anything that could have been done when he polished PyMalloc up for default use in 2.3.
MAL's suggestion to preallocate a pymalloc pool cache for certain object sizes is something I actually implemented in the earliest versions of pymalloc, but eventually dropped in favor of the current dynamic, on-request allocation because pre-allocating pools (and initializing the free lists) costs time and in general leads to degraded performance in the average case. I can't perf this now to prove it, but that was very clear at the time when I wrote the original stuff and applied the regression test on it.
Good to know that alley has been checked and found a dead end.
So I would kindly suggest getting down to the only problem (if any), which is the uncontrolled growth of the specialized freelists, the shared int cache notwithstanding. If you can limit degenerative growth without a noticeable generic perf sacrifice, that would sound like an improvement to me, but so far I haven't seen any convincing arguments.
Christian's compaction routine will get the job done, but IMO it should be (as he himself has proposed) called from gc.collect() rather than callable as a function in sys.
And, of course, if the int/float freelist scheme was a real issue we would have probably heard of it by now in a very sound way.
Not much noise has been made here, but I've come across 2 separate complaints on different mailing lists in the last month (one of the complainants had some hope that gc.collect() might help...). Is it a deafening roar? No, but I doubt the volume will do anything but increase...

Regards,
Andrew.

--
Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail: andymac@bullseye.apana.org.au (pref) | Snail: PO Box 370
        andymac@pcug.org.au (alt)            |        Belconnen ACT 2616
Web:    http://www.andymac.org/              |        Australia
On Fri, Feb 22, 2008, Andrew MacIntyre wrote:
Vladimir Marangozov wrote:
And, of course, if the int/float freelist scheme was a real issue we would have probably heard of it by now in a very sound way.
Not much noise has been made here, but I've come across 2 separate complaints in different mailing lists in the last month (one of the complainants had some hope that gc.collect() might help...) Is it a deafening roar? No, but I doubt the volume will do anything but increase...
Actually, it's a regular issue that crops up on c.l.py, but almost always in the context of range(bignum). Because there's an easy workaround and Python 3.0 changes the semantics of range(), there hasn't been much clamor to fix it.

--
Aahz (aahz@pythoncraft.com)  <*>  http://www.pythoncraft.com/
"All problems in computer science can be solved by another level of
indirection."  --Butler Lampson
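The range(bignum) complaint and the Python 3.0 semantics change mentioned above can be seen directly: in 2.x, range() built a list of bignum int objects (the easy workaround was xrange()), whereas in Python 3 range() is a constant-size lazy object, so no flood of ints hits the freelists up front. A small demonstration (Python 3):

```python
import sys

# A Python 3 range object is lazy: it stores start/stop/step, not elements,
# so its size does not depend on how big the range is.
big = range(10**8)

# Materializing even a small range as a list allocates every element.
small_list = list(range(1000))

print(sys.getsizeof(big))         # a few dozen bytes, regardless of 10**8
print(sys.getsizeof(small_list))  # grows with the number of elements
```

This is why the pressure to "fix" the freelists faded: the most common way of mass-producing short-lived ints stopped doing so by default.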
participants (4)
-
"Martin v. Löwis"
-
Aahz
-
Andrew MacIntyre
-
Vladimir Marangozov