
[Tim Delaney]
What is the effect on peak memory usage over "average" programs? This might be a worthwhile speedup on small dicts (up to a TBD number of entries) but not worthwhile for large dicts. However, adding this capability would of course add more code to a very common code path (an additional test on the current size to decide what factor to increase by).
Nice (in relative, not absolute terms). Do we have any numbers on memory usage during and after that period? Tim Delaney
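One way numbers like that could be gathered (a hypothetical sketch, not anything actually used for the measurements discussed below) is to interpose on the C allocator and count raw malloc traffic while a benchmark runs; the file name, build line, and counters here are all invented for illustration:

    /* malloc_count.c -- hypothetical LD_PRELOAD shim that counts C-level
     * malloc calls and requested bytes for whatever program loads it.
     * Simplified: not thread-safe, and free/realloc/calloc are not tracked.
     *
     * Build: gcc -shared -fPIC -o malloc_count.so malloc_count.c -ldl
     * Run:   LD_PRELOAD=./malloc_count.so python some_dict_benchmark.py
     */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void *(*real_malloc)(size_t) = NULL;
    static unsigned long long n_calls = 0;
    static unsigned long long n_bytes = 0;

    void *malloc(size_t size)
    {
        if (real_malloc == NULL)
            real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
        n_calls++;
        n_bytes += size;
        return real_malloc(size);
    }

    /* Report the totals when the process exits. */
    __attribute__((destructor))
    static void report(void)
    {
        fprintf(stderr, "malloc: %llu calls, %llu bytes requested\n",
                n_calls, n_bytes);
    }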

[Tim Delaney]
> What is the effect on peak memory usage over "average" programs?
[Raymond Hettinger]
Since the amortized growth is the same, the effect is nil on average. Special cases can be contrived with a specific number of records where the memory use is doubled, but in general it is nearly unchanged for average programs.
> This might be a worthwhile speedup on small dicts (up to a TBD number of entries) but not worthwhile for large dicts.
Actually, it helps large dictionaries even more than small dictionaries. Collisions in large dicts are resolved through additional memory probes, which are almost certain not to be in the current cache line.
> However, adding this capability would of course add more code to a very common code path (an additional test on the current size to decide what factor to increase by).
Your intuition is exactly correct. All experiments at special-casing various sizes resulted in decreased performance because they added a tiny amount of code to some of the most heavily exercised paths in Python. Further, it results in an unpredictable branch, which is also not a good thing. [GvR]
[Tim]
> Nice (in relative, not absolute terms). Do we have any numbers on memory usage during and after that period?
I found out that timing dict performance was hard. Capturing memory usage was harder. Of entry space, space plus unused space, calls to PyMalloc, and calls to the OS malloc, only the last is important, but it depends on all kinds of things that are not easily controlled.
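To make the "amortized growth is the same" point concrete, here is a deliberately simplified, hypothetical model of table growth (growth_sim.c, simulate() and next_pow2() are invented names; it ignores deletions, dummy slots, and CPython's small-table handling). It counts how often the 2/3-full resize test fires and what the final table size is when the resize request is used*2 versus the proposed used*4:

    /* growth_sim.c -- simplified, hypothetical model of dict table growth.
     * Not the real CPython logic; it only models the 2/3 fill test and the
     * round-up-to-a-power-of-two behaviour of the resize request.
     */
    #include <stdio.h>
    #include <stddef.h>

    static size_t next_pow2(size_t minsize)
    {
        size_t n = 8;                      /* tables start at 8 slots */
        while (n < minsize)
            n <<= 1;
        return n;
    }

    static void simulate(size_t inserts, size_t factor)
    {
        size_t size = 8, used = 0, resizes = 0;

        for (size_t i = 0; i < inserts; i++) {
            used++;
            if (used * 3 >= size * 2) {    /* same 2/3 fill test as the patch */
                size = next_pow2(used * factor);
                resizes++;
            }
        }
        printf("factor %zu: %zu inserts, %zu resizes, final table %zu slots\n",
               factor, inserts, resizes, size);
    }

    int main(void)
    {
        simulate(1000000, 2);              /* current growth request */
        simulate(1000000, 4);              /* proposed growth request */
        return 0;
    }

Under this model the *4 request roughly halves the number of resizes, while the table just after a resize can be up to about twice as large relative to the number of entries, which matches the "memory use is doubled" special case described above.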

[Tim Delaney]
> What is the effect on peak memory usage over "average" programs?
[Raymond Hettinger]
> Since the amortized growth is the same, the effect is nil on average. Special cases can be contrived with a specific number of records where the memory use is doubled, but in general it is nearly unchanged for average programs.
That doesn't make sense. Dicts can be larger after the patch, but never smaller, so there's nothing opposing the "can be larger" part: on average, allocated address space must be strictly larger than before. Whether that *matters* on average to the average user is something we can answer rigorously just as soon as we find an average user with an average program <wink>. I'm not inclined to worry much about it.
> This might be a worthwhile speedup on small dicts (up to a TBD number of entries) but not worthwhile for large dicts.
That part makes sense. Resizing a large dict is an expensive operation too.
[Raymond Hettinger]
> All experiments at special-casing various sizes resulted in decreased performance because they added a tiny amount of code to some of the most heavily exercised paths in Python.
This part isn't clear: the changed code is in the body of an if() block that normally *isn't* entered (over an ever-growing dict's life, it's entered O(log(len(dict))) times, independent of the number of dict lookups). The change cuts the number of times it's entered by approximately a factor of 2, but it isn't entered often even now.
> Further, it results in an unpredictable branch, which is also not a good thing.
Since the body of the loop isn't entered often, unpredictable one-shot branches within the body shouldn't have a measurable effect. The unpredictable branches when physically resizing the dict will swamp them regardless. The surrounding if-test continues to be predictable in the "branch taken" direction. What could be much worse is that stuffing code into the if-block bloats the code so much as to frustrate lookahead I-stream caching of the normal "branch taken and return 0" path:

    if (mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2) {
        if (dictresize(mp, mp->ma_used*2) != 0)
            return -1;
    }
    return 0;

Rewriting as

    if (mp->ma_used <= n_used || mp->ma_fill*3 < (mp->ma_mask+1)*2)
        return 0;
    return dictresize(mp, mp->ma_used*2) ? -1 : 0;

would help some compilers generate better code for the expected path, especially if the blob after "return 0;" got hairier.

IOW, if fiddling with different growth factors at different sizes slowed things down, we have to look for something that affected the *normal* paths; it's hard to imagine that the guts of the if-block execute often enough to matter (discounting its call to dictresize(), which is an expensive routine).
In my early Cray days, the Cray boxes were batch one-job-at-a-time, and all memory was real. If you had a CPU-bound program, it took the same number of nanoseconds each time you ran it. Benchmarking was hard then too <0.5 wink>.

That last line might as well be

    return dictresize(mp, mp->ma_used*2);  /* Or *4, per Raymond */

Which reminds me, there are two other places where dictresize() is called; shouldn't those be changed to the new fill factor? All in all, I think I'm mildly in favor of this change.

--Guido van Rossum (home page: http://www.python.org/~guido/)
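Putting the two suggestions together, the tail of the insertion path might end up looking something like this (a sketch only, combining the restructuring quoted above with the proposed *4 request; it is not the actual patch text):

    if (mp->ma_used <= n_used || mp->ma_fill*3 < (mp->ma_mask+1)*2)
        return 0;
    return dictresize(mp, mp->ma_used*4);  /* was mp->ma_used*2 */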

participants (4):
- Delaney, Timothy C (Timothy)
- Guido van Rossum
- Raymond Hettinger
- Tim Peters