jcarlson at uci.edu
Mon Nov 1 22:22:27 CET 2004
bokr at oz.net (Bengt Richter) wrote:
> On Mon, 01 Nov 2004 00:13:18 -0700, Josiah Carlson <jcarlson at uci.edu> wrote:
> >I wasn't sure that I believed your statements in regards to Python's
> >cache performance, so I thought I'd give it a look...
> I'm not sure which statements you are referring to, but if it's about
> bad jobsharing between multiple processors, I don't think that's what's
> causing the result you get below. I get a similar slowdown after shuffling.
For some reason, when someone mentioned 'working set size', I got to
thinking of processor caches and not VM working sets.
Sorry about that, I had just gotten back from throwing a party, and my
brain was pretty fried.
Really the numbers I quoted don't show anything interesting in regards
to dual vs single processor performance, but it does show interaction
between processor caches and locality in Python programs.
When created, the original list has Python integers placed sequentially
in the integer free list, so when iterating over them, we get reasonable
cache locality. After shuffling, we are basically doing random reads
through the integer free list; and since my processor's cache is so much
smaller than the integer free list, we observe a ~50% slowdown.
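A minimal sketch of that measurement (the list size and repeat count are
my own choices here, not the figures from the original benchmark): the
same int objects are summed twice, once in allocation order and once
after shuffling the list, so only memory-access order differs.

```python
import random
import timeit

N = 1_000_000
seq = list(range(N))  # ints allocated (roughly) sequentially in memory

# Sum the ints in allocation order: good spatial locality.
t_seq = timeit.timeit(lambda: sum(seq), number=10)

# Shuffle the list: same objects, but now visited in random memory order.
random.shuffle(seq)
t_shuf = timeit.timeit(lambda: sum(seq), number=10)

print(f"sequential: {t_seq:.3f}s  shuffled: {t_shuf:.3f}s")
```

On a machine whose cache is much smaller than the block of int objects,
the shuffled pass is typically noticeably slower, though the exact ratio
depends on the CPU and the Python build.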
> So how big an L2 cache? 512k bytes? How big is an int object? Even if type info is
> shared for an arena, it must be a value,refcount pair each at least, or 8 bytes.
> I guess I'd bet on no tricks for type info, so that's a pointer for that -> 12 bytes.
Though I could have sworn int objects each took 16 bytes, testing it out
(because I can't find the proper C source file in Python CVS) puts it at
12 bytes/int. Curse my human brain.
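The 12 bytes works out as three 4-byte fields on a 32-bit build; a back-
of-the-envelope sketch (field names are from CPython 2.x's intobject.h,
the sizes are my assumption for a 32-bit platform, not measured):

```python
# Hypothetical layout of a CPython 2.x int object on a 32-bit build:
int_fields = {
    "ob_refcnt": 4,  # reference count (a C long)
    "ob_type": 4,    # pointer to the shared int type object
    "ob_ival": 4,    # the integer value itself (a C long)
}
print(sum(int_fields.values()))  # -> 12 bytes per int object
```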
[snip discussion of cache sizes, interleaving, etc.]
Anyways, it is interesting to note that Python does show the classic
memory locality and caching effects, even though they don't change
overall performance all that much.