[Python-Dev] insertdict slower?

M.-A. Lemburg mal@lemburg.com
Sat, 03 Feb 2001 13:23:54 +0100


Tim Peters wrote:
> 
> [MAL]
> > It doesn't have anything to do with icache conflicts or
> > other esoteric magic at die-level. The reason for the slowdown
> > is that the benchmark uses integers as keys and these have to
> > go through the whole rich compare machinery to find their way into
> > the dictionary.
> 
> But insertdict doesn't do any compares at all (besides C pointer comparison
> to NULL).  There's more than one mystery here.  The one I was addressing is
> why the profiler said *insertdict* got slower.  Jeremy's profile did not
> give any reason to suspect that lookdict got slower (which is where the
> compares are); to the contrary, it said lookdict got 4.5% *faster* in 2.1.
> 
> > Please see my other post on the subject -- I think we need
> > an optimized API especially for checking for equality.
> 
> Quite possibly, but if Jeremy remains keen to help with investigating timing
> puzzles, he needs to figure out why his profiling approach is pointing him
> at the wrong functions.  That has long-term value far beyond patching the
> regression du jour.
> 
> it's-not-either/or-it's-both-ly y'rs  -tim

Ok, let's agree on "it's both" :)

I was referring to the slowdown that shows up in the DictCreation
benchmark. It runs batches of the following operations to measure
dictionary creation speed:

            d1 = {}
            d2 = {}
            d3 = {}
            d4 = {}
            d5 = {}

            d1 = {1:2,3:4,5:6}
            d2 = {2:3,4:5,6:7}
            d3 = {3:4,5:6,7:8}
            d4 = {4:5,6:7,8:9}
            d5 = {6:7,8:9,10:11}

Note that the number of inserted items is 3; the minimum size
of the allocated table is 4. Apart from the initial allocation
of the dictionary table, no further resizes are done.
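
To see the arithmetic, here is a tiny standalone model of the growth
check (a sketch only, not the real dictobject.c: it assumes the
fill*3 >= size*2 rule of that era's PyDict_SetItem, applied before
each insert). The three inserts per literal never trip the check, so
only the initial 4-slot table is ever allocated:

    #include <stdio.h>

    #define MINSIZE 4

    int main(void)
    {
        int size = MINSIZE;   /* slots in the freshly allocated table  */
        int fill = 0;         /* used slots (active + dummy entries)   */
        int resizes = 0;
        int i;

        for (i = 0; i < 3; i++) {        /* e.g. d1 = {1:2, 3:4, 5:6}  */
            if (fill * 3 >= size * 2) {  /* growth check before insert */
                size *= 2;
                resizes++;
            }
            fill++;
        }
        printf("table size %d, extra resizes: %d\n", size, resizes);
        return 0;
    }

This prints "table size 4, extra resizes: 0" for the three-item
literals used above.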

One of the micro-optimizations I used in my patched Python
version deals with these rather common situations: small dictionaries
are inlined (up to a certain size) in the object itself rather
than stored in a separately malloc'ed table. I found that a limit
of 8 slots gives the best trade-off between performance boost and
memory overhead.
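
Schematically, the idea looks something like this (a hypothetical
sketch, not the actual patch; the names dictentry, ma_smalltable
etc. are made up for illustration):

    #include <stdlib.h>

    #define INLINE_SLOTS 8          /* the 8-slot sweet spot */

    typedef struct {
        long hash;
        void *key;
        void *value;
    } dictentry;

    typedef struct {
        int ma_size;                 /* total slots                  */
        int ma_used;                 /* active entries               */
        dictentry *ma_table;         /* points at ma_smalltable or a
                                        heap-allocated table once the
                                        dict outgrows it             */
        dictentry ma_smalltable[INLINE_SLOTS];
    } dictobject;

    /* Creating a small dict needs no second malloc at all. */
    static dictobject *dict_new(void)
    {
        dictobject *mp = calloc(1, sizeof(*mp));
        if (mp == NULL)
            return NULL;
        mp->ma_size = INLINE_SLOTS;
        mp->ma_table = mp->ma_smalltable;  /* table lives inline */
        return mp;
    }

    /* Only dicts that grew past INLINE_SLOTS own a separate table. */
    static void dict_free(dictobject *mp)
    {
        if (mp->ma_table != mp->ma_smalltable)
            free(mp->ma_table);
        free(mp);
    }

    int main(void)
    {
        dictobject *d = dict_new();
        dict_free(d);
        return 0;
    }

The attractive property is that creating and destroying one of the
small dictionaries in the benchmark above then costs a single
allocation instead of two.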

This is another area where Vladimir's pymalloc could help, since it
favours small memory chunks.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/