[Python-Dev] Python 3000: Special type for object attributes & map keys

Neal Norwitz nnorwitz at gmail.com
Wed Mar 19 00:54:47 CET 2008


On Wed, Mar 5, 2008 at 4:27 PM, Henrik Vendelbo <hvendelbo.dev at gmail.com> wrote:
> It appears to me that if you can make mapping mechanisms faster in
>  Python you can make significant
>  overall speed improvements. I also think the proposed concept could
>  add flexibility to persistence formats
>  and RMI interfaces.
>
>  My basic idea is to have a constant string type with an interpreter
>  globally unique hash. If the original constant
>  is created in a manner different from string constants, it can be
>  tracked and handled differently by the interpreter.

Part of this is already done, though very differently: all strings used
in code objects are interned (stored in a dictionary so we don't
increase memory by storing multiple string objects which contain the
same string).  So there is partially a mechanism to do what you suggest.
But there would be many places that would need to be modified.
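
To make the interning point concrete, here is a minimal sketch using
sys.intern() (the Python 3 spelling of the old intern() builtin).  The
variable names are only illustrative; the point is that equal strings
collapse to one shared object once interned, so they can be compared by
identity.

```python
import sys

# Build two equal strings at run time; they are not guaranteed to be
# the same object just because their contents match.
parts = ["attr", "_name"]
a = "".join(parts)
b = "".join(parts)

# Interning maps equal strings onto one canonical object.
ia = sys.intern(a)
ib = sys.intern(b)

# Identity comparison (a pointer compare in C) is now enough.
same_object = ia is ib
```

This is the sharing that name lookups can take advantage of; it does not
by itself give each string a globally unique number.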

I think we briefly discussed this in the past.

>  P.S. I originally thought of this in a JavaScript context so forgive
>  me if this would make little difference in Python.

Your message was a little confusing at first because the terminology
is a little different, but the idea makes sense.  It's not clear how
much this would speed up the interpreter.  The best way to test your
theory would be to create a patch and measure the performance
difference.

First, you should measure the current speed difference between int and
string keys.  Something like:

$ ./python.exe -m timeit -s 'd = {1: None}' 'd[1]'
1000000 loops, best of 3: 0.793 usec per loop
$ ./python.exe -m timeit -s 'd = {"1": None}' 'd["1"]'
1000000 loops, best of 3: 0.728 usec per loop

My Python is a debug build, so a release build might be faster for
ints.  If not, the first task would be to speed up int lookups. :-)
(You should verify more with real world dict sizes.)  Dicts already
have a specialized lookup path for string keys but not for int keys,
so it should be possible to optimize dicts with int keys in a similar
way.  You would need to look in Objects/dictobject.c.  See
http://python.org/dev/faq/ for general hints on how to get started.
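
For checking more dict sizes, something along these lines might be a
starting point.  It is only a rough sketch: the helper names
(time_lookup, compare), the sizes, and the loop counts are my own
choices, and absolute numbers will vary by build and machine.

```python
import timeit


def time_lookup(keys, probe, number):
    """Return the best per-lookup time (seconds) for d[probe]."""
    d = dict.fromkeys(keys)
    timer = timeit.Timer("d[k]", globals={"d": d, "k": probe})
    return min(timer.repeat(repeat=3, number=number)) / number


def compare(sizes=(1, 100, 10000), number=200000):
    """Map each dict size to (int-key time, str-key time)."""
    results = {}
    for n in sizes:
        int_keys = list(range(n))
        str_keys = [str(i) for i in int_keys]
        results[n] = (
            time_lookup(int_keys, int_keys[-1], number),
            time_lookup(str_keys, str_keys[-1], number),
        )
    return results


if __name__ == "__main__":
    for n, (t_int, t_str) in compare().items():
        print("size %6d: int %.3f usec, str %.3f usec"
              % (n, t_int * 1e6, t_str * 1e6))
```

Run it with both a debug and a release build before drawing any
conclusions.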

If ints were faster, some of the next steps would be:
 * keep the globally unique number (very easy)
 * update the source that generates byte codes to use the globally unique number
 * store ints as keys in dicts and update all the places that use attributes
 * update byte code when a module is imported to use the globally unique number
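
As a pure-Python illustration of the steps above (not how the
interpreter does anything today), the idea could be mocked up roughly
like this.  NameRegistry, IntKeyedNamespace and REGISTRY are all
hypothetical names invented for the sketch.

```python
class NameRegistry:
    """Hypothetical table giving each attribute name a unique int."""

    def __init__(self):
        self._ids = {}     # name -> number
        self._names = []   # number -> name

    def id_for(self, name):
        """Return the number for name, assigning one on first use."""
        if name not in self._ids:
            self._ids[name] = len(self._names)
            self._names.append(name)
        return self._ids[name]

    def name_for(self, name_id):
        """Map a number back to its name (e.g. for error messages)."""
        return self._names[name_id]


class IntKeyedNamespace:
    """Hypothetical attribute store keyed by numbers, not strings."""

    def __init__(self):
        self._slots = {}

    def set(self, name_id, value):
        self._slots[name_id] = value

    def get(self, name_id):
        return self._slots[name_id]


# One registry for the whole (pretend) interpreter.
REGISTRY = NameRegistry()

# "Compile/import time": resolve the name to its number once.
X = REGISTRY.id_for("x")

# "Run time": attribute access uses only the number.
ns = IntKeyedNamespace()
ns.set(X, 42)
value = ns.get(X)
```

The real work would be in C, in the compiler and in every place that
touches attribute dicts, which is why measuring first matters.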

Feel free to ask questions.

Cheers,
n

