a question on python dict
could.net at gmail.com
could.net at gmail.com
Thu Dec 21 02:37:28 EST 2006
Thank you very much for your explanation!
I made a mistake that I said the hash value should be recalculated each
time the dict resize, what I wanted to say was that the position of
each item should be recalculated.
Maybe I should take a look at the source code of python dict some day.
I'll some here for help then.
Thanks again!
Tim Peters wrote:
> [could.net at gmail.com]
> > Python dict is a hash table, isn't it?
>
> Yup.
>
> > I know that hashtable has the concept of "bucket size" and "min bucket
> > count" stuff,
>
> Some implementations of hash tables do. Python's does not. Python's
> uses what's called "open addressing" instead.
>
> > and they should be configurable so I can set them to the proper value
> > when I know how many items I'm going to handle.
> >
> > If these two values can't be set, the hashtable will give them default
> > values. When there are more and more items being added to the
> > hashtable, it increase its buckets and copy the old items into the new
> > one and re-calculate the hash value of each item.
>
> That's several distinct claims, each of which is true of some hash
> table implementations but not of others. Python's implementation has
> no "buckets", so all that's simply irrelevant. Python's
> implementation (not all do) saves the hash value of each item, so when
> it does need to grow (or shrink) the space allocated to the table, it
> does not need to recalculate the hash value of any item.
>
> You should also note that copying a dict key or value (no matter of
> what type) consists in its entirety of copying one machine address (a
> 4- or 8-byte pointer, depending on platform).
>
> > I think this will waste some time doing the increasing-copying thing.
>
> It does take time to resize. "Waste" is a value judgment ;-)
>
> > If I know I'm going to handle about 20000 items, I can set the size of
> > hashtable to 30000.
> >
> > So, can I do this in python?
>
> You can't. Unless you build up to 20000 items over and over (many
> thousands of times), it's unlikely you'd be able to measure the time
> difference even if you could.
More information about the Python-list
mailing list