Populating a dictionary, fast [SOLVED]
steve at REMOVE-THIS-cybersource.com.au
Thu Nov 15 23:10:02 CET 2007
On Thu, 15 Nov 2007 21:51:21 +0100, Hrvoje Niksic wrote:
> Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:
>>>> Someone please summarize.
>>> Yes, that would be good.
>> On systems with multiple CPUs or 64-bit systems, or both, creating
>> and/or deleting a multi-megabyte dictionary in recent versions of
>> Python (2.3, 2.4, 2.5 at least) takes a LONG time, of the order of 30+
>> minutes, compared to seconds if the system only has a single CPU.
> Can you post minimal code that exhibits this behavior on Python 2.5.1?
> The OP posted a lot of different versions, most of which worked just
> fine for most people.
Who were testing it on single-CPU, 32 bit systems.
The plot thickens... I wrote another version of my test code, reading the
data into a list of tuples rather than a dict:
$ python slurp_dict4.py # actually slurp a list, despite the name
Starting at Fri Nov 16 08:55:26 2007
Items in list: 8191180
Completed import at Fri Nov 16 08:56:26 2007
Starting to delete list...
Completed deletion at Fri Nov 16 08:57:04 2007
Finishing at Fri Nov 16 08:57:04 2007
Quite a reasonable speed, considering my limited memory.
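For anyone who wants to try the same comparison at home, the test boils down to something like the sketch below. This is a minimal stand-in: the "keyN" key format and the in-memory generation are my own inventions, since the real slurp scripts read their data from a file.

```python
import time

def timestamp(msg):
    # Mimic the log format of the test runs above
    print("%s at %s" % (msg, time.asctime()))

def build_and_delete(n, container="dict"):
    # Build a large container of n items, then delete it,
    # logging each phase so the slow step is visible.
    timestamp("Starting")
    if container == "dict":
        data = dict(("key%d" % i, i) for i in range(n))
    else:
        data = [("key%d" % i, i) for i in range(n)]
    count = len(data)
    print("Items in %s: %d" % (container, count))
    timestamp("Completed import")
    print("Starting to delete %s..." % container)
    del data
    timestamp("Completed deletion")
    return count

build_and_delete(100000, "list")
```

Run it once with "dict" and once with "list" and compare the deletion phase; on the affected machines the dict version is the one that crawls.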
What do we know so far?
(1) The problem occurs whether or not gc is enabled.
(2) It only occurs on some architectures; a 64-bit CPU seems to be the common factor.
(3) I don't think we've seen it demonstrated under Windows, but we've
seen it under at least two different Linux distros.
(4) It affects very large dicts, but not very large lists.
(5) I've done tests where instead of one really big dict, the data is put
into lots of smaller dicts. The problem still occurs.
(6) It was suggested the problem is related to long/int unification, but
I've done tests that kept the dict keys as strings, and the problem still
occurs.
(7) It occurs in Python 2.3, 2.4 and 2.5, but not 2.5.1.
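For the curious, the test in point (5) amounts to sharding the same data across many small dicts instead of one big one. A rough sketch of that test (the shard count, key format and bucketing scheme are my own stand-ins, not the exact code I ran):

```python
import gc

def populate_sharded(n, shards=16):
    # Spread n string-keyed items across several smaller dicts
    # instead of one big one -- the test described in point (5).
    buckets = [dict() for _ in range(shards)]
    for i in range(n):
        key = "key%d" % i  # string keys, per point (6)
        buckets[hash(key) % shards][key] = i
    return buckets

gc.disable()  # point (1): the behavior is the same either way
buckets = populate_sharded(100000)
print(sum(len(b) for b in buckets))  # total items across all shards
gc.enable()
```

On the affected machines the slowdown shows up with the sharded version too, which is what rules out "one huge dict" as the trigger.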
Do we treat this as a solved problem and move on?