dict would be very slow for big data

Tim Chase python.list at tim.thechases.com
Tue May 12 06:21:15 EDT 2009


> I am trying to insert a lot of data into a dict -- on the order of
> 10,000,000 entries.  After inserting about 100,000 items, the insert
> rate becomes very slow (around 50,000/s), and the total time for the
> task would be very long as well.  Does anyone know a solution for
> this case?

As others have mentioned, you've likely run out of RAM, and the 
slowness you're seeing is your OS swapping your process to disk.
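For a rough sense of scale (a back-of-envelope estimate on my part, 
not a measurement): on 64-bit CPython 2.x, each dict slot holds 
three pointer-sized fields (~24 bytes), the hash table is kept under 
two-thirds full, and each small-int key and value is its own ~24-byte 
heap object, so ten million entries land somewhere near

    16,777,216 slots * 24 bytes      ~ 400 MB  (hash table)
  + 20,000,000 objects * 24 bytes    ~ 480 MB  (keys + values)
  ------------------------------------------------------------
                                     ~ 880 MB

which is more than enough to push many machines into swap -- and 
string or tuple values only make it worse.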

If you need fast dict-like access to your data, I'd recommend 
shifting to an on-disk database -- perhaps the stock "anydbm" 
module[1].  The only catch is that it supports only strings as 
keys/values, but Python makes it fairly easy to marshal objects 
in and out of strings.  Alternatively, you could use the built-in 
(as of Python 2.5) sqlite3 module to preserve your datatypes and 
query your dataset with the power of SQL.
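In case it helps, here's a minimal, untested sketch of each 
approach (the filenames and the (i, i * 2) payload are placeholders 
of my own invention).  With anydbm, marshalling values to strings:

  import anydbm
  import marshal

  db = anydbm.open("bigdata.db", "c")   # "c" = create if missing
  for i in xrange(10000000):
      # anydbm keys/values must be strings, so stringify the key
      # and marshal the value
      db[str(i)] = marshal.dumps((i, i * 2))
  db.close()

And the sqlite3 flavor, which keeps native types and lets you query 
with SQL; feeding executemany() a generator keeps memory use flat 
while loading:

  import sqlite3

  conn = sqlite3.connect("bigdata.sqlite")
  conn.execute("CREATE TABLE IF NOT EXISTS data"
               " (k INTEGER PRIMARY KEY, v INTEGER)")
  conn.executemany("INSERT INTO data VALUES (?, ?)",
                   ((i, i * 2) for i in xrange(10000000)))
  conn.commit()
  # dict-like lookup by key
  print conn.execute("SELECT v FROM data WHERE k = ?",
                     (12345,)).fetchone()
  conn.close()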

-tkc


[1]
http://docs.python.org/library/anydbm.html
