dict is really slow for big truck
Bruno Desthuilliers
bruno.42.desthuilliers at websiteburo.invalid
Thu Apr 30 03:43:18 EDT 2009
forrest yang wrote:
> I'm trying to load a big file into a dict - it's about 9,000,000 lines,
> something like:
> 1 2 3 4
> 2 2 3 4
> 3 4 5 6
How "like" is it ?-)
> code:
> d = {}
> for line in open(filename):
>     arr = line.strip().split('\t')
>     d[arr[0]] = arr
>
> but the dict is really slow as I load more data into memory,
Looks like your system is starting to swap. Use 'top' or any other
system monitor to check it out.
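If you want to check from inside the script itself, the stdlib resource
module can report the peak resident size of the process - a minimal
Unix-only sketch (note that ru_maxrss is reported in bytes on Mac OS X
but in kilobytes on Linux):

import resource

peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("peak RSS so far: %s" % peak)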
> by the way, the Mac I use has 16G of memory.
> is this caused by poor performance when the dict has to grow,
dicts are Python's central data type (objects are based on dicts, all
non-local namespaces are based on dicts, etc), so you can safely assume
they are highly optimized.
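As a quick sanity check, a naive way to convince yourself that lookup
time itself is not the problem - lookups stay fast however many entries
the dict holds (a toy sketch with made-up data):

import time

d = dict((str(i), i) for i in range(1000000))

start = time.time()
for i in range(100000):
    d['999999']
print("100000 lookups in a 1000000-entry dict: %.3fs" % (time.time() - start))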
> or by some other reason?
FWIW, a very loose (and partially wrong, see below) estimation based on
wild guesses: each line yields one list plus four strings, so roughly 45
million objects; assuming an average size of 512 bytes per object
(remember that Python doesn't have 'primitive' types), the above would
use roughly 22GB. Fortunately, CPython does cache some values of some
immutable types (specifically, small ints and strings that match the
grammar for Python identifiers), so depending on your real data, you
might need a bit less RAM. Also, the 512 bytes per object is really more
of a wild guess than anything else (but given the internal structure of
a CPython object, I think it's the right order of magnitude - please
someone correct me if I'm plain wrong).
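If you want actual numbers instead of my guesses, sys.getsizeof (new in
2.6) gives the size of a single object - just keep in mind it does not
include the objects a container refers to, so you have to add those up
yourself. A rough sketch on one made-up row:

import sys

line = "1\t2\t3\t4"
arr = line.strip().split('\t')

# the list object itself plus the four strings it holds; the dict's
# own per-entry overhead comes on top of this
total = sys.getsizeof(arr) + sum(sys.getsizeof(s) for s in arr)
print("one parsed row: %d bytes" % total)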
Anyway: I'm afraid the problem has more to do with your design than with
your code or Python's dict implementation itself.
> is there anyone who can provide a better solution?
Use a DBMS. They are designed - and highly optimised - for fast lookup
over huge data sets.
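sqlite3 (in the stdlib since 2.5) would already do - here's a very rough
sketch, assuming four tab-separated columns with the first one used as
the key (the file names are made up):

import csv
import sqlite3

conn = sqlite3.connect('data.db')
conn.execute(
    "CREATE TABLE IF NOT EXISTS rows (k TEXT PRIMARY KEY, c1 TEXT, c2 TEXT, c3 TEXT)")

with open('bigfile.txt') as f:
    reader = csv.reader(f, delimiter='\t')
    # INSERT OR REPLACE so duplicate keys don't abort the load
    conn.executemany(
        "INSERT OR REPLACE INTO rows VALUES (?, ?, ?, ?)",
        (row for row in reader if len(row) == 4))
conn.commit()

# lookups then go through the index on the primary key instead of
# keeping 9,000,000 entries in RAM
cur = conn.execute("SELECT * FROM rows WHERE k = ?", ('42',))
print(cur.fetchone())

And the db file sticks around, so the next run can skip re-parsing the
9,000,000 lines entirely.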
My 2 cents.